This article describes how line and paragraph separators are defined and how you can separate a string by paragraph.
There are a number of ways in which a line or paragraph break may be represented. Historically \n, \r, and \r\n have been used. Unicode defines an unambiguous paragraph separator, U+2029 (for which Cocoa provides the constant NSParagraphSeparatorCharacter), and an unambiguous line separator, U+2028 (for which Cocoa provides the constant NSLineSeparatorCharacter).
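As a minimal sketch of how the two constants might be used, the following helper builds a short string that contains one Unicode line separator and one Unicode paragraph separator. The function name SampleTextWithUnicodeSeparators is hypothetical and exists only for illustration; the example assumes a Mac target where the AppKit constants are available through Cocoa.h.

```objc
#import <Cocoa/Cocoa.h>

// Hypothetical helper (not part of Cocoa): returns
// "first<U+2028>second<U+2029>third".
static NSString *SampleTextWithUnicodeSeparators(void) {
    unichar lineSeparator = NSLineSeparatorCharacter;           // U+2028
    unichar paragraphSeparator = NSParagraphSeparatorCharacter; // U+2029
    // %C formats a single UTF-16 code unit (unichar).
    return [NSString stringWithFormat:@"first%Csecond%Cthird",
                                      lineSeparator, paragraphSeparator];
}
```

Because each separator is a single UTF-16 code unit, it can be compared directly against the result of characterAtIndex:.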
In the Cocoa text system, NSParagraphSeparatorCharacter is treated consistently as a paragraph break, and NSLineSeparatorCharacter is treated consistently as a line break that is not a paragraph break (that is, a line break within a paragraph). However, in other contexts, there are few guarantees as to how these characters will be treated. POSIX-level software, for example, often recognizes only \n as a break. Some older Macintosh software recognizes only \r, and some Windows software recognizes only \r\n. Often there is no distinction between line and paragraph breaks.
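The difference between a line break and a paragraph break can be observed with the NSString methods lineRangeForRange: and paragraphRangeForRange:. The sketch below is illustrative only; the helper names FirstLineLength and FirstParagraphLength are hypothetical, and the input is assumed to be a non-empty string.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: length of the first line, including its terminator.
static NSUInteger FirstLineLength(NSString *text) {
    return [text lineRangeForRange:NSMakeRange(0, 0)].length;
}

// Hypothetical helper: length of the first paragraph, including its terminator.
static NSUInteger FirstParagraphLength(NSString *text) {
    return [text paragraphRangeForRange:NSMakeRange(0, 0)].length;
}
```

For the string @"Line one\u2028Line two\nNext paragraph", the first line ends at the U+2028 line separator, while the first paragraph continues through to the \n. The line range is therefore shorter than the paragraph range.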
Which line or paragraph break character you should use depends on how your data may be used and on what platforms. The Cocoa text system recognizes \n, \r, and \r\n all as paragraph breaks, equivalent to NSParagraphSeparatorCharacter. When it inserts paragraph breaks, for example with insertNewline:, it uses \n. Ordinarily NSLineSeparatorCharacter is used only for breaks that are specifically line breaks and not paragraph breaks, for example in insertLineBreak:, or for representing HTML <br> elements.
If your breaks are specifically intended as line breaks and not paragraph breaks, then you should typically use NSLineSeparatorCharacter. Otherwise, you may use \n, \r, or \r\n depending on what other software is likely to process your text. The default choice for Cocoa is usually \n.
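If text from mixed sources has to be handed to software that recognizes only \n, one possible approach is to rewrite every paragraph break as \n beforehand. The following is a sketch under that assumption, not a Cocoa facility; the helper name NormalizeParagraphBreaks is hypothetical. It leaves U+2028 untouched, because that character marks a line break rather than a paragraph break.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: rewrites \r\n, \r, and U+2029 as \n.
static NSString *NormalizeParagraphBreaks(NSString *text) {
    // \r\n comes first so that it becomes one \n rather than two.
    NSArray *breaks = @[@"\r\n", @"\r", @"\u2029"];
    NSString *result = text;
    for (NSString *paragraphBreak in breaks) {
        // NSLiteralSearch compares code units exactly.
        result = [result stringByReplacingOccurrencesOfString:paragraphBreak
                                                   withString:@"\n"
                                                      options:NSLiteralSearch
                                                        range:NSMakeRange(0, [result length])];
    }
    return result;
}
```

The same structure could target \r\n instead of \n when the text is destined for Windows software, provided existing \r\n sequences are handled first so they are not doubled.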
A common approach to separating a string “by paragraph” is simply to use:
NSArray *arr = [myString componentsSeparatedByString:@"\n"];
This, however, ignores the fact that there are a number of other ways in which a paragraph or line break may be represented in a string: \r, \r\n, or Unicode separators. Instead you can use methods such as lineRangeForRange: or getParagraphStart:end:contentsEnd:forRange: that take into account the variety of possible line terminations, as illustrated in the following example.
NSString *string = /* assume this exists */;
NSUInteger length = [string length];
NSUInteger paraStart = 0, paraEnd = 0, contentsEnd = 0;
NSMutableArray *array = [NSMutableArray array];
NSRange currentRange;
// Each pass finds the paragraph that begins where the previous one ended.
while (paraEnd < length) {
    [string getParagraphStart:&paraStart end:&paraEnd
                  contentsEnd:&contentsEnd forRange:NSMakeRange(paraEnd, 0)];
    // contentsEnd excludes the terminator, so only the paragraph text is stored.
    currentRange = NSMakeRange(paraStart, contentsEnd - paraStart);
    [array addObject:[string substringWithRange:currentRange]];
}
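On Mac OS X v10.6 and later, a block-based alternative to the loop above is the NSString method enumerateSubstringsInRange:options:usingBlock: with the NSStringEnumerationByParagraphs option. This is a different technique from the one shown above, offered only as a sketch; the helper name ParagraphsInString is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the paragraphs of text without their terminators.
static NSArray *ParagraphsInString(NSString *text) {
    NSMutableArray *paragraphs = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, [text length])
                             options:NSStringEnumerationByParagraphs
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // substring is the paragraph text; enclosingRange also covers its terminator.
        [paragraphs addObject:substring];
    }];
    return paragraphs;
}
```

For example, ParagraphsInString(@"one\ntwo\r\nthree\rfour\u2029five") yields five strings, one for each paragraph, regardless of which terminator follows it.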
Last updated: 2009-10-15