



NSCharacterSet


Inherits from:
NSObject
Package:
com.apple.yellow.foundation


Class Description


An NSCharacterSet object represents a set of Unicode characters. String and NSScanner objects use NSCharacterSets to group characters together for searching operations, so that they can find any of a particular set of characters during a search. The cluster's two public classes, NSCharacterSet and NSMutableCharacterSet, declare the programmatic interface for static and dynamic character sets, respectively.

The objects you create using these classes are referred to as character set objects (and when no confusion will result, merely as character sets). Because of the nature of class clusters, character set objects aren't actual instances of the NSCharacterSet or NSMutableCharacterSet classes but of one of their private subclasses. Although a character set object's class is private, its interface is public, as declared by these abstract superclasses, NSCharacterSet and NSMutableCharacterSet.

The NSCharacterSet class declares the programmatic interface for an object that manages a set of Unicode characters (see the NSStringReference class cluster specification for information on Unicode). NSCharacterSet's principal primitive method, characterIsMember, provides the basis for all other instance methods in its interface. A subclass of NSCharacterSet needs only to implement this method for proper behavior. For optimal performance, a subclass should also override bitmapRepresentation, which otherwise works by invoking characterIsMember for every possible Unicode value.
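The relationship between the primitive method and the derived bitmap can be sketched in plain Java. SketchCharacterSet and AsciiVowelSet are hypothetical stand-ins, not the Foundation classes. The sketch assumes the 8192-byte bitmap described under bitmapRepresentation and further assumes that the lowest bit of each byte comes first.

```java
// Hypothetical stand-in for illustration only; not the Foundation NSCharacterSet.
abstract class SketchCharacterSet {
    // The single primitive method a subclass must supply.
    abstract boolean characterIsMember(char c);

    // Default implementation: probe the primitive for every 16-bit value.
    // Assumption: bit (n & 7) of byte (n >> 3) stands for character n.
    byte[] bitmapRepresentation() {
        byte[] bitmap = new byte[8192];
        for (int n = 0; n <= 0xFFFF; n++) {
            if (characterIsMember((char) n)) {
                bitmap[n >> 3] |= (byte) (1 << (n & 7));
            }
        }
        return bitmap;
    }
}

// A minimal subclass: only the primitive is implemented.
final class AsciiVowelSet extends SketchCharacterSet {
    @Override
    boolean characterIsMember(char c) {
        return "aeiouAEIOU".indexOf(c) >= 0;
    }
}
```

The default bitmapRepresentation makes 65,536 calls to the primitive, which is why a subclass that already stores its members as bits would likely want to override it.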


Building a Character Set


NSCharacterSet defines class methods that return commonly used character sets, such as letters (uppercase or lowercase), decimal digits, whitespace, and so on. These "standard" character sets are always immutable, even if created by sending a message to NSMutableCharacterSet. See "Standard Character Sets and Unicode Definitions" below for more information on standard character sets.

You can use a standard character set as a starting point for building a custom set by making a mutable copy of it and changing that. (You can also start from scratch by creating a mutable character set and adding characters to it.)

For performance reasons (explained in "Using a Character Set"), always finish by converting the working mutable character set into an immutable set. If you need to keep changing the character set after you've created it, of course, you should just use the mutable set.
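The copy, modify, and freeze pattern described above might look like the following sketch. It uses java.util.BitSet in place of a mutable character set, and the class names CustomSetBuilder and FrozenSet are hypothetical; the Foundation API is not used here.

```java
import java.util.BitSet;

// Hypothetical immutable wrapper: it copies the working set and exposes only a query.
final class FrozenSet {
    private final BitSet bits;

    FrozenSet(BitSet working) {
        this.bits = (BitSet) working.clone();  // later changes to 'working' are not seen
    }

    boolean characterIsMember(char c) {
        return bits.get(c);
    }
}

// Hypothetical builder for illustration; BitSet stands in for a mutable character set.
final class CustomSetBuilder {
    // Stand-in for a standard set: the ASCII decimal digits.
    static BitSet asciiDigits() {
        BitSet digits = new BitSet(1 << 16);
        digits.set('0', '9' + 1);  // sets the bits for '0' through '9'
        return digits;
    }

    // Start from the standard set, add the hexadecimal letters, then freeze.
    static FrozenSet hexDigitSet() {
        BitSet working = (BitSet) asciiDigits().clone();  // the mutable copy
        working.set('a', 'f' + 1);
        working.set('A', 'F' + 1);
        return new FrozenSet(working);
    }
}
```

Because FrozenSet keeps its own copy, the working set can be discarded or reused without affecting the finished set.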

If your application frequently uses a custom character set, you'll want to save its definition in a resource file and load that instead of explicitly adding individual characters each time you need to create the set. You can save a character set by getting its bitmap representation (an NSData object) and saving that object to a file.

Character set filenames by convention use the extension .bitmap. If you intend for others to use your character set files, you should follow this convention. To read a character set file with a .bitmap extension, simply use the characterSetWithContentsOfFile method.
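As a rough illustration of the save and load cycle, the sketch below writes a raw bitmap to a file and reads it back. It assumes the file holds exactly the 8192-byte raw bitmap and nothing else, which the text above suggests but does not spell out. BitmapFiles is a hypothetical helper built on java.nio.file, not part of Foundation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; assumes a character set file is just the raw bitmap bytes.
final class BitmapFiles {
    static final int BITMAP_BYTES = 8192;

    // Writes the bitmap to a file whose name follows the .bitmap convention.
    static void save(byte[] bitmap, Path file) {
        requireBitmapName(file);
        requireBitmapSize(bitmap);
        try {
            Files.write(file, bitmap);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the bitmap back and checks that it has the expected size.
    static byte[] load(Path file) {
        requireBitmapName(file);
        try {
            byte[] bitmap = Files.readAllBytes(file);
            requireBitmapSize(bitmap);
            return bitmap;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void requireBitmapName(Path file) {
        if (!file.toString().endsWith(".bitmap")) {
            throw new IllegalArgumentException("expected a .bitmap file: " + file);
        }
    }

    private static void requireBitmapSize(byte[] bitmap) {
        if (bitmap.length != BITMAP_BYTES) {
            throw new IllegalArgumentException("expected " + BITMAP_BYTES + " bytes");
        }
    }
}
```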


Using a Character Set


A character set object doesn't perform any tasks; it simply holds a set of character values to limit operations on strings. The String and NSScanner classes define methods that take NSCharacterSets as arguments to find any of several characters.
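The kind of search a string or scanner performs with a character set can be sketched as follows. SetSearch and indexOfAny are hypothetical names, and a java.util.BitSet indexed by character value stands in for the character set; this is not the String or NSScanner API.

```java
import java.util.BitSet;

// Hypothetical helper: the set only answers membership, the search does the work.
final class SetSearch {
    // Returns the index of the first character of text that is in set, or -1 if none is.
    static int indexOfAny(String text, BitSet set) {
        for (int i = 0; i < text.length(); i++) {
            if (set.get(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }
}
```

For example, with a set holding space and tab, indexOfAny("hello world", set) returns 5, the position of the space.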

Because character sets often participate in performance-critical code, you should be aware of the aspects of their use that can affect the performance of your application. Mutable character sets are generally much more expensive than immutable character sets. They consume more memory and are costly to invert (an operation often performed in scanning a string). Because of this, use an immutable character set whenever the set doesn't need to change after it is built, as recommended in "Building a Character Set."


Standard Character Sets and Unicode Definitions


The standard character sets, such as that returned by letterCharacterSet, are formally defined in terms of the normative and informative categories established by the Unicode standard, such as Uppercase Letter, Combining Mark, and so on. The formal definition of a standard character set is in most cases given as one or more of the categories defined in the standard. For example, the set returned by lowercaseLetterCharacterSet includes all characters in the normative category Lowercase Letters, while the set returned by letterCharacterSet includes the characters in all of the Letter categories.

Note that the definitions of the categories themselves may change with new versions of the Unicode standard. You can download the files that define category membership from http://www.unicode.org/.
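Category-based membership of this kind can be approximated with the general-category tables in java.lang.Character. This is only a sketch: CategorySketch is a hypothetical class, and its answers reflect the Unicode version of the Java runtime, which may differ from the version Foundation uses.

```java
// Hypothetical illustration of category-based membership using java.lang.Character.
final class CategorySketch {
    // True if c is in the general category Lowercase Letter.
    static boolean isLowercaseLetter(char c) {
        return Character.getType(c) == Character.LOWERCASE_LETTER;
    }

    // True if c is in any of the Letter categories.
    static boolean isInLetterCategories(char c) {
        switch (Character.getType(c)) {
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
            case Character.TITLECASE_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.OTHER_LETTER:
                return true;
            default:
                return false;
        }
    }
}
```

Defining a set by category rather than by an explicit list of values is what lets its membership follow later revisions of the standard.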




Method Types


Constructors
NSCharacterSet
Creating a standard character set
alphanumericCharacterSet
controlCharacterSet
decimalDigitCharacterSet
decomposableCharacterSet
illegalCharacterSet
letterCharacterSet
lowercaseLetterCharacterSet
nonBaseCharacterSet
punctuationCharacterSet
uppercaseLetterCharacterSet
whitespaceAndNewlineCharacterSet
whitespaceCharacterSet
Opening a character set file
characterSetWithContentsOfFile
Testing set membership
characterIsMember
Getting a binary representation
bitmapRepresentation
Deriving new character sets
characterSetByIntersectingCharacterSet
characterSetByInvertingCharacterSet
characterSetBySubtractingCharacterSet
characterSetByUnioningCharacterSet


Constructors



NSCharacterSet

public NSCharacterSet()

Description forthcoming.

public NSCharacterSet(NSData aData)

Description forthcoming.

public NSCharacterSet(NSRange aRange)

Description forthcoming.

public NSCharacterSet(String aString)

Description forthcoming.


Static Methods



alphanumericCharacterSet

public static NSCharacterSet alphanumericCharacterSet()

Returns a character set containing the characters in the categories Letters, Marks, and Numbers. Informally, this is the set of all characters used as basic units of alphabets, syllabaries, ideographs, and digits.

See Also: letterCharacterSet, decimalDigitCharacterSet



characterSetWithContentsOfFile

public static NSCharacterSet characterSetWithContentsOfFile(String aString)

Returns a character set read from the bitmap representation stored in the file at the path aString, which must end with the extension .bitmap.

This method doesn't perform filename-based uniquing of the character sets it creates. To prevent duplication of character sets in memory, cache them and make them available through an API that checks whether the requested set has already been loaded.
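One way to follow this advice is a small cache keyed by path. In the sketch below, CharacterSetCache and setForPath are hypothetical names, and the loader function stands in for whatever actually reads the file (for instance, a call to characterSetWithContentsOfFile).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache: each path is loaded at most once, later requests reuse the result.
final class CharacterSetCache<S> {
    private final Map<String, S> loaded = new ConcurrentHashMap<>();
    private final Function<String, S> loader;

    // 'loader' is a stand-in for the code that reads a character set file.
    CharacterSetCache(Function<String, S> loader) {
        this.loader = loader;
    }

    // Returns the cached set for path, invoking the loader only on the first request.
    S setForPath(String path) {
        return loaded.computeIfAbsent(path, loader);
    }
}
```

The cache compares path strings exactly, so two different spellings of the same file would still be loaded twice; normalizing paths before lookup is left out of this sketch.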



controlCharacterSet

public static NSCharacterSet controlCharacterSet()

Returns a character set containing the characters in the categories of Control or Format Characters. These are specifically the Unicode values U+0000 to U+001F and U+007F to U+009F.

See Also: illegalCharacterSet



decimalDigitCharacterSet

public static NSCharacterSet decimalDigitCharacterSet()

Returns a character set containing the characters in the category of Decimal Numbers. Informally, this is the set of all characters used to represent the decimal values 0 through 9. These include, for example, the decimal digits of the Indic scripts and Arabic.

See Also: alphanumericCharacterSet



decomposableCharacterSet

public static NSCharacterSet decomposableCharacterSet()

Returns a character set containing all individual Unicode characters that can also be represented as composed character sequences (such as for letters with accents), by the definition of "standard decomposition" in version 1.1 of the Unicode character encoding standard. These include compatibility characters as well as precomposed characters.
Note: This character set doesn't currently include the Hangul characters defined in version 2.0 of the Unicode standard.

See Also: nonBaseCharacterSet



illegalCharacterSet

public static NSCharacterSet illegalCharacterSet()

Returns a character set containing values that are in the category of Non-Characters or that have not yet been defined in version 2.0 of the Unicode standard.

See Also: controlCharacterSet



letterCharacterSet

public static NSCharacterSet letterCharacterSet()

Returns a character set containing the characters in the categories Letters and Marks. Informally, this is the set of all characters used as letters of alphabets and ideographs.

See Also: alphanumericCharacterSet, lowercaseLetterCharacterSet, uppercaseLetterCharacterSet



lowercaseLetterCharacterSet

public static NSCharacterSet lowercaseLetterCharacterSet()

Returns a character set containing the characters in the category of Lowercase Letters. Informally, this is the set of all characters used as lowercase letters in alphabets which make case distinctions.

See Also: uppercaseLetterCharacterSet, letterCharacterSet



nonBaseCharacterSet

public static NSCharacterSet nonBaseCharacterSet()

Returns a character set containing the characters in the category of Marks. This set is also defined as all legal Unicode characters with a non-spacing priority greater than zero. Informally, this is the set of all characters used as modifiers of base characters.

See Also: decomposableCharacterSet



punctuationCharacterSet

public static NSCharacterSet punctuationCharacterSet()

Returns a character set containing the characters in the category of Punctuation. Informally, this is the set of all non-whitespace characters used to separate linguistic units in scripts, such as periods, dashes, parentheses, and so on.

uppercaseLetterCharacterSet

public static NSCharacterSet uppercaseLetterCharacterSet()

Returns a character set containing the characters in the category of Uppercase Letters. Informally, this is the set of all characters used as uppercase letters in alphabets which make case distinctions.

See Also: lowercaseLetterCharacterSet, letterCharacterSet



whitespaceAndNewlineCharacterSet

public static NSCharacterSet whitespaceAndNewlineCharacterSet()

Returns a character set containing only the whitespace characters space (U+0020) and tab (U+0009) and the newline character (U+000A).

See Also: whitespaceCharacterSet



whitespaceCharacterSet

public static NSCharacterSet whitespaceCharacterSet()

Returns a character set containing only the in-line whitespace characters space (U+0020) and tab (U+0009). This set doesn't contain the newline or carriage return characters.

See Also: whitespaceAndNewlineCharacterSet




Instance Methods



bitmapRepresentation

public NSData bitmapRepresentation()

Returns an NSData object encoding the receiving character set in binary format. This format is suitable for saving to a file or otherwise transmitting or archiving.

A raw bitmap representation of a character set is a byte array of 2^16 bits (that is, 8192 bytes). The value of the bit at position n represents the presence in the character set of the character with decimal Unicode value n.
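The arithmetic behind this layout is 2^16 bits / 8 bits per byte = 8192 bytes. Testing or setting the bit for a given character might then look like the sketch below. The text above does not state the bit order within a byte; the sketch assumes the least significant bit comes first. BitmapBits is a hypothetical helper.

```java
// Hypothetical helper for a raw 8192-byte bitmap.
// Assumption: within each byte, bit 0 (the least significant) is the lowest position.
final class BitmapBits {
    // True if the bit at position c is set.
    static boolean isSet(byte[] bitmap, char c) {
        return (bitmap[c >> 3] & (1 << (c & 7))) != 0;
    }

    // Sets the bit at position c.
    static void set(byte[] bitmap, char c) {
        bitmap[c >> 3] |= (byte) (1 << (c & 7));
    }
}
```

Under that assumption, the character 'A' (decimal 65) lands in byte 65 / 8 = 8 at bit 65 mod 8 = 1.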



characterIsMember

public boolean characterIsMember(char aChar)

Returns true if aChar is in the receiving character set, false if it isn't.

characterSetByIntersectingCharacterSet

public NSCharacterSet characterSetByIntersectingCharacterSet(NSCharacterSet aCharacterSet)

Description forthcoming.

characterSetByInvertingCharacterSet

public NSCharacterSet characterSetByInvertingCharacterSet()

Description forthcoming.

characterSetBySubtractingCharacterSet

public NSCharacterSet characterSetBySubtractingCharacterSet(NSCharacterSet aCharacterSet)

Description forthcoming.

characterSetByUnioningCharacterSet

public NSCharacterSet characterSetByUnioningCharacterSet(NSCharacterSet aCharacterSet)

Description forthcoming.
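The descriptions of the four deriving methods are still forthcoming. Assuming their names carry the usual set-algebra meanings (an assumption, not documented behavior), the operations can be sketched with java.util.BitSet over the 16-bit character range. SetAlgebraSketch and its method names are hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch of set algebra over 16-bit character values.
// Each method returns a new set and leaves its arguments unchanged.
final class SetAlgebraSketch {
    static final int UNIVERSE = 1 << 16;  // number of 16-bit character values

    // Characters that are in both a and b.
    static BitSet intersecting(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    // Characters that are in a, in b, or in both.
    static BitSet unioning(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    // Characters that are in a but not in b.
    static BitSet subtracting(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.andNot(b);
        return result;
    }

    // Every 16-bit character value that is not in a.
    static BitSet inverting(BitSet a) {
        BitSet result = (BitSet) a.clone();
        result.flip(0, UNIVERSE);
        return result;
    }
}
```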

