The Basic Conversion Routines

Among the String Services functions that convert the encodings of characters in CFString objects are the two low-level conversion functions, CFStringGetBytes and CFStringCreateWithBytes. As their names suggest, these functions operate on byte buffers of a known size. In addition to performing encoding conversions, they also handle any special characters (such as a BOM) that make a string suitable for external representation.

The CFStringGetBytes function is particularly useful for encoding conversions because it lets you specify a loss byte. If you specify a character for the loss byte, the function substitutes that character whenever it cannot convert a Unicode value to the target encoding. If you specify zero for the loss byte, this "lossy conversion" is not allowed, and the function stops at the first character it cannot convert; it returns (indirectly) a partial set of characters, and the function result tells you how many characters were converted. All other content-accessing functions of String Services disallow lossy conversion.

Listing 13 illustrates how CFStringGetBytes might be used to convert a string from the system encoding to Windows Latin 1. Note one other feature of the function: it allows you to convert a string into a fixed-size buffer one segment at a time.

Listing 13 Converting to a different encoding with CFStringGetBytes
    CFStringRef str;
    CFRange rangeToProcess;

    str = CFStringCreateWithCString(NULL, "Hello World", CFStringGetSystemEncoding());
    rangeToProcess = CFRangeMake(0, CFStringGetLength(str));

    while (rangeToProcess.length > 0) {
        UInt8 localBuffer[100];
        CFIndex usedBufferLength;
        // Convert as many characters as fit in localBuffer, substituting
        // '?' for any character that cannot be converted.
        CFIndex numChars = CFStringGetBytes(str, rangeToProcess,
            kCFStringEncodingWindowsLatin1, '?', FALSE,
            (UInt8 *)localBuffer, 100, &usedBufferLength);
        if (numChars == 0) break;   // Failed to convert anything...
        processCharacters(localBuffer, usedBufferLength);
        // Advance past the characters that were just converted.
        rangeToProcess.location += numChars;
        rangeToProcess.length -= numChars;
    }
    CFRelease(str);   // str was obtained from a Create function

If the string to convert is relatively small, you can take a different approach with the CFStringGetBytes function. Call the function with the buffer parameter set to NULL to find out two things. First, if the function result is greater than zero, conversion is possible (a result equal to the length of the range means that the whole range can be converted). Second, if conversion is possible, the last parameter (usedBufLen) contains the number of bytes required for the conversion. With this information you can allocate a buffer of the needed size and convert the string into the desired encoding in one shot. However, this technique has drawbacks if the string is large: asking for the length could be expensive, and the allocation could require a lot of memory.


© 1999 Apple Computer, Inc. – (Last Updated 07 September 99)