ConvertFromTextToUnicode

Header: UnicodeConverter.h
Carbon status: Supported

Converts a string from any encoding to Unicode.

OSStatus ConvertFromTextToUnicode (
    TextToUnicodeInfo iTextToUnicodeInfo, 
    ByteCount iSourceLen, 
    ConstLogicalAddress iSourceStr, 
    OptionBits iControlFlags, 
    ItemCount iOffsetCount, 
    ByteOffset iOffsetArray[], 
    ItemCount *oOffsetCount, 
    ByteOffset oOffsetArray[], 
    ByteCount iOutputBufLen, 
    ByteCount *oSourceRead, 
    ByteCount *oUnicodeLen, 
    UniCharArrayPtr oUnicodeStr
);
Parameter descriptions
iTextToUnicodeInfo

A Unicode converter object of type TextToUnicodeInfo containing mapping and state information used for the conversion. The contents of this Unicode converter object are modified by the function. Your application obtains a Unicode converter object using the function CreateTextToUnicodeInfo.

iSourceLen

The length in bytes of the source string to be converted.

iSourceStr

The address of the source string to be converted.

iControlFlags

Conversion control flags. You can use “Conversion Control Masks” to set the iControlFlags parameter.

iOffsetCount

The number of offsets in the iOffsetArray parameter. Your application supplies this value. iOffsetArray must contain fewer entries than the number of bytes specified in iSourceLen. If you don’t want offsets returned to you, specify 0 (zero) for this parameter.

iOffsetArray

An array of type ByteOffset. On input, you specify the array that contains an ordered list of significant byte offsets pertaining to the source string. These offsets may identify font or style changes, for example, in the source string. All array entries must be less than the length in bytes specified by the iSourceLen parameter. If you don’t want offsets returned to your application, specify NULL for this parameter and 0 (zero) for iOffsetCount.

oOffsetCount

A pointer to a value of type ItemCount. On return, this value holds the number of offsets that were mapped in the output stream.

oOffsetArray

An array of type ByteOffset. On return, this array contains the corresponding new offsets for the Unicode string produced by the converter.

iOutputBufLen

The length in bytes of the output buffer pointed to by the oUnicodeStr parameter. Your application supplies this buffer to hold the returned converted string. The oUnicodeLen parameter may return a byte count that is less than this value if the converted byte string is smaller than the buffer size you allocated. The relationship between the size of the source string and the Unicode string is complex and depends on the source encoding and the contents of the string.

oSourceRead

A pointer to a value of type ByteCount. On return, this value holds the number of bytes of the source string that were converted. If the function returns a kTECUnmappableElementErr result code, the value is the number of bytes that were converted before the error occurred.

oUnicodeLen

A pointer to a value of type ByteCount. On return, this value holds the length in bytes of the converted stream.

oUnicodeStr

A pointer to an array used to hold a Unicode string. On input, this value points to the beginning of the array for the converted string. On return, this buffer holds the converted Unicode string. (For guidelines on estimating the size of the buffer needed, see the discussion.)

function result

A result code. The function returns a noErr result code if it has completely converted the input string to Unicode without using fallback characters.

DISCUSSION

You specify the source string’s encoding in the Unicode mapping structure that you pass to the function CreateTextToUnicodeInfo to obtain a Unicode converter object for the conversion. You pass the Unicode converter object returned by CreateTextToUnicodeInfo to ConvertFromTextToUnicode as the iTextToUnicodeInfo parameter.

In addition to converting a text string in any encoding to Unicode, the ConvertFromTextToUnicode function can map offsets for style or font information from the source text string to the returned converted string. The converter reads the application-supplied offsets, which apply to the source string, and returns the corresponding new offsets in the converted string. If you do not want the offsets at which font or style information occurs mapped to the resulting string, you should pass NULL for iOffsetArray and 0 (zero) for iOffsetCount.
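The constraints on the input offsets (fewer entries than source bytes, every entry less than iSourceLen, entries in order) can be checked before the call. The following is a minimal sketch, not part of the Unicode Converter API: the function name offsets_are_valid is hypothetical, size_t stands in for the ByteOffset, ItemCount, and ByteCount types, and "ordered" is assumed to mean non-decreasing.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an Apple API): checks an offset array against
 * the constraints described for iOffsetArray and iOffsetCount.
 *
 * offsets    - candidate value for iOffsetArray (may be NULL if count is 0)
 * count      - candidate value for iOffsetCount
 * source_len - value that will be passed as iSourceLen, in bytes
 */
bool offsets_are_valid(const size_t *offsets, size_t count, size_t source_len)
{
    size_t i;

    /* No offset mapping requested: NULL array with a count of 0 is fine. */
    if (count == 0) {
        return true;
    }
    if (offsets == NULL) {
        return false;
    }
    /* There must be fewer entries than source bytes. */
    if (count >= source_len) {
        return false;
    }
    for (i = 0; i < count; i++) {
        /* Every entry must be less than the source length. */
        if (offsets[i] >= source_len) {
            return false;
        }
        /* Assumption: "ordered" means non-decreasing. */
        if (i > 0 && offsets[i] < offsets[i - 1]) {
            return false;
        }
    }
    return true;
}
```

An application could run such a check on its style-run offsets and fall back to passing NULL and 0 if the check fails.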

Your application must allocate a buffer to hold the resulting converted string and pass a pointer to the buffer in the oUnicodeStr parameter. To determine the size of the output buffer to allocate, you should consider the size of the source string, its encoding type, and its content in relation to the resulting Unicode string.

For example, for 1-byte encodings, such as MacRoman, the Unicode string will be at least double the size (more if it uses noncomposed Unicode); for MacArabic and MacHebrew, the corresponding Unicode string could be up to six times as big. For most 2-byte encodings, for example Shift-JIS, the Unicode string will be less than double the size. For international robustness, your application should allocate a buffer three to four times larger than the source string.

The output Unicode text may actually be UTF-8, which could occur beginning with the current release of the Text Encoding Conversion Manager, version 1.2.1. In that case, the UTF-8 buffer pointer must be cast to UniCharArrayPtr before it can be passed as the oUnicodeStr parameter. Also, the output buffer length will have a wider range of variation than for UTF-16; for ASCII input, the output will be the same size; for Han input, the output will be twice as big, and so on.
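The sizing guidance above amounts to multiplying the source length by an expansion factor. The following is a minimal sketch under stated assumptions: the function name, the enum constants, and the use of size_t in place of ByteCount are all hypothetical and not part of the Unicode Converter API. The factors 4 and 6 are taken from the guidance above; rounding up to an even byte count (so the buffer holds whole 16-bit units) is an added assumption that applies to UTF-16 output only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical expansion factors drawn from the discussion above. */
enum {
    kSketchFactorGeneral      = 4, /* "three to four times larger" */
    kSketchFactorArabicHebrew = 6  /* "up to six times as big" */
};

/*
 * Hypothetical helper (not an Apple API): returns a conservative size in
 * bytes for the output buffer passed as oUnicodeStr, or 0 if the source is
 * empty, the factor is 0, or the multiplication would overflow.
 */
size_t estimate_unicode_buffer_bytes(size_t source_len, size_t factor)
{
    size_t bytes;

    if (factor == 0 || source_len > SIZE_MAX / factor) {
        return 0;
    }
    bytes = source_len * factor;

    /* Assumption: round up to an even count for 16-bit code units. */
    if (bytes % 2 != 0) {
        if (bytes == SIZE_MAX) {
            return 0;
        }
        bytes++;
    }
    return bytes;
}
```

The returned value would be used both to allocate the buffer and as iOutputBufLen. Because this is only an estimate, the caller should still compare oSourceRead with iSourceLen after the conversion to confirm that the whole source string was consumed.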

AVAILABILITY

Supported in Carbon. Available in Carbon 1.0.2 and later when running Mac OS 8.1 or later.


© 2000 Apple Computer, Inc. (Last Updated 7/17/2000)