Carbon


TECSniffTextEncoding

Header: TextEncodingConverter.h
Carbon status: Supported

Analyzes a text stream and returns the probable encodings in a ranked list, based on an array of possible encodings you supply. It also returns the number of errors and features for each encoding.

OSStatus TECSniffTextEncoding (
    TECSnifferObjectRef encodingSniffer, 
    TextPtr inputBuffer, 
    ByteCount inputBufferLength, 
    TextEncoding testEncodings[], 
    ItemCount numTextEncodings, 
    ItemCount numErrsArray[], 
    ItemCount maxErrs, 
    ItemCount numFeaturesArray[], 
    ItemCount maxFeatures
);
Parameter descriptions
encodingSniffer

A pointer to a sniffer object.

inputBuffer

The text to be sniffed.

inputBufferLength

The length of the input buffer, in bytes.

testEncodings

An array of text encoding specifications. You must fill the array with the text encodings for which you want to sniff. On output, the array elements are reordered from the most likely to the least likely text encodings.

numTextEncodings

The number of entries in the testEncodings[] parameter.

numErrsArray

An array that must contain at least numTextEncodings elements. On return, it holds the number of errors found for each text encoding. The elements are in the same order as the reordered testEncodings[] array on output.

maxErrs

The maximum number of errors a sniffer can encounter. The sniffer stops looking for an encoding after this number is reached.

numFeaturesArray

An array that must contain at least numTextEncodings elements. On return, it holds the number of features found for each text encoding. The elements are in the same order as the reordered testEncodings[] array on output.

maxFeatures

The maximum number of features a sniffer can encounter. The sniffer stops looking for features after this number is reached.

function result

A result code.

DISCUSSION

An error indicates a code point or sequence that is illegal in the specified encoding. A feature indicates the presence of a sequence that is characteristic of that encoding.

For example, the byte sequence which is interpreted in Mac OS Roman as “äøéö” could legally be interpreted either as Mac OS Roman text or as Mac OS Japanese text. Both sniffers would return zero errors, but the Mac OS Japanese sniffer would also return two features of Mac OS Japanese (representing two legal 2-byte characters).
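
The distinction between errors and features can be illustrated with a small stand-alone sketch. It does not call the Text Encoding Converter and is not the sniffer's actual algorithm; it simply applies generic Shift-JIS style lead-byte and trail-byte ranges (an assumption, since the real Mac OS Japanese sniffer rules are not described here) to the four bytes 0x8A 0xBF 0x8E 0x9A, which are the Mac OS Roman codes for the characters in the example above. The names SketchCounts and sketch_count_sjis are hypothetical.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch; not part of the TEC API. */
typedef struct {
    unsigned long errors;   /* bytes that are illegal under the rules below */
    unsigned long features; /* legal 2-byte sequences */
} SketchCounts;

/* Assumed Shift-JIS style ranges; a real encoding table may differ. */
static int is_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

static int is_trail_byte(unsigned char b) {
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

static int is_single_byte(unsigned char b) {
    /* ASCII range and half-width katakana range. */
    return b <= 0x7F || (b >= 0xA1 && b <= 0xDF);
}

/* Count errors and features in buf under the assumed rules. */
SketchCounts sketch_count_sjis(const unsigned char *buf, size_t len) {
    SketchCounts c = {0, 0};
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        if (is_lead_byte(b)) {
            if (i + 1 < len && is_trail_byte(buf[i + 1])) {
                c.features++;   /* legal 2-byte character */
                i += 2;
            } else {
                c.errors++;     /* lead byte without a valid trail byte */
                i += 1;
            }
        } else {
            if (!is_single_byte(b)) {
                c.errors++;     /* byte not defined by the assumed rules */
            }
            i += 1;
        }
    }
    return c;
}
```

Under these assumed ranges, the four example bytes form two legal 2-byte sequences and no illegal ones, which matches the zero errors and two features described above.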

The arrays are returned in a ranked list with the most likely text encodings first. The results are sorted first by number of errors (fewest to most), then by number of features (most to fewest), and then by the original order in the list. On return, the most likely encoding is in testEncodings[0].
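
The ranking rule can be sketched in plain C as a three-key comparison. This is only an illustration of the documented sort order, not the converter's implementation; the SketchResult type, the sketch_rank function, and the plain unsigned long fields (standing in for TextEncoding and ItemCount) are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical per-encoding record; not part of the TEC API. */
typedef struct {
    unsigned long encoding;   /* stand-in for a TextEncoding value */
    unsigned long errors;
    unsigned long features;
    size_t original_index;    /* position in the list before sorting */
} SketchResult;

/* Fewest errors first, then most features, then original order. */
static int sketch_compare(const void *pa, const void *pb) {
    const SketchResult *a = pa;
    const SketchResult *b = pb;
    if (a->errors != b->errors) {
        return a->errors < b->errors ? -1 : 1;
    }
    if (a->features != b->features) {
        return a->features > b->features ? -1 : 1;
    }
    if (a->original_index != b->original_index) {
        return a->original_index < b->original_index ? -1 : 1;
    }
    return 0;
}

/* Record each entry's starting position, then sort in ranked order.
   qsort is not guaranteed stable, so the original index is an explicit key. */
void sketch_rank(SketchResult *results, size_t count) {
    size_t i;
    for (i = 0; i < count; i++) {
        results[i].original_index = i;
    }
    qsort(results, count, sizeof results[0], sketch_compare);
}
```

With this ordering, an encoding reporting zero errors and two features ranks ahead of one reporting zero errors and zero features, as in the Mac OS Roman and Mac OS Japanese example.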

If an encoding is not examined, its error and feature counts are both set to 0xFFFFFFFF, and the encoding is sorted to the end of the list.
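
A caller may want to skip such entries when reading the results. The following is a minimal sketch of that check, assuming the counts are held as 32-bit values; uint32_t is used here as a stand-in for ItemCount, and SKETCH_NOT_EXAMINED and sketch_count_examined are hypothetical names, not part of the API.

```c
#include <stddef.h>
#include <stdint.h>

/* Sentinel value described above for encodings that were not examined. */
#define SKETCH_NOT_EXAMINED 0xFFFFFFFFu

/* Return how many entries carry real counts rather than the sentinel. */
size_t sketch_count_examined(const uint32_t *numErrs,
                             const uint32_t *numFeatures,
                             size_t count) {
    size_t examined = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        if (numErrs[i] == SKETCH_NOT_EXAMINED &&
            numFeatures[i] == SKETCH_NOT_EXAMINED) {
            continue;   /* this encoding was not examined */
        }
        examined++;
    }
    return examined;
}
```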

AVAILABILITY

Supported in Carbon. Available in Carbon 1.0.2 and later when running Mac OS 8.1 or later.


© 2000 Apple Computer, Inc. (Last Updated 7/17/2000)