ss2 = 0x8e (single shift two)
ss3 = 0x8f (single shift three)

These single-shift bytes identify codesets within a string. The EUC encoding scheme uses the following byte patterns to indicate which codeset each character belongs to:

Codeset #0: 0xxxxxxx
Codeset #1: 1xxxxxxx [1xxxxxxx ...]
Codeset #2: ss2 1xxxxxxx [1xxxxxxx ...]
Codeset #3: ss3 1xxxxxxx [1xxxxxxx ...]

So if ss2 appears in a string, the next character, however many bytes long it is, should be interpreted as a character from codeset #2. If several characters from codeset #2 appear in a row, each one is preceded by its own ss2. Similarly, ss3 indicates that the following character belongs to codeset #3. Any other byte whose high bit is 1 (that is, one not preceded by ss2 or ss3) is interpreted as all or part of a character from codeset #1.
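The patterns above are enough to split an EUC byte string into characters, provided you know how many bytes one character of each codeset occupies, which EUC leaves to each locale. The following is a minimal Python sketch, not a reference decoder: `split_euc` and its `widths` table are hypothetical names, and the widths in the usage example (2, 1 and 2 bytes for codesets #1 to #3, not counting the shift byte) are an assumption modeled on EUC-JP.

```python
# Single-shift bytes that introduce codeset #2 and codeset #3 characters.
SS2 = 0x8E
SS3 = 0x8F


def split_euc(data: bytes, widths: dict) -> list:
    """Split EUC-encoded bytes into (codeset, character_bytes) pairs.

    `widths` maps codesets 1, 2 and 3 to the byte length of one character
    in that codeset, excluding any single-shift byte. These widths are
    locale-specific; EUC itself does not fix them.
    """
    result = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # High bit clear: a one-byte codeset #0 (ASCII) character.
            result.append((0, bytes([b])))
            i += 1
            continue
        if b in (SS2, SS3):
            # A single shift applies only to the one character that follows.
            codeset = 2 if b == SS2 else 3
            start = i + 1
        else:
            # Any other high-bit byte begins a codeset #1 character.
            codeset = 1
            start = i
        width = widths[codeset]
        char = data[start:start + width]
        # Every byte of a codeset #1-#3 character must have its high bit set.
        if len(char) != width or any(x < 0x80 for x in char):
            raise ValueError(f"malformed EUC sequence at offset {i}")
        result.append((codeset, char))
        i = start + width
    return result


EUC_JP_WIDTHS = {1: 2, 2: 1, 3: 2}  # assumption: EUC-JP-style widths
print(split_euc(b"A\xc6\xfc\x8e\xb1", EUC_JP_WIDTHS))
# prints: [(0, b'A'), (1, b'\xc6\xfc'), (2, b'\xb1')]
```

The width table is what makes codeset #1 parseable: no shift byte marks where one codeset #1 character ends and the next begins, so without a known width two consecutive characters would be indistinguishable from one longer character.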
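In EUC, codeset #0 is always ASCII: it covers exactly the bytes whose high bit is clear. The other codesets are implementation- or user-defined, and in Asian locales they hold the local ideographic character sets. Because every byte with the high bit set (0x80 to 0xFF) is already claimed by codesets #1 through #3, there is no room for the upper half of Latin-1, whose accented letters also use high-bit bytes. This is why EUC cannot support Latin-1 in Asian locales.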
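To make the Latin-1 conflict concrete, the following minimal Python sketch classifies a character by its first byte using only the EUC rules (0x8E and 0x8F as the single-shift bytes for codesets #2 and #3). The function name `lead_byte_codeset` is hypothetical; the point is simply that a Latin-1 byte such as 0xE9 cannot be told apart from the start of a codeset #1 character.

```python
def lead_byte_codeset(b: int) -> int:
    """Return the EUC codeset of a character whose first byte is `b`."""
    if b < 0x80:
        return 0  # high bit clear: codeset #0 (ASCII)
    if b == 0x8E:
        return 2  # single shift: next character is from codeset #2
    if b == 0x8F:
        return 3  # single shift: next character is from codeset #3
    return 1      # any other high-bit byte starts a codeset #1 character


# In Latin-1, 'é' is the single byte 0xE9. Its high bit is set, so an EUC
# decoder must read it as the start of a codeset #1 character instead.
latin1_e_acute = "é".encode("latin-1")
print(latin1_e_acute, lead_byte_codeset(latin1_e_acute[0]))
# prints: b'\xe9' 1
```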
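EUC implementations exist (but are not standardized) for all the ideographic Asian languages, for example EUC-JP for Japanese, EUC-KR for Korean, EUC-CN for Simplified Chinese, and EUC-TW for Traditional Chinese.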
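Python's standard library ships codecs for several EUC variants, including `euc_jp` (Japanese), `euc_kr` (Korean) and `gb2312` (Python's codec for EUC-CN, Simplified Chinese), which makes it easy to see the EUC byte patterns in real text. This is an illustrative sketch; the sample strings are arbitrary choices.

```python
# Encode short mixed strings with EUC codecs from Python's standard library
# and show the resulting bytes in hex.
samples = {
    "euc_jp": "Aｱ日本",  # ASCII, half-width katakana, kanji
    "euc_kr": "A한국",    # ASCII, Hangul
    "gb2312": "A中文",    # ASCII, Simplified Chinese (EUC-CN)
}

for codec, text in samples.items():
    encoded = text.encode(codec)
    print(f"{codec:7} {encoded.hex(' ')}")
```

In every line the ASCII letter `A` encodes as the single byte `41`, while every byte of the other characters has its high bit set. In the `euc_jp` line, the half-width katakana `ｱ` appears as `8e b1`: a codeset #2 character introduced by the 0x8E single-shift byte.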