This technique is similar to the five-bit Baudot code, which was used by early Teletypes before ASCII was invented.
Marc S. Blank and S. W. Galley, How to Fit a Large Program Into a Small Machine
   --first byte-------   --second byte---
   7    6 5 4 3 2  1 0   7 6 5  4 3 2 1 0
   bit  --first--  --second---  --third--

The bit is set only on the last 2-byte word of the text, and so marks the end.
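As a hedged illustration of this layout, the following Python sketch unpacks 2-byte words into Z-characters and stops at the word whose top bit is set. The names `unpack_word` and `unpack_text` are hypothetical, and the sketch assumes the first byte of each word is the more significant one, as the diagram suggests.

```python
def unpack_word(first_byte, second_byte):
    """Split one 2-byte word into (is_last, (z1, z2, z3))."""
    word = (first_byte << 8) | second_byte
    is_last = bool(word & 0x8000)      # bit 7 of the first byte
    z1 = (word >> 10) & 0x1F           # bits 6..2 of the first byte
    z2 = (word >> 5) & 0x1F            # bits 1..0 of first, 7..5 of second
    z3 = word & 0x1F                   # bits 4..0 of the second byte
    return is_last, (z1, z2, z3)


def unpack_text(data):
    """Collect Z-characters from data until the end-marked word."""
    zchars = []
    for i in range(0, len(data) - 1, 2):
        is_last, triple = unpack_word(data[i], data[i + 1])
        zchars.extend(triple)
        if is_last:
            break                      # anything after this word is not text
    return zchars
```

For instance, `unpack_text(bytes([0x35, 0x51, 0xC6, 0x85]))` yields the six Z-characters 13, 10, 17, 17, 20, 5, with the second word carrying the end bit.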
3.2.1
There are three 'alphabets', A0 (lower case), A1 (upper case) and A2 (punctuation) and during printing one of these is current at any given time. Initially A0 is current. The meaning of a Z-character may depend on which alphabet is current.
3.2.2
In Versions 1 and 2, the current alphabet can be any of the three. The Z-characters 2 and 3 are called 'shift' characters and change the alphabet for the next character only. The new alphabet depends on what the current one is:
             from A0  from A1  from A2
  Z-char 2      A1       A2       A0
  Z-char 3      A2       A0       A1

Z-characters 4 and 5 permanently change alphabet, according to the same table, and are called 'shift lock' characters.
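The Version 1 and 2 table can be written down as a small lookup. This is only a sketch: `SHIFT_TABLE`, `shifted_alphabet` and `is_shift_lock` are hypothetical names, and it assumes that "the same table" means Z-character 4 uses the row for 2 and Z-character 5 uses the row for 3. How a temporary shift interacts with a locked alphabet is left out.

```python
# Each row lists the alphabet reached from A0, A1 and A2 respectively.
SHIFT_TABLE = {
    2: (1, 2, 0),
    3: (2, 0, 1),
}


def shifted_alphabet(current, zchar):
    """Return the alphabet (0, 1 or 2) selected by Z-character 2, 3, 4 or 5."""
    if zchar not in (2, 3, 4, 5):
        raise ValueError("not a shift or shift lock character")
    # Assumption: 4 follows the row for 2, and 5 follows the row for 3.
    row = SHIFT_TABLE[2 if zchar in (2, 4) else 3]
    return row[current]


def is_shift_lock(zchar):
    """True for the characters that change alphabet permanently (V1 and V2)."""
    return zchar in (4, 5)
```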
3.2.3
In Versions 3 and later, the current alphabet is always A0 unless changed for 1 character only: Z-characters 4 and 5 are shift characters. Thus 4 means "the next character is in A1" and 5 means "the next is in A2". There are no shift lock characters.
3.2.4
An indefinite sequence of shift or shift lock characters is legal (but prints nothing).
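For Versions 3 and later, the shift rule can be sketched as a pass that pairs each remaining Z-character with the alphabet it should be read in. The function name `resolve_shifts_v3` is hypothetical. The sketch assumes that in a run of consecutive shift characters the last one wins, which the text above does not state explicitly.

```python
def resolve_shifts_v3(zchars):
    """Return (alphabet, zchar) pairs with shift characters 4 and 5 removed."""
    pairs = []
    alphabet = 0                       # A0 is current unless shifted
    for z in zchars:
        if z == 4:
            alphabet = 1               # next character is in A1
        elif z == 5:
            alphabet = 2               # next character is in A2
        else:
            pairs.append((alphabet, z))
            alphabet = 0               # a shift lasts for one character only
    return pairs
```

A sequence consisting only of shift characters, such as `[4, 5, 4]`, produces an empty list, matching the rule that such a sequence prints nothing.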
The remaining Z-characters are translated into ZSCII character codes using the "alphabet table".
The Z-character 0 is printed as a space (ZSCII 32).
In Version 1, Z-character 1 is printed as a new-line (ZSCII 13).
   Z-char 6789abcdef0123456789abcdef
current   --------------------------
  A0      abcdefghijklmnopqrstuvwxyz
  A1      ABCDEFGHIJKLMNOPQRSTUVWXYZ
  A2       ^0123456789.,!?_#'"/\-:()
          --------------------------

(Character 6 in A2 is printed as a space here, but is not translated using the alphabet table: see S 3.4 above. Character 7 in A2, written here as a circumflex ^, is a new-line.) For example, in alphabet A1 the Z-character 12 is translated as a capital G (ZSCII character code 71).
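The table lookup can be sketched in Python as follows. The names `ALPHABETS` and `zchar_to_zscii` are hypothetical, and the sketch relies on ZSCII agreeing with ASCII for the printable characters in the table. It covers only Z-character 0 and Z-characters 6 to 31; the A2 escape at position 6 is deliberately rejected because it is not translated through the table.

```python
ALPHABETS = (
    "abcdefghijklmnopqrstuvwxyz",              # A0
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",              # A1
    " \r0123456789.,!?_#'\"/\\-:()",           # A2: slot 6 unused, slot 7 is new-line
)


def zchar_to_zscii(alphabet, zchar):
    """Translate one Z-character to a ZSCII code using the table above."""
    if zchar == 0:
        return 32                              # Z-character 0 is always a space
    if not 6 <= zchar <= 31:
        raise ValueError("Z-character is not translated by the alphabet table")
    if alphabet == 2 and zchar == 6:
        raise ValueError("A2 character 6 is an escape, not a table entry")
    return ord(ALPHABETS[alphabet][zchar - 6])
```

With this sketch, `zchar_to_zscii(1, 12)` returns 71, the capital G of the example above.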
3.5.4
Version 1 has a slightly different A2 row in its alphabet table (new-line is not needed, making room for the < character):
          6789abcdef0123456789abcdef
          --------------------------
  A2       0123456789.,!?_#'"/\<-:()
          --------------------------
3.5.5
In Versions 5 and later, the interpreter should look at the word at $34 in the header. If this is zero, then the alphabet table drawn out in S 3.5.3 continues in use. Otherwise it is interpreted as the byte address of an alphabet table specific to this story file.
3.5.5.1
Such an alphabet table consists of 78 bytes arranged as 3 blocks of 26 ZSCII values, translating Z-characters 6 to 31 for alphabets A0, A1 and A2. Z-characters 6 and 7 of A2, however, are still translated as escape and newline codes (as above).
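A minimal sketch of selecting the alphabet table, assuming the story file is held in a `bytes`-like object called `memory` and that header words are stored most significant byte first. `DEFAULT_TABLE` and `load_alphabet_table` are hypothetical names.

```python
DEFAULT_ROWS = (
    "abcdefghijklmnopqrstuvwxyz",
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    " \r0123456789.,!?_#'\"/\\-:()",
)
# Three rows of 26 ZSCII values, indexed by Z-character minus 6.
DEFAULT_TABLE = [[ord(c) for c in row] for row in DEFAULT_ROWS]


def load_alphabet_table(memory, version):
    """Return three lists of 26 ZSCII values for alphabets A0, A1 and A2."""
    if version < 5:
        return DEFAULT_TABLE
    address = (memory[0x34] << 8) | memory[0x35]   # word at $34
    if address == 0:
        return DEFAULT_TABLE
    raw = memory[address:address + 78]
    # Entries 0 and 1 of the A2 row are read but must still be treated
    # as the escape and new-line codes by the caller.
    return [list(raw[i * 26:(i + 1) * 26]) for i in range(3)]
```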
Code | Meaning | Defined for |
0 | null | Output |
1-7 | ---- | |
8 | delete | Input |
9 | tab (V6) | Output |
10 | ---- | |
11 | sentence space (V6) | Output |
12 | ---- | |
13 | newline | Input/Output |
14-26 | ---- | |
27 | escape | Input |
28-31 | ---- | |
32-126 | standard ASCII | Input/Output |
127-128 | ---- | |
129-132 | cursor u/d/l/r | Input |
133-144 | function keys f1 to f12 | Input |
145-154 | keypad 0 to 9 | Input |
155-251 | extra characters | Input/Output |
252 | menu click (V6) | Input |
253 | double-click (V6) | Input |
254 | single-click | Input |
255-1023 | ---- | |
The codes 0 to 31 are undefined except as follows:
ZSCII code 8 ("delete") is defined for input only.
ZSCII code 13 ("carriage return") is defined for input and output.
ZSCII code 27 ("escape" or "break") is defined for input only.
       0123456789abcdef0123456789abcdef
       --------------------------------
  $20   !"#$%&'()*+,-./0123456789:;<=>?
  $40  @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
  $60  'abcdefghijklmnopqrstuvwxyz{!}~
       --------------------------------

Note that code $23 (35 decimal) is a hash mark, not a pound sign. (Code $7c (124 decimal) is a vertical stroke which is shown as ! here for typesetting reasons.)
3.8.3.1
ZSCII codes 127 ("delete" in some forms of ASCII) and 128 are undefined.
3.8.4
ZSCII codes 129 to 154 are defined for input only:
  129: cursor up  130: cursor down  131: cursor left  132: cursor right
  133: f1         134: f2           ....              144: f12
  145: keypad 0   146: keypad 1     ....              154: keypad 9
3.8.5
The codes from 155 to 251 are the "extra characters" and are used differently by different story files. Some will need accented Latin characters (such as French E-acute), others unusual punctuation (Spanish question mark), others new alphabets (Cyrillic or Hebrew); still others may want dingbat characters, mathematical or musical symbols, and so on.
3.8.5.1
*** To define which characters are required, the Unicode (or ISO 10646-1) character set is used: characters are specified by unsigned 16-bit codes. These values agree with ISO 8859 Latin-1 in the range 0 to 255, and with ASCII and ZSCII in the range 32 to 126. The Unicode standard leaves a range of values, the Private Use Area, free: however, an Internet group called the ConScript Unicode Registry is organising a standard mapping of invented scripts (such as Klingon, or Tolkien's Elvish) into the Private Use Area, and this should be considered part of the Unicode standard for Z-machine purposes.
3.8.5.2
*** The story file chooses its stock of extra characters with a "Unicode translation table" as follows. Under Versions 1 to 4, the "default table" is always used (see below). In Version 5 or later, if Word 3 of the header extension table is present and non-zero then it is interpreted as the byte address of the Unicode translation table. If Word 3 is absent or zero, the default table is used.
3.8.5.2.1
The table consists of one byte giving a number $N$, followed by $N$ two-byte words.
3.8.5.2.2
This indicates that ZSCII characters 155 to $155+N-1$ are defined for both input and output. (It's possible for $N$ to be zero, leaving the whole range 155 to 251 undefined.)
3.8.5.2.3
The words in the table give Unicode character codes for each of the ZSCII characters 155 to $155+N-1$ in turn.
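Reading such a table might look like the sketch below. The name `read_unicode_table` is hypothetical, `memory` is assumed to be a `bytes`-like view of the story file, and the two-byte words are assumed to be stored most significant byte first.

```python
def read_unicode_table(memory, address):
    """Map ZSCII codes 155, 156, ... to Unicode values from a translation table."""
    count = memory[address]                    # the leading byte N
    table = {}
    for i in range(count):
        offset = address + 1 + 2 * i
        table[155 + i] = (memory[offset] << 8) | memory[offset + 1]
    return table
```

A table with N equal to zero gives an empty mapping, leaving the whole extra-character range undefined.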
3.8.5.3
The default table is as shown in Table 1.
3.8.5.4
The defined extra characters are entirely normal ZSCII characters. They can appear in a story file's alphabet table, in an array created by print stream 3 and so on.
3.8.5.4.1
*** The interpreter is required to be able to print representations of every defined Unicode character under $0100 (i.e. of every defined ISO 8859-1 Latin1 character). If no suitable letter forms are available, textual equivalents may be used (such as "ss" in place of German sharp "s").
3.8.5.4.2
Normally, and where sensibly possible, all punctuation and letter characters in ISO 8859-1 Latin1 should be readable from the interpreter's keyboard. (However, some interpreters may want to provide alternative keyboard mappings, or to run in a different ISO 8859 set: Cyrillic, for example.)
3.8.5.4.3
*** An interpreter is not required to have suitable letter-forms for printing Unicode characters $0100 to $FFFF. (It may, if it chooses, allow the user to configure certain fonts for certain Unicode ranges; but this is not required.) If a Unicode character must be printed which an interpreter has no letter-form for, a question mark should be printed instead.
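The printing rules in this section can be sketched as a fallback chain. Everything named here is hypothetical: `render` is an illustrative function, `has_glyph` stands for whatever test an interpreter uses to decide whether it can draw a character, and `FALLBACKS` holds only a handful of the plain text equivalents from Table 1. A real interpreter would need an entry for every Latin-1 character it cannot draw; this sketch simply prints a question mark when it has none.

```python
# A few plain text equivalents taken from Table 1 (not a complete list).
FALLBACKS = {
    0xE4: "ae",    # a-diaeresis
    0xF6: "oe",    # o-diaeresis
    0xFC: "ue",    # u-diaeresis
    0xDF: "ss",    # sz-ligature
    0xE9: "e",     # e-acute
}


def render(code_point, has_glyph):
    """Return the text to print for one Unicode character."""
    if has_glyph(code_point):
        return chr(code_point)
    if code_point < 0x100 and code_point in FALLBACKS:
        return FALLBACKS[code_point]           # textual equivalent
    return "?"                                 # no letter-form available
```

For example, with `has_glyph = lambda cp: cp < 0x80` (an ASCII-only display), `render(0xDF, has_glyph)` gives "ss" and `render(0x416, has_glyph)` gives "?".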
3.8.6
ZSCII codes 252 to 254 are defined for input only:
  252: menu click  253: mouse double-click  254: mouse single-click

Menu clicks are available only in Version 6. In Versions 5 and later it is recommended that an interpreter should only send code 254, whether the mouse is clicked once or twice.
3.8.7
ZSCII code 255 is undefined. (This value is needed in the "terminating characters table" as a wildcard, indicating "any Input-only character with code 128 or above." However, it cannot itself be printed or read from the keyboard.)
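The wildcard meaning of 255 can be illustrated with a small membership test. This is a sketch only: the terminating characters table is modelled as a plain Python set of codes, and `is_input_only_high` and `matches_terminator` are hypothetical names. The input-only codes of 128 or above are taken to be 129 to 154 and 252 to 254, as listed in this section.

```python
def is_input_only_high(code):
    """True for codes of 128 or above that are defined for input only."""
    return 129 <= code <= 154 or 252 <= code <= 254


def matches_terminator(code, terminators):
    """Decide whether a keyboard code matches a set of terminating codes."""
    if code == 255:
        return False                   # 255 itself can never be read
    if code in terminators:
        return True
    # 255 in the table acts as a wildcard for the input-only codes.
    return 255 in terminators and is_input_only_high(code)
```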
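ZSCII | Unicode (hex) | Name | Plain text equivalent | ZSCII | Unicode (hex) | Name | Plain text equivalent |
155 | 0e4 | a-diaeresis | ae | 191 | 0e2 | a-circumflex | a |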
156 | 0f6 | o-diaeresis | oe | 192 | 0ea | e-circumflex | e |
157 | 0fc | u-diaeresis | ue | 193 | 0ee | i-circumflex | i |
158 | 0c4 | A-diaeresis | Ae | 194 | 0f4 | o-circumflex | o |
159 | 0d6 | O-diaeresis | Oe | 195 | 0fb | u-circumflex | u |
160 | 0dc | U-diaeresis | Ue | 196 | 0c2 | A-circumflex | A |
161 | 0df | sz-ligature | ss | 197 | 0ca | E-circumflex | E |
162 | 0bb | quotation marks | >> or " | 198 | 0ce | I-circumflex | I |
163 | 0ab | quotation marks | << or " | 199 | 0d4 | O-circumflex | O |
164 | 0eb | e-diaeresis | e | 200 | 0db | U-circumflex | U |
165 | 0ef | i-diaeresis | i | 201 | 0e5 | a-ring | a |
166 | 0ff | y-diaeresis | y | 202 | 0c5 | A-ring | A |
167 | 0cb | E-diaeresis | E | 203 | 0f8 | o-slash | o |
168 | 0cf | I-diaeresis | I | 204 | 0d8 | O-slash | O |
169 | 0e1 | a-acute | a | 205 | 0e3 | a-tilde | a |
170 | 0e9 | e-acute | e | 206 | 0f1 | n-tilde | n |
171 | 0ed | i-acute | i | 207 | 0f5 | o-tilde | o |
172 | 0f3 | o-acute | o | 208 | 0c3 | A-tilde | A |
173 | 0fa | u-acute | u | 209 | 0d1 | N-tilde | N |
174 | 0fd | y-acute | y | 210 | 0d5 | O-tilde | O |
175 | 0c1 | A-acute | A | 211 | 0e6 | ae-ligature | ae |
176 | 0c9 | E-acute | E | 212 | 0c6 | AE-ligature | AE |
177 | 0cd | I-acute | I | 213 | 0e7 | c-cedilla | c |
178 | 0d3 | O-acute | O | 214 | 0c7 | C-cedilla | C |
179 | 0da | U-acute | U | 215 | 0fe | Icelandic thorn | th |
180 | 0dd | Y-acute | Y | 216 | 0f0 | Icelandic eth | th |
181 | 0e0 | a-grave | a | 217 | 0de | Icelandic Thorn | Th |
182 | 0e8 | e-grave | e | 218 | 0d0 | Icelandic Eth | Th |
183 | 0ec | i-grave | i | 219 | 0a3 | pound symbol | L |
184 | 0f2 | o-grave | o | 220 | 153 | oe-ligature | oe |
185 | 0f9 | u-grave | u | 221 | 152 | OE-ligature | OE |
186 | 0c0 | A-grave | A | 222 | 0a1 | inverted ! | ! |
187 | 0c8 | E-grave | E | 223 | 0bf | inverted ? | ? |
188 | 0cc | I-grave | I | | | | |
189 | 0d2 | O-grave | O | | | | |
190 | 0d9 | U-grave | U | | | | |
N = 69
The German translation of 'Zork I' uses an alphabet table to make accented letters (from the standard extra characters set) efficient in dictionary words. In Version 6, 'Shogun' also uses an alphabet table.
Unicode translation tables are new in Standard 1.0: in Standard 0.2, the extra characters were always mapped using the default Unicode translation table.
Note that if a random stretch of memory is accidentally printed as a string (due to an error in the story file), illegal ZSCII codes may well be printed using the 4-Z-character escape sequence. It's helpful for interpreters to filter out any such illegal codes so that the resulting on-screen mess will not cause trouble for the terminal (e.g. by causing the interpreter to print ASCII 12, clear screen, or 7, bell sound).
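One way to do this filtering is sketched below. The function names are hypothetical, `n_extra` stands for the number N of extra characters defined by the Unicode translation table (69 for the default table), and the sketch treats codes 9 and 11 as output codes in Version 6 only, following the summary table.

```python
def is_defined_for_output(code, n_extra=69, version=5):
    """True if a ZSCII code is defined for output under the rules above."""
    if code in (0, 13):
        return True                    # null and new-line
    if version == 6 and code in (9, 11):
        return True                    # tab and sentence space
    if 32 <= code <= 126:
        return True                    # standard ASCII range
    # Extra characters 155 to 155+N-1, never beyond 251.
    return 155 <= code < min(155 + n_extra, 252)


def filter_output(codes, n_extra=69, version=5):
    """Drop every code that is not defined for output."""
    return [c for c in codes if is_defined_for_output(c, n_extra, version)]
```

Applied to `[72, 7, 12, 105, 13]`, the filter keeps 72, 105 and 13 and discards the bell and clear-screen codes.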
The continental European quotation marks << and >> should have spacing which looks sensible either in French style <<Merci!>> or in German style >>Danke!<<.
Ideally, an interpreter should be able to read time delays (for timed input) from stream 1 (i.e., from a script file). See the remarks in S 7.
The 'Beyond Zork' story file is capable of receiving both mouse-click codes (253 and 254), listing both in its terminating characters table and treating them equally.
The extant Infocom games in Versions 4 and 5 use the control characters 1 to 31 only as follows: they all accept 10 or 13 as equivalent, except that 'Bureaucracy' will only accept 13. 'Bureaucracy' needs either 127 or 8 to be a delete code. No other codes are used.
Curiously, 'Nord 'n' Bert Couldn't Make Head Nor Tail Of It' and 'A Mind Forever Voyaging' allow some letter characters to be typed in with the top bit set. That is, if reading an A, they would recognise 65 or 97 (upper or lower case) and also 193 or 225. Matthew Russotto suggests this was an accommodation for the Apple II, whose keyboard primitives returned the last key pressed in the bottom 7 bits of a byte, plus a top bit flag indicating whether or not the keyboard had been hit since last time.