---------------------------------------------------------------------------
        Inform: A Compiler of Infocom-Format Games (third edition)
---------------------------------------------------------------------------

        "I will build myself a copper tower
         With four ways out and no way in
         But mine the glory, mine the power..."
                               (Louis MacNeice, "Flight of the Heart")


Hello, Informer!

This manual contains four elements.

Firstly, it documents Inform, a program to manufacture Infocom version-3
format story files, which can then be played on any of the interpreters
now widely available on most machines. Inform has now been fairly heavily
re-written, and is quite dependable (at least on my machine) even on very
large game files.

Secondly, it attempts to fully specify the version-3 "Z-machine". Some of
this information is already circulating in other files, but uncollated.
The rest seems only to be available inasmuch as it is implicit in the
interpreter sources. There seems to be some demand for this, if only among
those tinkering with the interpreters.

Thirdly, it contains articles on how to design games in general, which are
not specific to this format at all. Admittedly I do not always follow my
own dictates, but I think that game implementation bears about the same
relation to design as typing does to writing poetry, and I didn't want
only to talk about the typing.

Fourthly, it documents a suite of standard game routines provided with
Inform to allow designers to begin coding at once. In effect, this means
that an Inform source file need not contain any of the parser code, or the
running of the "game universe" - the library consists roughly of a full
implementation of Zork without any actual puzzles. It manages rooms,
objects, containers, things on top of other things, light, scoring,
switching things on and off, opening, closing and locking things, entering
things, travelling about in them and so forth: it implements about 80
verbs.
The parser it uses (which can be entirely invisible to the designer, or can
be altered if necessary) is about as good as anything Infocom ever had.
(And its source is very heavily commented, so the algorithm may be of
interest even to non-Informers.)

To get a quick look at the language, read parts (1) to (12) and
Appendix C.

---------------------------------------------------------------------------
                   Prefaces to editions of this manual
---------------------------------------------------------------------------

Historically, Inform was not a wonderfully well-written program: it must be
admitted that I treated it more as an easel than a painting. But it works,
and it runs in only two passes. (This may sound easy but is not, because
the story file format requires all manner of tricky operations to be done:
for example, the dictionary must be alphabetically sorted, and the code
must know absolute addresses of its entries... and the address of the
start of the dictionary depends on many other things not known during
pass 1... and so on.) It seems also, for what it's worth, to be more
efficient than Infocom's own compiler - perhaps because it compiles from a
low-level C-like rather than a high-level LISP-like source.

Inform is not public domain in the proper legal sense of the term. The
copyright is retained by the author, Graham Nelson. I am perfectly happy
for Inform to be used by anybody for any recreational purpose. It may be
freely distributed provided no profit is involved, and provided the
copyright message is retained. Please do not circulate heavily modified
versions, and please describe any private changes of your own in a comment
at the top of the source code.

Story files produced by Inform belong to whoever wrote the source for them;
I think, however, it is fair to ask that game-writers put some message into
their credits saying that Inform was used, and giving the version number
used to compile it.
Notes on the second release:

Since the first release, much improvement has been made in Inform's memory
management, which is now quite efficient: it allocates between 50 and 75K
of memory, as opposed to 800K in the first edition. The code is in ANSI C,
is contained in a single file (without needing non-standard headers), and
some effort has been made to improve its portability. Hopefully it doesn't
assume an ASCII character set, or 32-bit integers, or any particular byte
order within integers. PC versions now ought to be feasible. The code has
been annotated to some extent, and contains notes which should be useful
to anyone trying to port the code to a new machine.

This documentation has changed only in the new "objectloop" construction,
and in Appendix C (sample output for the given programs). The language
which Inform compiles has not changed (except that two defunct features,
which had not in any case been documented, have been withdrawn). Details
of changes to the source code of Inform may be found in detailed comments
at its head.

                                                             GAN June 1993

And the third:

The third edition of this manual contains a fair amount of new material.
Most of the changes in Inform are improvements which do not affect the
language (better error reporting, bug fixes, speed improvements, new
compile options, much greater portability across different C compilers).
However, various new features have been added, mainly to provide for needs
which cropped up for the author:

    new command line switches;                             see (1)
    new directives "statusline", "release", "include",
        "default", "stub";                                     (3), (18)
    string indirection via the synonyms table;                 (5), (16)
    multiple object names;                                     (11), (16)
    new constant forms #n$word and #r$routine;                 (5)
    new object alteration commands "write" and "give";         (8)
    new command "string";                                      (8), (16)
    new command "font";                                        (8), (18)
    abbreviations, properly using the synonyms table.
                                                               (17)

Making use of abbreviations slows compilation down, but (when switched on)
can make reasonable memory savings (about 8%).

Inform also now correctly works out the checksum and length fields for the
story file header, which (a) makes a "verify" command easy to implement,
and (b) may make story files work on older interpreters.

One change has been made which is incompatible with earlier editions: the
names of some of the more esoteric debugging directives have been
homogenised. The text has been revised and clarified in several other
places.

                                                         GAN November 1993

The author's email address may be found at the bottom of this file.
Comments and bug reports (by email) are welcomed with whatever degree of
enthusiasm he can muster.

---------------------------------------------------------------------------
                                 Contents
---------------------------------------------------------------------------

  Inform
     1. Command lines and errors
     2. Source file format
     3. Compiler directives
     4. Variables
     5. Constants
     6. Routines
     7. Expressions
     8. Commands
     9. Conditions
    10. Built-in functions
    11. Objects
    12. Verbs and grammar
    13. Exactly what Inform does with words
    14. Indirect function calls
    15. Text spacing
    16. Drastic alteration of objects
    17. Abbreviations
    18. The status line

  The Z-machine
    A1. The Z-machine
    A2. How text is encoded
    A3. How Z-code is encoded
    A4. Using Inform as an assembler

  Designing games
    B1. A Bill of Player's Rights
    B2. What makes a good game?
    B3. Writing a Parser

  Example Inform programs
    C1. A Hello Cruel World program
    C2. "Deja Vu": a toy game
    C3. The library routines
    C4. A shell game to build on

---------------------------------------------------------------------------
1.
Command lines and errors
---------------------------------------------------------------------------

If Inform is run without any parameters given, it prints out something like
the following information:

  Archimedes Inform 1.0 (v794/at) Release 3 (November 6th 1993)
  (allocating memory for arrays) (temporary files)

  This program is a compiler to version-3 Infocom format story files.
  It is copyright (C) Graham Nelson, 1993.

  Its syntax is "inform [-list] <file1> [<file2>]"

  <file1> is the name of the Inform source file; Inform translates this
  into "Zcode.<file1>" (unless <file1> contains a '.' or '/', in which case
  it is left alone).
  <file2> may optionally be given as the name of the story file to make.
  If it isn't given, Inform writes to "Zgames.<file1>" but if it is, then
  Inform takes <file2> as the full filename.

  -list is an optional list of switch letters following the initial hyphen:
    a   list assembly-level instructions compiled
    b   give statistics after both passes
    c   more concise error messages
    d   contract double spaces after full stops in text
    e   economy mode (slower): make use of declared abbreviations
    f   frequencies mode: show how useful abbreviations are
    h   print this information
    i   ignore default switches set within the file
    l   list all assembly lines
    m   say how much memory has been allocated
    o   print offset addresses
    p   give percentage breakdown of story file
    s   give statistics
    t   trace Z-code assembly
    w   disable warning messages
    x   print # for every 100 lines compiled (in both passes)

  For example: "inform -dex curses ram:curses".

(Don't worry if the version numbers differ slightly on your copy.)

Samples of -s and -p output can be found in Appendix C. For -d, see section
(15). For -e and -f, see (17).

-x is useful if running Inform on a slow machine, since it offers some
signs of life.

-a, -p and -l are largely useful to assembly-language programmers and the
poor unfortunate obliged to maintain Inform.

-i overrides any "SWITCHES" directive in the body of the code, so that the
only switches applying are those in the command line.
-m reveals how many bytes were allocated with malloc. The program can be
compiled in several different versions, with varying memory needs, and
porters might need to use this.

The filenames Inform uses for input and output will obviously depend on
what machine it runs on. The Archimedes conventions are those above, but,
for example, the Unix version compiles files like "dejavu.inf" with headers
like "parser.h" to "dejavu.z3".

As Inform runs, it may come up with warnings, errors or fatal errors. Here
is a typical sample:

  Archimedes Inform 1.0 (v794/at)
  line 10: Error: Symbol name expected
  >   global
  line 12: Error: The "Main" routine is not allowed to have local variables
  >   [ Main i
  line 12: Warning: Local variable unused: "i"
  line 23: Warning: Local variable unused: "j"
  Compiled with 2 errors and 2 warnings (no output)

(Infocom interpreters can crash horribly when given incorrect files, so
Inform never writes a file which caused an error, though it will write one
which caused only warnings.)

-c (concise) mode doesn't quote from the source files; -w turns warnings
off. The exchange rate is 100 errors to the fatal error: i.e., after 100
errors, Inform gives up altogether.

---------------------------------------------------------------------------
2. Source file format
---------------------------------------------------------------------------

Lines in an Inform file are terminated by semicolons, not by the ends of
physical lines. Exclamation marks ! thus... indicate that the rest of that
physical line is a comment. Backslashes "fold" strings up, so that for
example

  initpos "A hinged trapdoor in the floor stands open, and light streams in \
           from below."

is treated as if the "f" in "from below." followed directly from where the
backslash \ is; i.e., the carriage return and leading spaces are removed.

Inform command names are not case sensitive, and neither are variable
names. However, reserved words after the initial command (such as the "to"
in a "for" construction) must be in lower case.
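The backslash-folding rule can be modelled as a single text substitution. The following Python sketch illustrates the rule; it is not part of Inform. The function name fold_backslashes is hypothetical, and the sketch assumes that folding happens only when the backslash is the very last character of a physical line.

```python
import re


def fold_backslashes(source: str) -> str:
    """Join lines folded with a trailing backslash (hypothetical helper).

    Assumption: a backslash immediately followed by a line break is deleted
    together with that line break and any spaces or tabs that begin the
    next line, so the text resumes exactly where the backslash stood.
    """
    return re.sub(r"\\\n[ \t]*", "", source)


line = ('initpos "A hinged trapdoor in the floor stands open, and light streams in \\\n'
        '         from below.",')
print(fold_backslashes(line))
# initpos "A hinged trapdoor in the floor stands open, and light streams in from below.",
```

The space typed before the backslash survives the fold, which is what keeps "in" and "from" apart in the folded text.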
There are seven kinds of line:

                               Examples
  labels                       .Label;
  routines start and stop      [ NewRoutine i j;
  directives                   #Release 4;
  assignments                  fred = parent(lamp);
  function calls               Verify();
  compiled commands            for i 1 to 10 { Step(i); }
  assembly language            @prop_len_addr lamp lv;

In practice, you don't need to know assembly language, but it's there.

---------------------------------------------------------------------------
3. Compiler directives
---------------------------------------------------------------------------

Directives are instructions to Inform which do not themselves make code.
They can be prefaced by a # character, as in C, but need not be.

  ABBREVIATE <string>          Declare abbreviation (see (17))
  ATTRIBUTE <name>             Make new attribute flag
  CONSTANT <name> <value>      Declare a constant
  DICTIONARY <name> <word>     Enter <word> in dictionary, and make a new
                               constant <name> for its address
  END                          End compilation here (this is optional)
  GLOBAL <name> [ = <a> ]      Make a new global variable;
                               [give it the initial value a]
         [ string <a> ]        [make it point to an (a+1)-byte array, which
                               has a as first byte, and is otherwise zeros]
         [ data <a> ]          [make it point to an a-byte array, which is
                               all zeros]
         [ initial <b1> <b2> ... ]
                               [make it point to an array, the bytes of
                               which are as given]
         [ initstr "text" ]    [make it point to an array, the bytes of
                               which are the ASCII values of the characters
                               in the string]
  INCLUDE <filename>           Include a file in the source
  OBJECT ...                   Make an object (see (11))
  PROPERTY ...                 Make a new property (see (11))
  RELEASE <n>                  Set the release number to n
  SERIAL <n>                   Set the serial number to n (it must be a
                               six-digit number, and defaults to the date
                               in the form YYMMDD if Inform has access to
                               today's date, or to 930000 if it hasn't)
  STATUSLINE score             Make the status line show score/turns
             time              ...show hours/minutes
  SWITCHES <letters>           Declare default switch settings (eg:
                               SWITCHES dexs causes "inform filename" to be
                               read as "inform -dexs filename")
  VERB ...
                               Enter a line of grammar (see (12))

The following are mainly for debugging the compiler (should anyone ever get
around to doing this) but might sometimes be amusing or helpful:

  LISTSYMBOLS                  List the symbol table
  LISTDICT                     List the dictionary
  LISTOBJECTS                  List the object tree
  LISTVERBS                    List the verb table
                               (these names have changed from earlier
                               releases)
  TRACE                        Trace assembler
  LTRACE                       List the lines of input
  ETRACE                       Trace expression evaluator
  BTRACE                       Trace assembler on both passes
  NOTRACE, NOLTRACE, etc       Turn off appropriate tracing

And there are two more rather technical directives:

  DEFAULT <name> <value>       If the constant has not yet been defined,
                               define it with this value
  STUB <name> <n>              If the routine has not yet been defined,
                               define one which has n local variables and
                               always returns false

---------------------------------------------------------------------------
4. Variables
---------------------------------------------------------------------------

There are two kinds of variable, global and local (plus one special one,
the stack pointer "sp", described below). Variables are all two-byte
integers, which are treated as signed when it makes sense to do so (eg in
asking whether one is positive or not) but not otherwise (eg when one is
used as an address).

There can be up to 240 global variables; as indicated in (3), these can be
initialised to point to dynamic workspace, so as to achieve the effect of
strings and arrays. They have to be declared before use. For instance:

  Global turns = 1;
  Global buffer string 120;       ! Buffer holding 120 characters
  Global task_scores initial 4 5 9 1 2 3 0;

In any routine, there can be up to 15 local variables. These are declared
when the routine begins. (There is one exception: the special Main routine
may not have local variables.)

There is also a stack, but it should be tampered with only with care. Never
call a variable "sp", as this is the stack pointer variable which you might
occasionally need to use.

The observant reader will have noticed that 240+15+1 = 256.
This is of course no coincidence.

---------------------------------------------------------------------------
5. Constants
---------------------------------------------------------------------------

Constants may be prefixed with a # character if desired. This can be
useful if they are made up of letters and might otherwise be confused with
something else.

A constant in "double quotes" assembles the given text at a suitable (even)
address, and gives half this address as the integer value. Inside this text
the character ^ is replaced by a newline character, and the character ~ by
a double-quote mark. In practice you seldom need to worry where the text is
stored, or how.

(New in Release 3) Inside a string, @dd (an @ sign followed by two decimal
digits) is compiled as a reference to synonym number dd. When the Z-machine
finds this, it prints the string pointed to by that entry in the synonym
table. (This is useful in altering object short names - see section (16).)

A character in single quotes, such as 'e', means the ASCII value of that
character. (This is true even on machines not using ASCII, of course.)

A dollar $ indicates that a hexadecimal constant follows; $$ indicates that
binary follows.

Any constant declared in a directive can be used, and so can the special
constants

  adjectives_table   preactions_table   actions_table

(set up by Inform) which give the addresses of these tables.

A constant beginning a$, followed by the name of a routine which is an
action routine, will have as value the number of the action.

A constant beginning w$, followed by a word of text, has as value the
address of the given word in the dictionary (Inform will give an error at
compile time if no such word is there).

(New in Release 3) A constant beginning n$, followed by a word of text, has
as value the address of the given word in the dictionary (Inform adds it to
the dictionary as a new word if it is not already there).
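The purely numeric constant forms can be evaluated mechanically. These forms are $ for hexadecimal, $$ for binary, a character in single quotes, and plain decimal, each optionally prefixed by #. The Python sketch below is illustrative only. parse_numeric_constant is a hypothetical helper, not part of Inform. It deliberately leaves out strings, w$, n$, a$ and declared names, because those need the compiler's own tables. It also assumes that a constant must fit in a two-byte word, like the variables of section (4), and it rejects a leading minus sign.

```python
import re


def parse_numeric_constant(token: str) -> int:
    """Evaluate a simple numeric constant (hypothetical helper).

    Handles only: decimal (31415), hexadecimal ($ff), binary ($$1001001)
    and a single quoted character ('X'), each optionally preceded by '#'.
    """
    body = token[1:] if token.startswith("#") else token
    if re.fullmatch(r"\$\$[01]+", body):          # $$ : binary
        value = int(body[2:], 2)
    elif re.fullmatch(r"\$[0-9A-Fa-f]+", body):   # $  : hexadecimal
        value = int(body[1:], 16)
    elif re.fullmatch(r"'.'", body):              # 'c': character code
        value = ord(body[1])                      # equals ASCII for ASCII chars
    elif re.fullmatch(r"[0-9]+", body):           # plain decimal
        value = int(body)
    else:
        # Anything else, including negative numbers such as -50, is refused.
        raise ValueError(f"not a simple numeric constant: {token!r}")
    if value > 0xFFFF:
        raise ValueError(f"does not fit in a two-byte word: {token!r}")
    return value


for t in ["31415", "$ff", "$$1001001", "'X'", "#$ff"]:
    print(t, parse_numeric_constant(t))
```

The binary test comes before the hexadecimal one so that "$$1001001" is never read as a malformed hexadecimal number.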
(New in Release 3) A constant beginning r$, followed by a routine name,
gives the address of the given routine divided by two. (This is very
helpful when altering properties of objects which are routine addresses.)

Thus, for instance, the following are legal constants:

  31415
  $ff
  $$1001001
  #adjectives_table
  #a$LookSub
  #w$invent
  'X'
  "an emerald the size of a plover's egg"
  "~Hello,~ said Peter.^~Hello, Peter,~ said Jane.^"
  #r$FireRodRoutine
  #n$amazon

Unfortunately, at present negative constants are not allowed, and Inform
will reject, say, #-50. In practice they are only very occasionally needed,
and the same value can always be obtained by writing, say, 0-50.

---------------------------------------------------------------------------
6. Routines
---------------------------------------------------------------------------

The syntax to begin a routine is

  [ RoutineName <l1> ... <ln>;

and to end it, is

  ];

<l1> to <ln> are the names of local variables, which are also the call
parameters. For example, if you have a routine

  [ Look i j k;
    ...some code...
  ];

and it is called by

  Look(attic);

then i will initially have the value "attic" when this is executed. Any
local variables not given values by the call (in this case, j and k) are
initially zero. It should be emphasized that it is legal to call Look with
0, 1, 2 or 3 arguments. (Three is the maximum number of arguments any
routine can have.)

Every routine returns a value to the caller; if no such value is explicitly
given, this value is the integer 1 ("true"). In a line like

  Banner();

the return value is thrown away.

Inside a routine, labels may be declared with a line of their own:

  .labelname;

but note that whereas local variables have names which only mean anything
locally, labels have names which are global. In other words, you can't have
a label called "loop" more than once in the file. (It is legal to jump from
one routine to a label inside another one, but extremely dangerous.)

There is one special routine, which you must define, called Main.
This is where execution of the game will begin, and it _must_ be the first
one defined. Returning from Main will cause the interpreter to crash: you
should explicitly use the "QUIT" instruction instead. Also, uniquely and for
peculiar reasons, Main is _not_ permitted to have any local variables of its
own. This means it is usually only used as an outer shell. (Inform issues a
warning if the earliest defined routine is not called "Main".)

---------------------------------------------------------------------------
7. Expressions
---------------------------------------------------------------------------

The usual arithmetic expressions are allowed, including the operators:

  =                 set variable (only) on left equal to value on right
  +  -              plus, minus
  *  /  %  &  |     times, divide, remainder, bitwise and, bitwise or
  ->  -->           byte, word array entry (eg: buffer->4 gives the
                    contents of the byte with address buffer+4, while
                    table-->3 gives the word at table+6)

In addition one may call a function, either a built-in function or a
routine. For example:

  4*(x+3/y)
  i=j-->1
  Fish(x)+Fowl(y)

[Note: in earlier releases, Inform was unable to cope with many complicated
expressions used in the same command, such as

  put buffer+6 byte i+j+1 56*prime(4);

...but now it can.]

---------------------------------------------------------------------------
8.
Commands
---------------------------------------------------------------------------

The "high level" commands in Inform are as follows:

  NEW_LINE                  Print a carriage return
  SHOW_SCORE                Redisplay the score bar immediately, without
                            waiting for the next keyboard input
  PRINT "text"              Print text
  PRINT_RET "text"          Print text, print a newline and return true (1)
  PRINT_NUM <a>             Print a as a (signed) decimal number
  PRINT_CHAR <a>            Print the character whose ASCII value is a
  PRINT_ADDR <a>            Print the string whose address is a
  PRINT_PADDR <a>           Print the string whose address is 2*a
  PRINT_OBJ <a>             Print the short name of object a
  REMOVE <a>                Remove object a from the tree of objects (it
                            may later be put back)
  MOVE <a> TO <b>           Add object a to the things possessed by b
  WRITE <obj> <prop> <value> [<prop> <value> ...]
                            Set properties of the given object to the
                            given values. (This is intended to replace the
                            more primitive PUT_PROP: it can accept more
                            complicated expressions, and handle more than
                            one property at a time.)
  GIVE <obj> <attr> [<attr> ...]
                            Give the object the listed attributes. An
                            attribute beginning with ~ is cleared instead.
                            (This is intended to replace SET_ATTR and
                            CLEAR_ATTR.) Thus, for instance,
                              give lamp light ~open container scored;
  PUT <addr> BYTE <index> <v>
                            Write byte value v into index'th byte after addr
  PUT <addr> WORD <index> <v>
                            ...and similarly for words
  INC <var>                 Increment variable
  DEC <var>                 Decrement variable
  RETURN                    Return (actually, return true, i.e. 1)
  RETURN <a>                Return the value a
  RTRUE                     Return true, i.e. the value 1
  RFALSE                    Return false, i.e. the value 0
                            (These used to, and still can, be called
                            "ret#true" and "ret#false".)
  INVERSION                 Prints (in the game, not at compile time) the
                            version number of Inform used to compile the
                            story file
  IF <condition>            If the condition is true, execute the code
  { ... code ... }          (braces are _compulsory_) [else execute the
  [ ELSE { ... other ...} ] other code instead]
  WHILE <condition>         While loop
  { ... code ... }
  FOR <var> <a> TO <b>      For loop: the final value b must be a constant
  { ... code ... }          or another variable. If the range is empty, it
                            does not execute even once.
  DO                        Until loop
  { ... code ...
  } UNTIL <condition>
  OBJECTLOOP <var> FROM/IN <obj>
                            A form of while loop. The var first holds
                            either the obj value (if it is FROM) or its
                            child (if IN), and runs through the sibling
                            objects. So, for instance,
                              objectloop x in lamp { print_obj x; new_line; }
                            is equivalent to
                              x=child(lamp);
                              while x~=0 { print_obj x; new_line; x=sibling(x); }
  BREAK                     Break out of current loop (not block)
  JUMP <label>              Jump to the given label
  READ <a> <b>              Reads keyboard into buffer a and decomposes it
                            into the buffer b:
                              on entry, a[0] = buffer size, b[0] similarly;
                              on exit, a[1] = number of chars typed, and
                              bytes 2 to a[1]+1 are the chars (unterminated).
                            From byte 2, b contains 4-byte chunks, one for
                            each word of input:
                              address of dictionary entry if recognised,
                                0000 otherwise;
                              number of letters in word;
                              position in a of the first char of the word.
                            This command automatically redisplays the
                            status (score) line.

If a command matches none of these, or if it begins with an @ character,
the line is sent to the assembler instead. Some of the assembler opcodes
are fairly usable, but the essential features of the Z-machine can be got
at with just the high-level commands and functions.

---------------------------------------------------------------------------
9. Conditions
---------------------------------------------------------------------------

These take the form

  <a> <relation> <b>

where the relation is one of

  ==                a equals b
  ~=                a doesn't equal b
  <  >  >=  <=      comparisons
  has               object a has attribute b at the moment
  hasnt             ...hasnt...
  near              objects a and b have the same parent
  far               ...haven't...

These may _not_ be used in expressions (as they could be if the language
were C) and there is no AND/OR construction. There is a reason for this,
but not a very good one (unless you count laziness). However, one
concession towards such a feature is provided, viz. the useful construction

  <something> == <v1> [or <v2> [or <v3>]]

which is true if <something> equals any of the values given.

---------------------------------------------------------------------------
10.
Built-in functions
---------------------------------------------------------------------------

The built-in functions are

  PARENT(obj)   SIBLING(obj)   CHILD(obj)

for reading the object tree (see (11) below), together with

  RANDOM(x)

which returns a uniformly random number between 1 and x inclusive, and

  PROP_LEN(addr)   PROP_ADDR(o,p)   PROP(o,p)

for which see (11) below.

Warning: some interpreters set up their random number generator with poor
choices of seed value, which means that the first few random numbers may be
rather peculiarly distributed. After a time, it settles down. To get around
this, "Curses" (for example) takes and throws away 100 random numbers when
it begins.

---------------------------------------------------------------------------
11. Objects
---------------------------------------------------------------------------

The object hierarchy is a tree of up to 255 "objects", which you might use
for many different game elements: rooms, compass points, scenery, things
which can be picked up, and so on. They are numbered from 1 to 255, and the
number 0 by convention means "nothing". Attempting to print_obj object 0
will produce a string full of peculiar letters and (if you are very unlucky
indeed) even random ASCII values followed by an interpreter crash.

In the tree, each object has a parent, a sibling, and a child (any of which
may be "nothing"). Thus, for instance, a portion may resemble

        Meadow
          |
        Mailbox  ->  Player
          |            |
        Note         Sceptre  ->  Cucumber  ->  Torch  ->  Magic Rod
                                                  |
                                                Battery

in which -> shows siblings, and | parents and children. In this case, the
Meadow has nothing as its parent. Anything with no possessions, such as the
note, has nothing as its child, and so on. When an object is moved, its
possessions move with it, of course.

In practice an object needs rather more data than just a position in a
tree. It also has a collection of variables attached to it. Firstly, there
are 32 flags, called "attributes", which can be either set or clear.
These might be such conditions as "giving light", "currently worn" or "is
one of the featureless white cubes". All 32 are free for the programmer to
use (though the Library routines, if in use, consume many of them). They
must be declared before use, by directives like

  ATTRIBUTE locked;

which will allocate a new attribute and make a constant "locked" whose
value is its number. You never then need to know about these numbers,
because you can use commands like

  IF obj has locked { print_ret "But it's locked!"; }
  GIVE obj locked;

Warning: 32 sounds like plenty, but the limit can quite easily be hit. The
author has found it useful to declare one as "general", to be used for
different things for different objects. (And this is done for you in the
library.)

Secondly, there are 30 "properties". These are far more elaborate. For one
thing, not every object has every property. The following all declare new
properties:

  PROPERTY door_to;
  PROPERTY article "a";
  PROPERTY blorpleroutine $ffff;

The value given, in the case of article and blorpleroutine, is the default
value: that is, the value of the property which an object will have if it
doesn't explicitly have some other value. If you don't define a default
value, the default is 0. So, for instance,

  PROP(frog,timeleft);

will return 0 if "frog" has no timeleft entry.

The data for a given property can be a number, or up to four numbers in a
row, or up to eight bytes of data. The simplest way to get at the current
value is something like

  i=PROP(location,door_to);

which will get the first number in the property door_to of object
location. Similarly, it can be written to with

  WRITE location door_to hall_of_mists;

A subtle point is that numbers smaller than 256 are stored differently from
larger ones.
In order to decide whether the property is one byte's worth or two, the
Z-machine looks at the number of bytes which the property has in all, and
sees whether it is odd or even; if even, it presumes the number is a 2-byte
word; if odd, it presumes it is just one byte. This is seldom something you
need to know about, but occasionally you will want a property which will,
later in the game, need to hold a value of, say, 1000, but which initially
will be zero. This is particularly the case with timing mechanisms, for
instance. The command

  PROPERTY LONG timeleft;

declares the property "timeleft" and requires Inform to make sure that all
"timeleft" fields are 2 bytes wide, even if they have small initial values.

More elaborate manipulation has to be done by hand.

  k=PROP_ADDR(o,weird);

sets k to the address of the "weird" data of object o. To find out how many
bytes there are, apply PROP_LEN to this address.

  l=PROP_LEN(k);

Once you have the address you can read and write to it directly. Be careful
not to overrun the length, which may not be changed.

Warning: the Z-machine crashes if you attempt to write to a property field
which an object hasn't got. So although you can read a property which an
object doesn't have (you just get the default value), you can't write to
one.

An object is declared (before the body of the code) by something like:

  OBJECT trapdoor "hinged trapdoor" attic
    with name "hinged" "trap" "door" "trapdoor",
         initpos "A hinged trapdoor in the floor stands open, and light \
                  streams in from below.",
         closedpos "There is a closed trapdoor in the middle of the floor.",
         portalto house,
         postroutine TrapdoorPost,
         dirprop d_to
    has  portal static open light openable;

trapdoor is a constant which is set to its object number; "hinged trapdoor"
is its attached short name; attic is the object which initially possesses
it. If it were to be initially unowned, this would be "nothing" instead of
"attic".

After "with" is a list of property definitions, in the form

  <property> <value> ...
  [, <property> <value> ...] ...

Warning: an excellent source of mysterious errors is leaving out the commas
between these, since property names are themselves legal constants.

There is one special property, called "name". Its data must be words (at
most four), as above, and these are entered into the dictionary as nouns
(if they aren't already present): what is actually stored is their
dictionary addresses. Note that the dictionary itself does _not_ know that
"door" refers to this object: there might be any number of objects which
could be called "door".

After "has" is a list of attributes which the object initially has.

(New feature in Release 3) You may give an object more than one internal
name, thus:

  OBJECT frog tree brick "frog" attic
    with ...;

after which the same object can be called frog, tree or brick within the
source code: in other words, several constants are created with the same
value. Why on earth would you want this? See section (16).

---------------------------------------------------------------------------
12. Verbs and grammar
---------------------------------------------------------------------------

Whereas objects should be declared at the start of the file, the grammar to
be allowed by the game should be declared at the end. This is done with the
VERB command. VERB does something very complicated, but probably not what
you think. A typical VERB command would be:

  VERB "take" "get" "pick" "lift"
                * "out"                     -> ExitSub
                * multi                     -> TakeSub
                * multiinside "from" noun   -> RemoveSub
                * "in" noun                 -> EnterSub
                * "off" held                -> DisrobeSub;

This declares a verb, for which "take", "get" etc are synonyms, and which
can take five different courses. In the first, it must be followed by the
word "out". In the last, it must be followed by "off" and then an item
which is currently held by the player. In the second, it can be followed by
one object, or a list, perhaps specified as "everything", for instance.
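The "five different courses" can be made concrete with a toy matcher. It tries the grammar lines in order and returns the action routine of the first line that fits the words typed after the verb. This Python sketch is deliberately simplified and is not the library's parser. The names match_grammar, OBJECT_TOKENS and TAKE_GRAMMAR are hypothetical. The sketch assumes that each object token (noun, held, multi and so on) consumes exactly one word, and that a word used as a literal in this verb's grammar never names an object. Real parsing of object names, lists and "everything" is far more involved.

```python
# Tokens which stand for an object rather than a literal word.
OBJECT_TOKENS = {"noun", "held", "multi", "multiheld", "multiexcept",
                 "multiinside", "creature", "special"}

# The "take" grammar: quoted tokens are literal words, the rest are object
# tokens.  Lines are tried in the order in which they were declared.
TAKE_GRAMMAR = [
    (['"out"'], "ExitSub"),
    (["multi"], "TakeSub"),
    (["multiinside", '"from"', "noun"], "RemoveSub"),
    (['"in"', "noun"], "EnterSub"),
    (['"off"', "held"], "DisrobeSub"),
]


def match_grammar(words, grammar):
    """Return the action of the first grammar line fitting `words`, or None.

    Toy assumptions: each object token takes exactly one word, and a word
    used as a literal anywhere in this verb's grammar cannot be an object.
    """
    literals = {tok.strip('"') for tokens, _ in grammar
                for tok in tokens if tok.startswith('"')}
    for tokens, action in grammar:
        if len(tokens) != len(words):
            continue
        for tok, word in zip(tokens, words):
            if tok.startswith('"'):
                if word != tok.strip('"'):
                    break                 # wrong literal word
            elif tok not in OBJECT_TOKENS or word in literals:
                break                     # unknown token, or not an object word
        else:
            return action                 # every token matched
    return None


print(match_grammar(["coin", "from", "box"], TAKE_GRAMMAR))   # RemoveSub
print(match_grammar(["off", "hat"], TAKE_GRAMMAR))            # DisrobeSub
```

The lines are tried in declaration order, so "take out" reaches ExitSub before the multi line is considered. A line with no tokens at all matches only when nothing follows the verb word.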
A grammar line can also be empty, so that the verb word alone is enough,
for example

  VERB "invent" "i" * -> InvSub;

After the "->" is the name of a routine which is to be called when this is
matched.

For traditional reasons unclear to the author, previous Infocom hackers
have called words such as "out" and "off" adjectives. This is monstrously
illiterate since they are of course prepositions. We shall wearily follow
convention anyway.

Remember that the Z-machine does _not_ contain the bulk of a game parser,
only the computationally expensive and low-level part which works out what
the words are. So this command only sets up a table with some numbers in.
If you want a parser, you have to write code which interprets the table. If
you're using the library routines, the parser is all done for you and the
possible tokens are:

  Word          What the library parser uses it for
  ====          ===================================
  noun          any visible object
  held          object held
  multi         one or more visible objects
  multiheld     one or more held objects
  multiexcept   one or more objects, except the other object
  multiinside   one or more objects, inside the other object
  creature      an animate creature
  special       any single word or number

Look through the library's grammar table for examples.

(New in Release 3) If a verb is declared as a meta-verb, e.g. via

  VERB meta "score" * -> ScoreSub;

then the parser will treat it as outside the game - taking up no game time,
and possible at any moment.

---------------------------------------------------------------------------
13. Exactly what Inform does with words
---------------------------------------------------------------------------

This is a very technical section about exactly how Inform deals with the
grammar table and the dictionary. It can safely be ignored by anyone using
the library routines supplied, and in fact since the remaining sections of
the manual proper are quite specialised, the next part to read is probably
Appendix C.

By convention, adjectives are numbered downwards from $ff.
Thus, if the above were the opening lines of grammar, "out" would be $ff, "from" would be $fe, and so on. As they are created, adjectives are entered into the dictionary, and also into the adjective table, which has four-byte entries:

    <dictionary address of word>  00  <adjective number>
    ----2 bytes-----------------  ----2 bytes-----------

In order to make life more interesting, these entries are stored in reverse order (i.e., lowest adjective number first). The address of this table is rather difficult to deduce from the file header information, so the constant #adjectives_table is set up by Inform to refer to it. In any event, the table isn't very useful and is created only for the sake of conforming to Infocom internal conventions.

The important tables are the grammar and action tables. The grammar table address is stored in word 7 (i.e., bytes 14 and 15) of the header. The table consists of a list of two-byte addresses, one pointing to the entry for each verb, immediately followed by these entries, one after another. An entry consists of one byte giving the number of lines (e.g., 5 for the "take" definition above) and then that many 8-byte lines. These lines have the form

    <number of parameters>  <sequence of words>  <action number>
    --1 byte--------------  ----6 bytes--------  --1 byte-------

The number of parameters is the number of objects which need to be supplied: e.g., 0 for "inventory", 1 for "take frog", 2 for "tie rope to dog". The sequence of words gives up to 6 blocks of syntax to follow the verb, which must be matched in order. Large numbers such as $ff mean that the appropriate adjective must appear; small numbers are inserted by special words such as "held" or "noun" in the VERB command:

    Word          Byte
    ====          ====
    noun          0
    held          1
    multi         2
    multiheld     3
    multiexcept   4
    multiinside   5
    creature      6
    special       7

The sequence is padded out to 6 bytes with zeros.

The action numbers begin at 0. The first routine mentioned as an action (in the above example, ExitSub) is assigned action number 0; the next (TakeSub) is given 1, and so on. The appropriate number is stored in the last byte of the line.
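As a check on the layout just described, the following Python sketch packs one verb's grammar lines into a grammar table entry. It assumes, as in the "take" example, that adjectives are numbered downwards from $ff in order of first appearance and that action numbers are allocated upwards from 0 in order of first mention. The names encode_verb and TOKEN_BYTES are hypothetical, and counting every non-preposition token (including special) as a parameter is an inference from the example rather than something stated here.

```python
# Byte values for the parser tokens, as tabulated above.
TOKEN_BYTES = {
    "noun": 0, "held": 1, "multi": 2, "multiheld": 3,
    "multiexcept": 4, "multiinside": 5, "creature": 6, "special": 7,
}


def encode_verb(lines, adjectives, actions):
    """Return the grammar table entry for one verb as a list of byte values.

    lines      -- (tokens, action) pairs; quoted tokens are prepositions
    adjectives -- word -> number; new words are numbered down from $ff
    actions    -- routine name -> number; new routines are numbered up from 0
    """
    entry = [len(lines)]                      # one byte: number of lines
    for tokens, action in lines:
        if len(tokens) > 6:
            raise ValueError("at most 6 blocks of syntax per line")
        params, syntax = 0, []
        for token in tokens:
            if token.startswith('"'):
                word = token.strip('"')
                if word not in adjectives:
                    adjectives[word] = 0xFF - len(adjectives)
                syntax.append(adjectives[word])
            else:
                syntax.append(TOKEN_BYTES[token])
                params += 1                   # assumption: each object token is a parameter
        syntax += [0] * (6 - len(syntax))     # pad the sequence to 6 bytes with zeros
        if action not in actions:
            actions[action] = len(actions)
        entry += [params, *syntax, actions[action]]
    return entry


take_lines = [
    (['"out"'], "ExitSub"),
    (["multi"], "TakeSub"),
    (["multiinside", '"from"', "noun"], "RemoveSub"),
    (['"in"', "noun"], "EnterSub"),
    (['"off"', "held"], "DisrobeSub"),
]
adjectives, actions = {}, {}
print(encode_verb(take_lines, adjectives, actions))
```

Run on the five "take" lines, this should reproduce the 41 byte values of the "take" entry listed in full a little further on. Passing the same adjectives and actions dictionaries to later calls is what makes a later line ending "-> ExitSub" reuse action 0.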
Action numbers are allocated only once per routine: a little later on in the grammar, the line

    VERB "exit" "leave" * -> ExitSub;

might well appear, and ExitSub will mean "action 0" as before. So this table does not store the address of the action routine, as one might expect. Instead the addresses corresponding to the action numbers are stored in the actions table. Once again, Inform puts this table in its conventional place, but since this address is difficult to work out, the constant #actions_table is set up to hold it. The actions table is simply a list of 2-byte entries giving the routine addresses (divided by 2).

There is also a preactions table, with another constant #preactions_table, created only to conform to Infocom conventions; it is set up containing 0000 for each action. ("Curses", for instance, makes no use of this.)

Meanwhile, what has happened to the actual words, "take", "get", "pick" and "lift"? Note that these do not appear in the grammar table at all. Instead they are entered into the dictionary, along with the verb number. As a final baroque twist, these numbers also count down from $ff. Any number of words can be given, all referring to the same verb number; "Curses" has 11 synonyms for "attack", for instance.

Of course, Inform does not know or care what is done with any of these tables. For instance, the "take" verb has the entry

    005
    000 255 000 000 000 000 000 000
    001 002 000 000 000 000 000 001
    002 005 254 000 000 000 000 002
    001 253 000 000 000 000 000 003
    001 252 001 000 000 000 000 004

but it is up to the code you write to deal with this. (The VERBS command will print out the full verb table in a similar format.)

This section also describes what Inform does with the dictionary. Again, if you use the parser supplied, you needn't know this. Word 4 of the file header (bytes 8 and 9) contains the dictionary table's address. The table begins with a 7-byte header:

    03  '.'  ','  '"'

meaning there are three characters used to separate words in typed input: full stops, commas and quotation marks. (The Z-machine will allow any list to be given here, but Inform decides on this for you.) Then comes

    07  <number of entries>
        ----2 bytes--------

meaning that there are that many entries in the dictionary, all 7 bytes long. (The entry length could again in principle be varied, but 7 bytes allows for six significant letters in words, while still enabling the text of the word to occupy a 4-byte integer, which is convenient and fast when the compiler is alphabetically sorting.)

The seven-byte entries are in alphabetical order, and look like:

    <text>          <flags>  <verb number>  <adjective number>
    ----4 bytes---  --1 b--  ----1 byte---  ----1 byte--------

The text is stored in the usual text format, thus allowing up to 6 characters. The flags byte (laid out once again to conform loosely to Infocom conventions, not for any sensible reason) has separate bits meaning that the word can be a verb, a noun or an adjective, and a further bit meaning that the word was inserted by a DICTIONARY command in the program, except that verbs also have this bit set (ours not to wonder why). (Release 3 of Inform also uses bit 1 of the flags byte to indicate which verbs were declared as "meta": the parser can use this to see how to treat a verb.) Note that a word can be any combination of these at once. It can even be simultaneously a verb, adjective and noun.

Typically a full game contains about 600 dictionary entries, about ten times the number of portable objects. Even so, they consume only about 4K, or 1/32nd of the available 128K. It's never worth economising on dictionary entries; nothing else a designer can do with 4K will be as good to the user.

---------------------------------------------------------------------------
14. Indirect function calls
---------------------------------------------------------------------------

Occasionally one needs to call a function whose address is in a variable: for example, if the routine address has been looked up from a table, or from an object's property list. For this, the function "indirect" is provided:

    a=indirect(b);

sets a to the return value of calling the function whose address is in b. If you want to pass arguments as well, you should use the assembler-level @icall. But do so with care: it is dangerously easy to leave values lying about on the stack, which will eventually overflow, causing a mysterious crash hundreds of turns later.

---------------------------------------------------------------------------
15. Text spacing
---------------------------------------------------------------------------

Typewritten English, like this file, normally puts a double space after a full stop. This is much easier to read. Unfortunately Infocom-standard interpreters do not usually understand that. When they fold text across lines, they can easily turn

    ...and a pomegranate.  After all, you always hated fruit.

into something which looks like

    |You decline the offer of a banana, an apple and a pomegranate. |
    | After all, you always hated fruit.                            |
    |                                                               |
    |>                                                              |

which looks awful. It would be easy to fix the interpreter not to do this; but nobody does. In case (like the author's) your typing is habitually double-spaced, Inform provides a command line option -d to change it back again. It does this simply by replacing each full stop followed by two spaces with a full stop followed by a single space when converting text.

---------------------------------------------------------------------------
16. Drastic object alteration
---------------------------------------------------------------------------

In earlier versions of Inform, some aspects of an object were difficult to change, once set. Firstly, the "short name". If you declared an object as, say,

    OBJECT frog "little green frog" attic WITH ...
then the game would always refer to it as "little green frog": this would be impossible to alter if, for instance, it should in some magical way become an enormous green frog. A sneaky way around this is to use string indirection. Declare it as

    OBJECT frog "@00" attic WITH ...

so that, when the Z-machine does a print_obj on it, it prints out the string entered 0th in the synonyms table. In your initialisation code, write your own string here, by:

    STRING 0 "little green frog";

This actually compiles simply to

    PUT $0042 WORD 0 "little green frog";

since $0042 is the address of the synonyms table; setting word n changes the string printed in place of @n. Then at any time you can amend the name by

    STRING 0 "enormous, slavering green frog";

@00 to @25 are available. (Counting in decimal, not hex.)

This system also provides an elegant way of dealing with bottles and containers of water in general, say: "full beer bottle" can become "half-empty beer bottle" and then "empty beer bottle". (But if so, remember also to change its indefinite article from "a" to "an".)

Secondly, properties which pointed to game routines were tricky to set, for very complicated reasons to do with how constants are translated. Suffice it to say that one can now do this by, e.g.

    WRITE frog preroutine #r$EnormousFrogPre;

or

    WRITE frog preroutine #r$LittleFrogPre;

Thirdly, it was difficult to change the dictionary entries recognised as referring to an object. Well, it still is, but here's how it's done. Create your object with the name field containing dummy entries, e.g.
    OBJECT frog "@00" attic WITH name "zzzzzz" "zzzzzz" "zzzzzz", ...;

and then initially set these by

    x=prop_addr(frog, name);
    put x word 0 #n$little;
    put x word 1 #n$green;
    put x word 2 #n$frog;

then alter them by

    x=prop_addr(frog, name);
    put x word 0 #n$enormo;

(Warning: if there are only three entries in the name property list, as here, then the Z-machine will crash if you try to write to a fourth: so make sure there are enough dummy entries when you create the object.)

It is thus possible to change absolutely every aspect of an object. The virtue of this is that, as Richard Tucker pointed out to the author, the limit of 255 objects ceases to be a limitation if you can recycle them. It takes careful coding, but these methods allow that recycling to take place. This is why multiple internal names are now allowed for an object:

    OBJECT frog brick herring "@00" attic
        WITH name "zzzzzz" "zzzzzz" "zzzzzz", ...;

will allow the same object number to be called frog, brick or herring by different routines which deal with different incarnations of the same object.

---------------------------------------------------------------------------
17. Abbreviations
---------------------------------------------------------------------------

When the game becomes full, about 10K out of the 128K length can be saved by making use of text abbreviations: a method under which up to 64 commonly occurring phrases can be abbreviated wherever they occur. This makes no difference to the text as seen by the player. Because checking for these causes a speed overhead, and isn't worthwhile until the game is about 95% full, Inform does not do so except in economy mode (compiling with the switch -e on). An abbreviation must be declared explicitly, before any other text appears, by a directive such as:

    ABBREVIATE "the ";

Only 64 may be declared. (Note for experts: the remaining 32 slots in the synonyms table are allocated for variable strings; see (16).)
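The slot arithmetic behind this can be sketched in Python. It relies on the rule given in A2 (text character 1, 2 or 3 followed by x prints synonym entry 32*(z-1)+x) and on Inform's split described in A1 (32 variable-string slots, then 64 for declared abbreviations). The helper names are hypothetical; that the k-th ABBREVIATE fills slot 32+k is an assumption about ordering, and rough_saving is only a back-of-envelope estimate, not the figure the -f option reports.

```python
def slot_to_zchars(slot):
    """Return the pair of text characters (z, x) that prints synonym entry `slot`.

    Character z (1, 2 or 3) followed by x prints entry 32*(z-1) + x.
    """
    if not 0 <= slot < 96:
        raise ValueError("the synonyms table has 96 entries")
    return slot // 32 + 1, slot % 32


def zchars_to_slot(z, x):
    if z not in (1, 2, 3) or not 0 <= x < 32:
        raise ValueError("not an abbreviation reference")
    return 32 * (z - 1) + x


def abbreviation_slot(k):
    """Slot used by the k-th ABBREVIATE directive, counting from 0 (assumed order)."""
    if not 0 <= k < 64:
        raise ValueError("only 64 abbreviations may be declared")
    return 32 + k


def rough_saving(text, uses):
    """Rough bytes saved by abbreviating `text` (assumes one character per letter).

    Each use shrinks len(text) characters to 2, and three characters fit in
    one 2-byte word, so each character saved is worth about 2/3 of a byte.
    """
    return uses * (len(text) - 2) * 2 // 3


print(slot_to_zchars(abbreviation_slot(0)))   # (2, 0): first declared abbreviation
print(rough_saving("the ", 1200))             # 1600
```

On this estimate the 1200 uses of "the " in "Curses" mentioned below are worth roughly 1.6K on their own.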
This causes "the " to be stored internally as only 2 text characters, rather than 4, whenever it occurs: which is very often.

Obviously, to get the maximum gain one must make sensible choices. A reasonable start is:

    "they " "the " "The " "You " "you " "you're " "and " "but " "is "
    "There " "can't " "of " "to " "with " "are " "that " "have " "which "
    "hrough " "in " "here " "into " "It " "east" "west" "north" "south"
    "not " "an " ", " "thing" "door" "she " "he " "above " "below " ".~"
    ",~ " "***" "if " "If " "it " "might " "could " "up " "down " "back "
    "out" "What " "Why "

plus a few specific to the game. To see how good these abbreviations are, try compiling with the -f (frequencies) option set, which will count the number of times each abbreviation is used, and work out how many bytes it saved. For instance, "the " occurs some 1200 times in "Curses". One soon sees that short function words such as articles and pronouns, and words like "there", make big savings, but that almost any proper noun makes hardly any difference.

---------------------------------------------------------------------------
18. The Status Line
---------------------------------------------------------------------------

One of the most distinctive features of Infocom games in play is the status line, the (usually highlighted) bar across the top of the screen. The interpreter automatically prints the current game location, and either the time or the score and number of turns taken. The status line is redisplayed at least (a) on a SHOW_SCORE command and (b) each time the game asks the player to type something. It may be displayed even more often, at the whim of the interpreter.

The place name is the short name of the object whose number is held in global 0, the earliest global declared in the file. If ever this global holds the value 0, the name displayed will become corrupted and the game may crash.

The next two globals are also used. By default, these show the score and number of turns taken so far, usually in the form "4/87".
(Interpreters vary in how they print these.) However, if the file contains the directive

    Statusline time;

then they are treated as the time in hours and minutes. For instance, 1 and 32 would come out as "1:32 am". It is up to the program to adjust these variables as time passes, score is gathered, location changes, etc. (Though as usual, the library will take care of some of this work.)

Another miscellaneous point: on some machines, text will by default be displayed in a proportional font (i.e. one in which the width of a letter depends on what it is, so for example an i will be narrower than an m). If you want to display a diagram made up of letters, you will have to turn this off, for which the "font" command is provided:

    font off;
    print "  +---+^  | A |^  +---+^";
    font on;

for example. (Remember to turn the font back on afterwards!) On a machine not using proportional fonts, these commands have no effect.

---------------------------------------------------------------------------
A1. The Z-machine
---------------------------------------------------------------------------

The so-called Z-machine (the imaginary machine for which story files are programs) is quite well adapted to its task. It maintains a hierarchy of objects and possessions, and does the computationally intensive part of parsing input itself. That said, it does not contain the bulk of the parser. The parsing tables which some investigators think are part of the Z-machine format are in fact the same across different Infocom games only because they all contain essentially the same parser code. Thus, Inform is in principle free not to compile such tables, but it does so in order to placate other people's programs (such as INFODUMP). Some tables are put to subtly different uses, however.

The following description is fairly complete, but only covers version 3. It would be helpful if someone public-spirited would write an account of the differences in later versions.

The version 3 Z-machine is 128K long at most.
Addresses within it are nonetheless held in 2-byte words, which is why some addresses are stored as half their actual values, and why some items (routines and static strings) are always stored at even addresses.

The first 64 bytes contain a header. The first 4 bytes are:

    03  <flags>  <release number>
                 ----2 bytes-----

3 indicates version 3; the release number is as set in the program; the flags byte contains bits:

    1   Status line type (clear for Score/Turns, set for Hours:Mins)
    3   Censorship bit (used by some games, but not by the Z-machine)
    4   Alternative prompts: sometimes used by primitive interpreters
    5   Status window support: used only by "Seastalker"

Next come seven words, at words 2 to 8 (bytes 4 to 17):

    2   Where routines begin, in bytes
    3   Address of main routine, in bytes, +1
        (This +1 is why Main cannot have local variables: it is a
        peculiarity of the standard. Note also that this is uniquely a
        routine address in bytes and not words: Main must occur in the
        lower 64K of the file. Inform always sets word 3 to be word 2,
        plus 1.)
    4   The dictionary table address, in bytes
    5   Object table address, in bytes
    6   Global variables address, in bytes
    7   The total number of bytes in a saved game
        (Saving the game is done by saving this many bytes from the
        beginning of the machine. Saved games also contain the current
        state of the Z-machine stack; the stack is _not_ stored anywhere
        in the Z-machine's memory.)
    8   This word of flags has bits:
            0   Scripting on: send output to printer
            1   Disable proportional fonts while this is set
            4   Something mysterious to do with sound effects in
                "The Lurking Horror"

This is followed by the six bytes from byte 18 to 23, which are the version number string. (Inform sets these to the current date, in the form YYMMDD.) Then more words:

    12  Synonym table address, in bytes
    13  Length of file, in words
    14  Sum of bytes from 64 upwards, mod $10000

(The length and checksum are not actually used at all by many interpreters, but are set by Inform anyway. Inform also pads out the file with zeros to an exact number of 512-byte blocks, since some interpreters still make use of swapping blocks in and out, virtual-memory style.) The remaining bytes in the header are used by the interpreter and should be left alone by the game code.

We are now at $0040 and by convention we reach the synonyms. Usually, the actual strings (the expansions of the synonyms) are stored here, one after another, making up 96 strings. When that is out of the way, the actual table begins (and this is what the synonyms address points to). The table contains 96 two-byte entries, giving the word addresses of the strings before it.
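The header layout above can be decoded with a short Python sketch. It relies on one fact not spelled out in this manual: Z-machine words are stored most significant byte first. The function names and field names in the returned dictionary are hypothetical labels for the entries tabulated above.

```python
import struct

# Header words (numbered from 0) as tabulated above.
HEADER_WORDS = {
    2: "routines_start",        # where routines begin, in bytes
    3: "main_address_plus_1",   # address of Main, in bytes, +1
    4: "dictionary",
    5: "object_table",
    6: "globals",
    7: "save_length",           # bytes saved in a saved game
    8: "flags2",
    12: "synonyms",
    13: "length_in_words",
    14: "checksum",
}


def read_header(story):
    """Decode the version 3 header fields from the first 64 bytes of a story file."""
    if len(story) < 64:
        raise ValueError("story file shorter than its 64-byte header")
    if story[0] != 3:
        raise ValueError("not a version 3 story file")
    fields = {"version": story[0], "flags": story[1]}
    fields["release"] = struct.unpack_from(">H", story, 2)[0]
    for word, name in HEADER_WORDS.items():
        fields[name] = struct.unpack_from(">H", story, 2 * word)[0]
    fields["serial"] = bytes(story[18:24]).decode("ascii")  # Inform writes YYMMDD
    fields["hours_mins_status"] = bool(fields["flags"] & 0x02)  # flags bit 1
    return fields


def checksum(story):
    """Sum of bytes from 64 upwards, mod $10000."""
    return sum(story[64:]) % 0x10000
```

Comparing checksum(story) with the checksum field, and twice length_in_words with the file size, gives a quick sanity check on a story file, though as noted above many interpreters never look at either.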
Since Inform only uses synonyms in an unorthodox way, it actually puts a single dull string "   " (three spaces) at $0040, and then makes a table of 96 pointers to it, starting at $0042; finally it fills in those abbreviations declared explicitly by the user into the latter 64 slots, reserving the first 32 for variable usage. (Note that Inform does not write the synonym expansions declared by the user at $0040: but then there is no requirement to.)

Next is the object table. In fact it begins with what is sometimes called the "global properties table", though it is actually a table of default values of properties. This is a list of 31 2-byte words, giving the defaults for properties 1 to 31 in turn (there is no property 0). Inform sets the default for property 1, the special "name" property, to 0000, so the first word is always 0000; the remainder are set in property definitions. After these 62 bytes, the objects begin, starting from object 1. An object entry consists of 9 bytes, looking like:

    <32 attribute flags>      <parent> <sibling> <child>  <properties pointer>
    ---32 bits in 4 bytes---  ---3 bytes----------------  ---2 bytes---------

The parent, sibling and child bytes are 00 when the object pointed to is "nothing". The properties pointer is the address (in bytes) of the properties attached to the given object.

When all these 9-byte entries are out of the way, the properties tables begin. (Inform keeps these in the same order as the objects they are attached to.) An individual property table has the brief header

    <text length>  <text of the object's short name>
    --1 byte-----  --some even number of bytes------

where the first byte gives the length of the short name in 2-byte words (03, say, for a six-byte name). It then lists the properties held, in descending numerical order. (This order is essential.) A property is stored as

    <size byte>  <the actual data>
    --1 byte---  ---between 1 and 8 bytes--

The size byte is arranged as 32 times (the number of data bytes minus 1), plus the property number. Each list of properties is ended by a 00 size byte. This is why there is no property 0.

When all the property tables are done, we come to the global variable table. Global variables are numbered from 0 to 239, and this table begins with 240 initial 2-byte values for them.
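The object and property layouts above can be walked with a minimal Python sketch. It assumes the object table address comes from header word 5 and that the story file is available as a bytes-like object; the function names are hypothetical. Treating attribute 0 as the most significant bit of the four attribute bytes follows the later Z-Machine Standard rather than anything stated in this manual.

```python
def read_object(story, object_table, number):
    """Return (attributes, parent, sibling, child, properties) for object `number`."""
    if not 1 <= number <= 255:
        raise ValueError("version 3 object numbers run from 1 to 255")
    # The 31 two-byte property defaults (62 bytes) come before object 1.
    entry = object_table + 62 + 9 * (number - 1)
    attributes = int.from_bytes(story[entry:entry + 4], "big")
    parent, sibling, child = story[entry + 4:entry + 7]
    properties = int.from_bytes(story[entry + 7:entry + 9], "big")
    return attributes, parent, sibling, child, properties


def has_attribute(attributes, n):
    """Attribute 0 is taken to be the top bit of the 32 attribute flags."""
    return bool((attributes >> (31 - n)) & 1)


def read_properties(story, address):
    """Yield (property number, data) pairs from the property table at `address`."""
    text_words = story[address]              # length of the short name, in words
    p = address + 1 + 2 * text_words         # skip the short name
    while story[p] != 0:                     # a 00 size byte ends the list
        size_byte = story[p]
        number = size_byte % 32
        length = size_byte // 32 + 1         # size byte = 32*(length-1) + number
        yield number, bytes(story[p + 1:p + 1 + length])
        p += 1 + length
```

Because properties are listed in descending order, a lookup for a given property can stop as soon as it meets a smaller number.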
After this, space is conventionally left for all the arrays, dynamic strings and so on which they point to. We have now reached the top of the save area. Everything above here is never altered.

Next is the table of grammar, described above. It is immediately followed by the actions table, the preactions table and then the adjectives table, also described above. Next comes the dictionary table, also described above.

Next is the code area. Not all Infocom games begin with Main, but all Informed ones do. The code area simply contains a list of routines. All routines (and static strings) must occur at even addresses, so that they can be referred to by word addresses instead. (Inform occasionally inserts 00 bytes between routines to ensure this.) A routine begins with one byte indicating the number of local variables the routine has (from 0 to 15), and then with that many 2-byte words giving their initial values, if not supplied by the call to the routine. (Inform never makes use of this initialisation, and simply stores 0000's here.) Unlike global variables, these bytes are _not_ used for the current values of the variables: those are kept on the stack. Executable code follows this header. There is no special marker for the end of a routine; it is simply expected that in every case a legal return instruction will be hit.

Finally, from the end of the code to the top of memory are the static strings. These are put up here to be out of the way, where they won't clog up the bottom 64K of memory. There's no table of their addresses, or pointer to where they begin; each is referred to by an address in the code or data given earlier.

---------------------------------------------------------------------------
A2. How text is encoded
---------------------------------------------------------------------------

Text is stored as a sequence of 2-byte words.
Each of these is divided into three 5-bit pieces, plus 1 bit left over, arranged as

    --first byte-------  --second byte---
    7    6 5 4 3 2  1 0  7 6 5  4 3 2 1 0
    end  --first--  --second--  --third--

The end bit is set only on the last 2-byte word of the text, and so marks the end. The pieces are then characters, with values in the range 0 to 31. There are three alphabets, in which the numbers 6 to 31 mean:

    A0   abcdefghijklmnopqrstuvwxyz
    A1   ABCDEFGHIJKLMNOPQRSTUVWXYZ
    A2    ^0123456789.,!?_#'~/\-:()

('^' being actually the new-line character, and '~' a double quotation mark.)

Character 0 is a space in all alphabets. Characters 1, 2 and 3 are used for abbreviations: thus, 1 followed by 14 means "print entry 14 in the synonym table"; 2 followed by 5 means "print entry 32+5=37..."; 3 followed by 20 means "print entry 64+20=84..." and so on. The Z-machine provides these so that commonly occurring strings can be printed out as if they were characters, thus saving memory. Being plainly abbreviations, these are for some reason called "synonyms". (In practice this can save about 10K in a 128K file.) Inform makes use of these only when instructed to (e.g. @12 is compiled to 1, 12, meaning print entry 12 in the synonym table), or when abbreviations have been declared (see section (17)).

By default, a character is presumed to be in A0, i.e. to be a lower-case English letter. However, the character 4 means that the next one (only) is in A1; and 5 means the next is in A2. Notice that character 6 in A2 is blank. It isn't a space: it simply isn't there. The sequence 5 followed by 6 indicates that the next two characters define an ASCII value. This is the way to get at the characters not in any of the three alphabets. For example, the familiar message

    *** You are dead ***

takes four "characters" to produce each of the *'s.

Finally, note that the end bit only comes up once every three characters, so a way is needed to use up safely any spare characters in the last 2-byte block. This is done by padding out with 5's.
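The encoding just described can be sketched in Python. The sketch writes '^' as "\n" and '~' as '"' in the A2 row, and it assumes (following the later Z-Machine Standard, which this manual does not spell out) that after 5, 6 the two characters give the top five and bottom five bits of the ASCII value. It never uses synonyms, in keeping with the dictionary caution below; the function names are hypothetical and this is not Inform's own code.

```python
A0 = "abcdefghijklmnopqrstuvwxyz"
A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
A2 = "\n0123456789.,!?_#'\"/\\-:()"   # characters 7 to 31; 6 is the escape


def to_zchars(text):
    """Convert text into a list of 5-bit text characters (version 3 rules)."""
    out = []
    for ch in text:
        if ch == " ":
            out.append(0)
        elif ch in A0:
            out.append(6 + A0.index(ch))
        elif ch in A1:
            out += [4, 6 + A1.index(ch)]        # 4: next character is in A1
        elif ch in A2:
            out += [5, 7 + A2.index(ch)]        # 5: next character is in A2
        else:
            code = ord(ch)                      # 5, 6, then the value in two halves
            if code > 0x3FF:
                raise ValueError("character code does not fit in 10 bits")
            out += [5, 6, code >> 5, code & 31]
    return out


def pack(zchars):
    """Pack characters three to a 2-byte word, padding with 5's; set the end bit."""
    zchars = list(zchars)
    while len(zchars) % 3 or not zchars:
        zchars.append(5)
    words = []
    for i in range(0, len(zchars), 3):
        a, b, c = zchars[i:i + 3]
        words.append((a << 10) | (b << 5) | c)
    words[-1] |= 0x8000                         # end bit on the last word only
    return words


def dictionary_text(word):
    """Dictionary form: exactly 6 characters (two words), truncated or padded with 5's."""
    zchars = to_zchars(word)[:6]
    zchars += [5] * (6 - len(zchars))
    return pack(zchars)
```

For instance, to_zchars("*") gives 5, 6, 1, 10: the four "characters" per asterisk mentioned above.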
A 5 followed by another 5 does nothing, so this padding is harmless. Padding is especially common in dictionary entries. Some dictionary entries, like "i", ought only to take one 2-byte block, but in order to make all entries two 2-byte blocks and alphabetically sortable by number, they are padded out by up to five 5's in a row. (Note that care must be taken to avoid dictionary entries ever containing use of synonyms.)

In practice the text compression factor is not really very good: "Curses" contains about 155000 characters of text, stored in 99000 bytes. (Text usually accounts for about 75% of a story file.) But the encoding does at least disguise the text, so that casual browsers can't read it.

---------------------------------------------------------------------------
A3. How Z-code is encoded
---------------------------------------------------------------------------

The encoding of version 3 Z-code is, to say the least, complicated. The reader is warned that it is also different from that in all other versions. There are all kinds of exceptions intended either to make small economies of code size (these are very seldom worth the effort, in fact) or to provide new features tacked on at the last minute. Experimenting with Inform as an assembler, while tracing is turned on, may be helpful.

Z-code understands four kinds of operand, and describes these in 2-bit fields:

    $$00   Large constant (>=256)      2 bytes
    $$01   Small constant (0 to 255)   1 byte
    $$10   Variable                    1 byte
    $$11   Omitted altogether          0 bytes

Variables are described in one byte. 00 means the top of the stack, 01 to $0f are the local variables of the current routine and $10 to $ff are the global variables 0 to 239. Writing to 00 pushes something onto the stack and reading from it pulls it off. The stack can also be manipulated (with care) using the PUSH, PULL and POP instructions. The stack is guaranteed to be at least 512 bytes long, and some interpreters are more generous. There isn't any way to check for stack overflow, so be careful with recursion.
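The operand sizes and variable numbering above fit in a few lines of Python. This is only a sketch: names such as local1 and global0 are hypothetical labels for illustration, not TXD's or Inform's notation.

```python
# Bytes taken by each 2-bit operand type.
OPERAND_SIZES = {0b00: 2, 0b01: 1, 0b10: 1, 0b11: 0}  # large, small, variable, omitted


def describe_variable(b):
    """Name the variable referred to by a one-byte variable number."""
    if not 0 <= b <= 0xFF:
        raise ValueError("variable numbers are one byte")
    if b == 0:
        return "sp"                     # top of the stack: push on write, pull on read
    if b <= 0x0F:
        return "local%d" % b            # locals of the current routine (1 to 15)
    return "global%d" % (b - 0x10)      # globals 0 to 239
```

Summing OPERAND_SIZES over an instruction's operand types gives the number of operand bytes that follow the opcode (and, for the "variable" form described below, its type byte).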
One of the trickiest problems in compiling Z-code is throwing away unwanted return values of routines which are left on the stack: it can take hundreds of turns before a game crashes if this is got wrong.

Z-code opcodes are 1 byte only. To begin with, look at the top two bits. If these are $$11, we shall call the opcode "variable"; if $$10, "short"; and otherwise "long". In this description, we shall adopt the opcode names used by the existing Infocom disassembler "TXD".

For short opcodes, look at the next two bits (4 and 5). These give the kind of operand which the opcode has. If this is $$11, there isn't an operand and the opcode has no argument at all. In this event, the remaining part of the opcode (the bottom four bits) gives what it is:

    $00  RET#TRUE                (1) The opcode is followed by text
    $01  RET#FALSE                   in 2-byte chunks as usual
    $02  PRINT          (1)
    $03  PRINT_RET      (1)      (2) Opcode followed by a branch
    $05  SAVE           (2)
    $06  RESTORE        (2)      (3) This is an abbreviation for
    $07  RESTART                     RET SP, to save one byte
    $08  RET(SP)+       (3)
    $09  POP
    $0A  QUIT
    $0B  NEW_LINE
    $0C  SHOW_SCORE
    $0D  VERIFY         (2)

If the type wasn't $$11, then an operand follows, and moreover the "code" part of the opcode means something different:

    $00  JZ             (2)      (4) Followed by a store byte
    $01  GET_SIBLING    (2) (4)      (before the branch, if there
    $02  GET_CHILD      (2) (4)      is also a branch)
    $03  GET_PARENT     (4)
    $04  GET_PROP_LEN   (4)      (5) Refers indirectly to variables
    $05  INC            (5)          by their number (Inform
    $06  DEC            (5)          suppresses this feature, so
    $07  PRINT_ADDR                  "@inc sp" produces the constant
    $09  REMOVE_OBJ                  0 instead of variable no. 0 as
    $0A  PRINT_OBJ                   operand)
    $0B  RET
    $0C  JUMP
    $0D  PRINT_PADDR
    $0E  LOAD           (4) (5)
    $0F  NOT            (4)

"Long" opcodes have two operands.
The bottom 5 bits of the opcode say what it is:

    $01  JE             (2) (6)  (6) If this is encoded as
    $02  JLE            (2)          "variable", then operands 3 and
    $03  JGE            (2)          4 (if present) are used as a
    $04  DEC_CHK        (2) (5)      kind of OR command: e.g.,
    $05  INC_CHK        (2) (5)      branch if o1 = o2, o3 or o4
    $06  COMPARE_POBJ   (2)
    $07  TEST           (2)
    $08  OR             (4)
    $09  AND            (4)
    $0A  TEST_ATTR      (2)
    $0B  SET_ATTR
    $0C  CLEAR_ATTR
    $0D  STORE          (5)
    $0E  INSERT_OBJ
    $0F  LOADW          (4)
    $10  LOADB          (4)
    $11  GET_PROP       (4)
    $12  GET_PROP_ADDR  (4)
    $13  GET_NEXT_PROP  (4)
    $14  ADD            (4)
    $15  SUB            (4)
    $16  MUL            (4)
    $17  DIV            (4)
    $18  MOD            (4)

The alert reader will notice that bits 5 and 6 are left spare to be used. Now there are two operands to specify, which ought to take up 4 bits, which obviously won't fit. So a more economical form is used instead. Bit 6 refers to the first operand, and bit 5 to the second. A value of 0 means a small constant and 1 means a variable. Now, type $$11 (not really there) operands can't happen, so that's no problem, but there might well be type $$00 (large constant) operands, for example in "@mul x #666 sp". In this event, the opcode is instead programmed as a "variable" opcode.

So we must now describe the "variable" opcode form. In addition to the possible opcodes which can arise from overflowing "long" opcodes, there are others which can only be "variable". Here all of the bottom 6 bits are available to describe the opcode, and this either holds the above numbers $00 to $18 or else:

    $20  CALL           (4)      (7) These codes are somewhat
    $21  STOREW                      conjectural and only apply
    $22  STOREB                      to a few Infocom games; Inform
    $23  PUT_PROP                    never uses them unless told to
    $24  READ                        explicitly
    $25  PRINT_CHAR
    $26  PRINT_NUM
    $27  RANDOM         (4)
    $28  PUSH
    $29  PULL           (5)
    $2A  STATUS_SIZE    (7)
    $2B  SET_WINDOW     (7)
    $33  SET_PRINT      (7)
    $34  #RECORD_MODE   (7)
    $35  SOUND          (7)

Some of these are only of "variable" type because the available codes for the other types had run out: PRINT_CHAR, for instance. Others, especially CALL, need the flexibility to have between 1 and 4 operands.
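The rules for telling the three opcode forms apart can be condensed into a small Python classifier. It is a sketch of the decoding described above, not TXD's or Inform's code; the name classify and its return convention are hypothetical.

```python
def classify(opcode):
    """Classify a version 3 opcode byte; return (form, number, operand types).

    Operand types are 2-bit codes: 0b00 large constant, 0b01 small constant,
    0b10 variable. For the "variable" form the types come from a following
    byte, so None is returned for them here.
    """
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcodes are one byte")
    top = opcode >> 6
    if top == 0b11:
        # Bottom 6 bits: $00 to $18 are overflowed "long" opcodes, $20 upwards the rest.
        return "variable", opcode & 0x3F, None
    if top == 0b10:
        kind = (opcode >> 4) & 0b11             # bits 4 and 5
        if kind == 0b11:
            return "short", opcode & 0x0F, []   # no operand at all
        return "short", opcode & 0x0F, [kind]
    # "Long": bit 6 describes operand 1 and bit 5 operand 2 (0 small, 1 variable).
    first = 0b10 if opcode & 0x40 else 0b01
    second = 0b10 if opcode & 0x20 else 0b01
    return "long", opcode & 0x1F, [first, second]
```

For example, "@mul x #666 sp" needs a large constant, so it must be assembled with the "variable" form byte $D6 ($$11010110), which classify reports as ("variable", $16, None).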
In the "variable" type opcode, all eight bits of the opcode have been used up, so we have to add another byte describing the operands. This is divided into four 2-bit fields. For example, $$00101111 means a large constant followed by a variable (and no third or fourth operand). Once the opcode is out of the way, the operands are simply stored in one- or two-byte form as appropriate.

PRINT and PRINT_RET are followed by text: this is assembled in the usual way immediately after the opcode (which may well be at an odd address, but this doesn't matter) and execution resumes after the last 2-byte chunk of text (the one with the top bit set).

Opcodes marked as "store" in the above tables (note (4)) return a value: for example, MUL multiplies its two arguments together, and CALL calls a routine which must return a value. Such instructions are followed by a single byte giving the variable (stack pointer, local or global as usual) to put it in. This may look like an extra operand but is not: there is no need to tell the Z-machine what type it has, since it must be a variable.

Finally, there are instructions which test a condition. Apart from the obvious branch instructions (JE and so on), SAVE does this, for example, the test in question being whether or not the save was successful. Branches are stored in two different ways for economy reasons: nearby ones in a single byte at the end of the instruction, farther ones in two bytes. The top bit of the first byte of a branch is the "flag". If this is clear, then a branch occurs when the condition came out false. If it is set, then the branch occurs when it was true. If the next bit (bit 6) is set, then the branch is in abbreviated 1-byte format and the offset, from 0 to 63, is in the bottom 6 bits (0 to 5). If not, the offset is a 14-bit number held in bits 0 to 5 of the first byte followed by all 8 bits of the second. This long offset can be positive or negative. (E.g., all 1's means -1 in the usual way.)
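Both the operand-type byte and the branch bytes can be decoded with a minimal Python sketch. Reading the long offset as a signed 14-bit number (bits 0 to 5 of the first byte, then the second byte) follows the Z-Machine Standard; the function names are hypothetical.

```python
def operand_types(type_byte):
    """Split a "variable"-form type byte into 2-bit fields, stopping at the first $$11."""
    types = []
    for shift in (6, 4, 2, 0):
        t = (type_byte >> shift) & 0b11
        if t == 0b11:                  # omitted: no further operands follow
            break
        types.append(t)
    return types


def decode_branch(data):
    """Decode branch bytes; return (branch_on_true, offset, number of bytes used)."""
    first = data[0]
    on_true = bool(first & 0x80)       # the "flag" bit
    if first & 0x40:
        return on_true, first & 0x3F, 1          # abbreviated form: offset 0 to 63
    offset = ((first & 0x3F) << 8) | data[1]     # 14 bits from the two bytes
    if offset & 0x2000:                          # sign bit of the 14-bit number
        offset -= 0x4000
    return on_true, offset, 2
```

With these, operand_types(0b00101111) gives a large constant then a variable, and the two bytes $3F $FF decode to a false-branch with offset -1.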
In either form, an offset of 0 in fact means "return false from the current routine" and an offset of 1 means "return true". Neither would ever be wanted as a genuine jump. Since the abbreviated form can only express offsets from 0 to 63, backward branches must always use the long form.

Working out what the offset ought to be is more complicated than it appears, because the PC has already moved on from the start of the instruction when it reaches the branch. The bizarre formula in question is

    Offset = Destination address - Address of this instruction - Length + B

where Length is the number of bytes in the instruction (not counting the branch) and B is 1 for short branches, 0 for long ones.

In practice Inform compiles branches in the long form, considering the economy not worth the nightmarish computation needed to make the long/short decision. (One problem is that the number of bytes in each instruction _must_ be the same in both passes, so that the decision needs to be made before the value of the offset is known... in a 2-pass compiler this is insoluble. Another is that the offsets are affected by the size of the branch, confusing things considerably on forward branches.) However, its assembler mode allows you to make an explicit choice.

JUMP instructions similarly encode their address operand as an offset, but always as a two-byte (signed) constant. In this respect they differ from CALL instructions: in a CALL, the address is half the absolute routine address.

---------------------------------------------------------------------------
A4. Using Inform as an assembler
---------------------------------------------------------------------------

Inform can also act as an assembler. A line beginning with an @ character is sent straight to the assembly routines. Constants and variable names can be given as operands, but not compound expressions. The following are supported: jump