OS/2 Shareware BBS: 10 Tools

home *** CD-ROM | disk | FTP | other *** search

/ OS/2 Shareware BBS: 10 Tools / 10-Tools.zip / lifeos2.zip / LIFE-1.02 / TOOLS / TOKENIZE.DOC < prev next >

Wrap

Text File | 1996-06-04 | 3KB | 94 lines

# $Id: Tokenizer.doc,v 1.2 1994/12/08 23:48:19 duchier Exp $ TOKENIZER FOR LIFE NAME tokenizer: a complete tokenizer for Life programs. USAGE tokenize(filename) ? reads in the file filename and writes the obtained tokens in the file filename_toks. FILES The file tokenizer.lf contains the tokenizer. The other files are: - accumulators.lf, std_expander.lf and acc_declarations.lf All these files are automatically loaded if they are in the same directory. They must be loaded with expand_load(true). DESCRIPTION Characters belong to one of the following categories: - void characters: space, nl, tab - syntactic characters: {}[]()? - atom characters: any uppercase or lowcase letter, underscore - operator characters: ~ ` ! # $ ^ & * - + = : | > < / \ - delimiters: " ' - special characters: @ , ; . The tokens are of the following types: - variable(X) where X is the name of the variable (an atom); A variable is any sequence of atom characters, beginning with an uppercase letter or with underscore (in the latter case, the sequence must contain at least two characters), possibly terminated by primes; - construct(X) represents a constructor X. The type of a constructor is a subsort of construct: numb, chaine, or atom. X is the "value" of the atom (string, number, or unevaluated atom) - a number is of the form 123 or 123.56 or 123.56e12 or 123.56e-12 - a string is any sequence of characters delimited by " . Any " occurring inside a string have to be doubled; - an atom is any sequence of characters delimited by ' ( any ' occurring inside such an atom have to be doubled), or any sequence of atom characters starting with a lowcase letter, or @, or _, or any sequence of operator characters, or in some cases the dot (returned as atom(".")). - any syntactic object (returned as the string "[" or "}") or in some cases the dot (returned as ".") or "[|", "|]", "[|]". The dot may be tokenized in three different ways, depending on the context in which it appears: - It is not returned as a token if it occurs inside a floating point number; - It is returned as a syntactic object "." if it is followed by a void character (tab, nl, space, or end_of_file) - it is returned as atom(".") otherwise. This tokenizer allows the user to define his own syntactic objects, using the query syntact_object(X), where X is an atom, but not a string, nor a number. In this case, the tokenizer returns the string "X", and not atom(X). Comments: - All the characters between % and the end of the line where it appears are ignored. - This tokenizer also recognises nested comments (using /* and */ as delimiters). * and / can still be freely used inside sequences of operator characters. Extensions w.r.t. wild_Life: - primes at the end of atoms and variables. - nested comments - syntact_object declaration - dots tokenizing - special tokens for lists AUTHOR Bruno Dumant Copyright 1992 Digital Equipment Corporation All Rights Reserved