# $Id: Tokenizer.doc,v 1.2 1994/12/08 23:48:19 duchier Exp $
TOKENIZER FOR LIFE
NAME
tokenizer: a complete tokenizer for Life programs.
USAGE
tokenize(filename) ?
reads the file filename and writes the resulting tokens to the file
filename_toks.
FILES
The file tokenizer.lf contains the tokenizer.
The other files are:
- accumulators.lf, std_expander.lf and acc_declarations.lf
All these files are automatically loaded if they are in the same directory.
They must be loaded with expand_load(true).
DESCRIPTION
Characters belong to one of the following categories:
- void characters: space, nl, tab
- syntactic characters: {}[]()?
- atom characters: any uppercase or lowercase letter, underscore
- operator characters: ~ ` ! # $ ^ & * - + = : | > < / \
- delimiters: " '
- special characters: @ , ; .
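
The categories above can be sketched as a small lookup. This is an illustrative Python sketch, not the code in tokenizer.lf; the name char_category and the "other" fallback are assumptions, introduced because the list does not say where digits or % belong.

```python
# Hypothetical helper: maps one character to the category names listed above.
VOID = set(" \n\t")
SYNTACTIC = set("{}[]()?")
OPERATOR = set("~`!#$^&*-+=:|></\\")
DELIMITER = set("\"'")
SPECIAL = set("@,;.")

def char_category(ch: str) -> str:
    """Return the category of a single character, or 'other' if unlisted."""
    if ch in VOID:
        return "void"
    if ch in SYNTACTIC:
        return "syntactic"
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return "atom"
    if ch in OPERATOR:
        return "operator"
    if ch in DELIMITER:
        return "delimiter"
    if ch in SPECIAL:
        return "special"
    return "other"  # assumption: digits and % are not categorized in the list
```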
The tokens are of the following types:
- variable(X) where X is the name of the variable (an atom);
A variable is any sequence of atom characters, beginning with an
uppercase letter or with underscore (in the latter case, the sequence
must contain at least two characters), possibly terminated by primes;
- construct(X) represents a constructor X.
The type of a constructor is a subsort of construct: numb, chaine, or
atom. X is the "value" of the atom (string, number, or unevaluated atom)
- a number is of the form 123 or 123.56 or 123.56e12 or 123.56e-12
- a string is any sequence of characters delimited by " . Any " occurring
inside a string must be doubled;
- an atom is any sequence of characters delimited by ' (any ' occurring
inside such an atom must be doubled), or any sequence of atom
characters starting with a lowercase letter, or @, or _, or any sequence
of operator characters, or in some cases the dot (returned as
atom(".")).
- any syntactic object (returned as a string, e.g. "[" or "}") or in some
cases the dot (returned as ".") or "[|", "|]", "[|]".
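
The lexical rules for variables, numbers, strings and atoms can be approximated with regular expressions. The sketch below is in Python and only illustrates the rules as worded above; the names classify, VARIABLE, NUMBER and so on are hypothetical. It assumes names contain only letters and underscores (digits are not listed as atom characters) and that unquoted atoms may end in primes, as the list of extensions suggests.

```python
import re

NAME = r"[A-Za-z_]"                      # atom characters as listed above
OPERATORS = r"[~`!#$^&*\-+=:|></\\]+"    # one or more operator characters

VARIABLE = re.compile(rf"(?:[A-Z]{NAME}*|_{NAME}+)'*")
NUMBER = re.compile(r"\d+(?:\.\d+(?:e-?\d+)?)?")
STRING = re.compile(r'"(?:[^"]|"")*"')
QUOTED_ATOM = re.compile(r"'(?:[^']|'')*'")
PLAIN_ATOM = re.compile(rf"[a-z]{NAME}*'*|@|_|{OPERATORS}")

def classify(text: str):
    """Return 'variable', 'numb', 'chaine' or 'atom', or None if nothing fits."""
    if VARIABLE.fullmatch(text):
        return "variable"
    if NUMBER.fullmatch(text):
        return "numb"
    if STRING.fullmatch(text):
        return "chaine"
    if QUOTED_ATOM.fullmatch(text) or PLAIN_ATOM.fullmatch(text):
        return "atom"
    return None
```

Note that a lone underscore is classified as an atom, while an underscore followed by at least one more atom character is a variable, matching the two-character rule above.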
The dot may be tokenized in three different ways, depending on the context in
which it appears:
- It is not returned as a token if it occurs inside a floating point
number;
- It is returned as a syntactic object "." if it is followed by a void
character (tab, nl, space) or by end_of_file;
- It is returned as atom(".") otherwise.
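
The three cases can be sketched as a single decision function. This Python sketch is illustrative only; classify_dot and its return labels are hypothetical, and treating "inside a floating point number" as "a digit on both sides of the dot" is an assumption.

```python
def classify_dot(text: str, i: int) -> str:
    """Classify the dot at text[i] according to the three cases above."""
    assert text[i] == "."
    prev_is_digit = i > 0 and text[i - 1].isdigit()
    # The empty string stands for end of file.
    next_char = text[i + 1] if i + 1 < len(text) else ""
    if prev_is_digit and next_char.isdigit():
        return "part_of_number"   # no separate token is produced
    if next_char in ("", " ", "\t", "\n"):
        return "syntactic"        # returned as "."
    return "atom"                 # returned as atom(".")
```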
This tokenizer allows users to define their own syntactic objects, using the
query syntact_object(X), where X is an atom (neither a string nor a number).
In this case, the tokenizer returns the string "X", and not atom(X).
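
The effect of such a declaration can be pictured as a lookup performed before an atom token is emitted. The Python sketch below is an assumption about the behaviour, not the tokenizer's code; the registry syntactic_objects and the function emit_atom are hypothetical, and the tuple ("atom", X) merely stands in for atom(X).

```python
# Hypothetical registry of user-declared syntactic objects.
syntactic_objects = set()

def syntact_object(name: str) -> None:
    """Declare name as a syntactic object (mirrors the query described above)."""
    syntactic_objects.add(name)

def emit_atom(name: str):
    """Return the token for an atom, respecting user declarations."""
    if name in syntactic_objects:
        return name            # plain string, as for built-in syntactic objects
    return ("atom", name)      # stands in for atom(X)
```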
Comments:
- All characters from % to the end of the line on which it appears are
ignored.
- This tokenizer also recognises nested comments (using /* and */ as
delimiters). * and / can still be freely used inside sequences of operator
characters.
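
Nested comments are usually handled with a depth counter rather than a single flag. The Python sketch below illustrates that technique under simplifying assumptions: strip_comments is a hypothetical name, every /* is treated as a comment opener, and % or /* inside quoted strings and atoms is not protected, which the real tokenizer presumably handles.

```python
def strip_comments(src: str) -> str:
    """Remove % line comments and nested /* ... */ comments from src."""
    out = []
    depth = 0  # current nesting level of /* ... */ comments
    i = 0
    while i < len(src):
        pair = src[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        elif depth > 0:
            i += 1  # inside a block comment: skip the character
        elif src[i] == "%":
            # Skip to the end of the line, keeping the newline itself.
            while i < len(src) and src[i] != "\n":
                i += 1
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```

Outside a comment, a stray */ is left untouched, so * and / remain usable in operator sequences in this sketch.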
Extensions w.r.t. wild_Life:
- primes at the end of atoms and variables.
- nested comments
- syntact_object declaration
- context-dependent tokenizing of dots
- special tokens for lists
AUTHOR
Bruno Dumant
Copyright 1992 Digital Equipment Corporation
All Rights Reserved