home *** CD-ROM | disk | FTP | other *** search
Text File | 1990-07-19 | 59.2 KB | 1,963 lines |
-
-
-
-
-
-
-
-
-
-
-
-
-
- Variant Translators for Version 8 of
- Icon*
-
-
- Ralph E. Griswold
- Kenneth Walker
-
-
-
- TR 90-4a
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- January 1, 1990; last revised February 17, 1990
-
-
- Department of Computer Science
-
- The University of Arizona
-
- Tucson, Arizona 85721
-
-
-
-
- *This work was supported by the National Science Foundation under
- Grants CCR-8713690 and CCR-8901573.
-
-
-
-
-
-
-
-
-
-
-
-
-
- Variant Translators for Version 8 of
- Icon
-
-
-
-
- 1.__Introduction
-
- A preprocessor, which translates text from source language A
- to source language B,
-
- A -> B
-
- is a popular and effective means of implementing A, given an
- implementation of B. B is referred to as the target language.
- Ratfor [1] is perhaps the best known and most widely used example
- of this technique, although there are many others.
-
- In some cases A is a variant of B. An example is Cg [2], a
- variant of C that includes a generator facility similar to Icon's
- [3]. Cg consists of C and some additional syntax that a prepro-
- cessor translates into standard C. A run-time system provides the
- necessary semantic support for generators. Note that the Cg
- preprocessor is a source-to-source translator:
-
- Cg -> C
-
- where Cg differs from C only in the addition of a few syntactic
- constructs. This can be viewed as an instance of a more general
- paradigm:
-
- A+ -> A
-
-
- The term ``translator'' is used here in the general sense, and
- includes both source-to-source translators, such as preproces-
- sors, and source-to-object translators, such as compilers. In
- practice, the application of a source-to-source translator
- (preprocessor) may be followed by the application of a source-
- to-object translator (compiler). The combination is, of course,
- also a translator.
-
- The term ``variant translator'' is used here to refer to a
- translator that differs in its action, in some respect, from a
- standard one for a language. The applications described in this
- report relate to source-to-source translators, although the term
- ``preprocessor'' is too restrictive to describe all of them.
-
- There are many uses for variant translators. Some of them are:
-
-
-
-
-
- - 1 -
-
-
-
-
-
-
-
-
- + the addition of syntactic constructions to produce a
- superset of a language, as in the case of Cg
-
- + the deletion of features in order to subset a language
-
- + the translation of one source language into another [4]
-
- + the addition of monitoring code, written in the target
- language
-
- + the insertion of termination code to output monitoring
- data
-
- + the insertion of initialization code to incorporate addi-
- tional run-time facilities
-
- + the insertion of code for debugging and checking purposes
- [5,6]
-
- Note that in several cases, the translations can be characterized
- by
-
- A -> A
-
- The input text and the output text may be different, but they are
- both in A. Both the input and the output of the variant transla-
- tor can be processed by a standard translator for the target
- language A.
-
- One way to implement a variant translator is to modify a stan-
- dard source-to-object translator, avoiding the preprocessor. This
- approach may or may not be easy, depending on the translator. In
- general, it involves modifying the code generator, which often is
- tricky and error prone. Furthermore, if the variant is an experi-
- ment, the effort involved may be prohibitive.
-
- The standard way to produce a variant translator is the one
- that is most often used for preprocessors in general, including
- ones that do not fit the variant translator paradigm - writing a
- stand-alone program in any convenient language. In the case of
- Ratfor, the preprocessor is written in Ratfor, providing the
- advantages of bootstrapping.
-
- This approach presents several problems. In the first place,
- writing a complete, efficient, and correct preprocessor is a sub-
- stantial undertaking. In experimental work, this effort may be
- unwarranted, and it is common to write the preprocessor in a
- high-level language, handling only the variant portion of the
- syntax, leaving the detection of errors to the final translator.
- Such preprocessors have the virtue of being easy to produce, but
- they often are slow, frequently unfaithful to the source
- language, and the failure to parse the input language completely
- may lead to mysterious results when errors are detected, out of
- context, by the final translator.
-
-
-
- - 2 -
-
-
-
-
-
-
-
-
- Modern tools such as Lex [7] and Yacc [8], that operate on
- grammatical specifications, have made the production of compilers
- (and hence translators in general) comparatively easy and have
- removed many of the sources of error that are commonly found in
- hand-tailored translators. Nonetheless, the construction of a
- translator for a large and complicated language is still a sub-
- stantial undertaking.
-
- If, however, a translator already exists for a language that
- is based on the use of such tools, it may be easy to produce a
- variant translator that is efficient and demonstrably correct by
- modifying grammatical specifications. The key is the use of
- these tools to produce a source-to-source translator, rather than
- producing a source-to-object translator. This technique was used
- in Cg. An existing Yacc specification for the C compiler was
- modified to generate C source code instead of object code. The
- idea is a simple one, but it has considerable utility and can be
- applied to a wide range of situations.
-
- This report describes a system that uses this approach for the
- construction of variant translators for Icon. This system
- presently runs only under UNIX systems with a large amount of
- memory. The reader should have a general knowledge of Icon,
- Yacc, C, and UNIX.
-
-
- 2.__Overview_of_Variant_Translators_for_Icon
-
- The heart of the system for constructing variant translators
- for Icon consists of an ``identity translator''. The output of
- this identity translator differs from its input only in the
- arrangement of nonsemantic ``white space'' and in the insertion
- of semicolons between expressions, which are optional in some
- places in Icon programs.
-
- The identity translator uses the same Yacc grammar as the reg-
- ular Icon translator, but uses different semantic actions. These
- semantic actions are cast as macro definitions in the grammar,
- which are expanded before the grammar is translated by Yacc into
- a parser. One set of macros is supplied for the regular Icon
- translator and another set is supplied for the identity transla-
- tor. The macros used by the identity translator echo the input
- text, producing source-code output. In addition to the grammar,
- other code is shared between the two translators, insuring a high
- degree of consistency between the two systems.
-
- A variant translator is created by first creating an identity
- translator and then modifying it. There is a shell script for
- producing identity translators and associated support software to
- simply the process of making modifications. This support
- software allows macro definitions to be changed via specification
- files, minimizing the clerical work needed to vary the format of
- the output. There also is a provision for including user func-
- tions in the parser, so that more complicated operations can be
-
-
-
- - 3 -
-
-
-
-
-
-
-
-
- written in C. Finally, the grammar for the identity translator
- can be modified in order to make structural changes in the syn-
- tax.
-
- The following sections describe this system in more detail and
- include a number of examples of its use.
-
-
- 3.__The_Grammar_for_the_Icon_Identity_Translator
-
- The Icon grammar is listed in Appendix A. Many variant trans-
- lators can be constructed without modifying this grammar, and
- minor modifications can be made to it without a detailed
- knowledge of its structure. Knowledge of a few aspects of this
- grammar are important, however, to understanding the translation
- process.
-
- There are two types of semantic actions. The semantic action
- for a declaration outputs text. The semantic action for a com-
- ponent of a declaration, such as an identifier list or an expres-
- sion, assigns a string to the Yacc attribute for the component.
- Declarations are parsed by the production:
-
- decl : record {Recdcl($1);} ;
- | proc {Procdcl($1);} ;
- | global {Globdcl($1);} ;
- | link {Linkdcl($1);} ;
-
- The non-terminals record, proc, global, and link each produce a
- string and the corresponding macro Recdcl, Procdcl, Globdcl, or
- Linkdcl prints the string.
-
- Because the grammar is used for both the regular Icon transla-
- tor and the variant translator system, the macro calls must be
- more general than what is required for either one alone. Consider
- the production for global:
-
- global : GLOBAL {Global0($1);} idlist {Global1($1, $2, $3);} ;
-
- The macro Global0 is needed in the regular translator, but per-
- forms no operation in the identity translator. The macro Global1
- does the work in the identity translator; it concatenates "global
- " with the string produced by idlist, and this new string becomes
- the result of this production. The macro Global1 is passed $1,
- $2, and $3 even though it only uses $3. This is done for general-
- ity.
-
- The rules and the definitions that construct and output
- strings are provided as part of the identity translator. When a
- variant translator is constructed, changes are necessary only in
- situations in which the input is not to be echoed in the output.
-
- Deletions from the standard syntax can be accomplished by
- changing macro definitions to produce error messages instead of
-
-
-
- - 4 -
-
-
-
-
-
-
-
-
- output text. It is generally better, however, to delete rules
- from the grammar, so that all syntactic errors in the input are
- handled in the same way, by Yacc.
-
- Modifications and additions to the standard grammar require a
- more thorough understanding of the structure of the grammar.
-
-
- 4.__Macro_Definitions
-
- The purpose of using macro calls in the semantic actions of
- the grammar is to separate the structure of the grammar from the
- format of the output and to allow the output format to be speci-
- fied without modification of the grammar.
-
- The macro definitions for declarations are all the same. For
- example the definition of Global for the identity translator is:
-
- #define Globdcl(x)if (!nocode) treeprt(x); treeinit()
-
- The variable nocode is set when an error is detected during pars-
- ing. This helps prevent the variant translator from generating a
- program with syntax errors. The reason for doing this is that the
- output of a variant translator is usually piped directly into the
- regular Icon translator. If syntax errors were propagated, two
- error messages would result: one from the variant translator and
- one from the Icon translator. The message from the variant trans-
- lator is the one that is wanted because it references the line
- number of the original source whereas the message from the Icon
- translator references the line number of the generated source.
-
- The function treeprt prints a string and the function treeinit
- reclaims storage. See the Section 5 for details of string
- representation.
-
- 4.1__Specifications_for_Macros
-
- The macro definitions for expressions produce strings, gen-
- erally resulting from the concatenation of strings produced by
- other rules. In order to simplify the definition of macros, a
- specification format is provided. Specifications are processed
- by a program that produces the actual definitions. For example,
- the macro While1 is used in the rule
-
- WHILE expr DO expr {While1($1,$2,$3,$4);} ;
-
- A specification for this macro to produce an identity translation
- is:
-
- While1(w,x,y,z) "while " x " do " z
-
- Tabs separate the components of the specification. The first com-
- ponent is the prototype for the macro call, which may include
- optional arguments enclosed in parentheses as illustrated by the
-
-
-
- - 5 -
-
-
-
-
-
-
-
-
- example above. The remaining components are the strings to be
- concatenated with the result being assigned to the Yacc pseudo-
- variable $$.
-
- Specification lines that begin with # or which are empty are
- treated as comments. A set of lines delineated by %{ and %} are
- copied unchanged. The ``braces'' %{ and %} must each occur alone
- on a separate line; these two delimiting lines are not copied.
- This feature allows the inclusion of actual macro definitions, as
- opposed to specifications, and the inclusion of C definitions.
- The standard macro definitions supplied for the identity transla-
- tor include examples of these features. These definitions are
- listed in Appendix B.
-
- Definitions can be changed by modifying the standard ones or
- by adding new definitions. In the case of duplicate definitions,
- the last one holds. Definitions can be provided in several
- files, so variant definitions can be provided in a separate file
- that is processed after the standard definitions. See Sec. 8.
-
- Definitions can be deleted by providing a specification that
- consists only of a prototype for the call. For example, the
- specification
-
- While1()
-
- deletes the definition for While1. This is a convenient way to
- insure a macro is undefined. It is usually used along with the
- copy feature to introduce macro definitions that cannot be gen-
- erated by the specification system. For example, the following
- specifications eliminate reclamation of storage, preserving
- strings between declarations.
-
- Globdcl()
- Linkdcl()
- Procdcl()
- Recdcl()
- %{
- #define Globdcl(x)if (!nocode) treeprt(x);
- #define Linkdcl(x)if (!nocode) treeprt(x);
- #define Procdcl(x)if (!nocode) treeprt(x);
- #define Recdcl(x)if (!nocode) treeprt(x);
- %}
-
-
- 4.2__Macros_for_Icon_Operators
-
- As shown in Appendix A, there is a distinct macro name for
- each Icon operator. For example, Blim(x,y,z) is the macro for a
- limitation expression,
-
- expr1 \ expr2
-
- Note that the parameter y is the operator symbol itself. To
-
-
-
- - 6 -
-
-
-
-
-
-
-
-
- avoid having to know the names of the macros for the operators,
- specifications allow the use of operator symbols in prototypes.
- The symbols are automatically replaced by the appropriate names.
- Thus
-
- \(x,y,z)
-
- can be used in a specification in place of
-
- Blim(x,y,z)
-
- Unary operators are similar. For example, Uqmark(x,y), which is
- the macro for ?expr, can be specified as ?(x,y). In this case the
- parameter x is the operator symbol.
-
- In most cases, all operators of the same kind are translated
- in the same way. Since Icon has many operators, a generic form of
- specification is provided to allow the definition of all opera-
- tors in a category to be given by a single specification. In a
- specification, a string of the form <type> indicates a category
- of operators. The categories are:
-
- <uop> unary operators, except as follows
- <ucs> control structures in unary operator format
- <bop> binary operators, except as follows
- <aop> assignment operators
- <bcs> control structures in binary operator format
-
- The category <ucs> consists only of |. The category <bcs> con-
- sists of ?, |, !, and \.
-
- For example, the specification for binary operators for iden-
- tity translations is
-
- <bop>(x,y,z) x " <bop> " z
-
- This specification results in the definition for every binary
- operator: +(x,y,z), -(x,y,z), and so on. In such a specification,
- every occurrence of <bop> is replaced by the corresponding opera-
- tor symbol. Note that blanks are necessary to separate the binary
- operator from its operands. Otherwise,
-
- i * *s
-
- would be translated into
-
- i**s
-
- which is equivalent to
-
- i ** s
-
-
- The division of operators into categories is based on their
-
-
-
- - 7 -
-
-
-
-
-
-
-
-
- semantic properties. For example, a preprocessor may translate
- all unary operators in the same way, but translate the repeated
- alternation control structure into a programmer-defined control
- operation [9].
-
-
- 5.__String_Handling
-
- Strings are represented as binary trees in which the leaves
- contain pointers to C strings. The building of these trees can be
- thought of as doing string concatenation using lazy evaluation.
- The concatenation operation just creates a new root node with its
- two operands as subtrees. The real concatenation is only done
- when the strings are written out. Another view of this is that
- concatenation builds a list of strings with the list implemented
- as a binary tree. This view allows ``strings'' to be treated as a
- list of tokens. This approach is useful in more complicated
- situations where there is a need to distinguish more than just
- syntactic structures. For example, the head of the main procedure
- can be distinguished from the heads of other procedures by look-
- ing at the second string in the list for the procedure declara-
- tion.
-
- Strings come from three sources during translation: strings
- produced by the lexical analyzer, literal strings, and strings
- produced by semantic actions. The lexical analyzer produces
- nodes. The cases where the nodes that are produced by the lexi-
- cal analyzer are of interest occur where strings are recognized
- for identifiers and literals - the tokens IDENT, STRINGLIT,
- INTLIT, REALIT, and CSETLIT. These nodes contain pointers to the
- strings recognized. (The actual strings are stored in a string
- space and remain there throughout execution of the translator.)
- These nodes can be used directly as a tree (of one node) of
- strings. Other nodes produced by the lexical analyzer, for exam-
- ple those for operators, do not contain strings. However, all of
- these nodes contain line and column numbers referring to the
- location of the token in the source text. This line and column
- information can be useful in variant translators that need to
- produce output that contains position information from the input.
-
- A literal string must be coerced into a tree of one node.
- This is done with the C function
-
- q(s)
-
- This is handled automatically when macros are produced from
- specifications. For example, the specification
-
- Fail(x) "fail"
-
- is translated into the macro
-
- #define Fail(x) $$ = q("fail")
-
-
-
-
- - 8 -
-
-
-
-
-
-
-
-
- Most semantic actions concatenate two or more strings and pro-
- duce a string. They use the C function
-
- cat(n, t1,t2, ...,tn)
-
- which takes a variable number of arguments and returns a pointer
- to the concatenated result. The first argument is the number of
- strings to be concatenated. The other arguments are the strings
- in tree format. The result is also in tree format.
-
- As an example, the specification
-
- While1(w,x,y,z) "while " x " do " z
-
- produces the definition
-
- #define While1(w,x,y,z) $$ = cat(4,q("while "),x,q(" do "),z)
-
-
- Another function, item(t, n), returns the nth node in the
- ``list'' t. For example, the name of a procedure is contained in
- the second node in the list for the procedure declaration (see
- Appendix A). Thus, if the procedure heading list is the value of
- head, item(head, 2) produces the procedure name.
-
- There are three macros that produce values associated with a
- node. Str0() produces the string. For example, code conditional
- on the main procedure could be written as follows:
-
- if (strcmp(Str0(item(head,2)),"main") == 0) {
- .
- .
- .
- }
-
-
- As this example illustrates, semantic actions may be too com-
- plicated to be represented conveniently by macros. In such cases
- parser functions can be used. A file is provided for such func-
- tions. See Section 10 for an example.
-
- The macros Line and Col produce the source-file line number
- and column, respectively, of the place where the text for the
- node begins. The use of these attributes is illustrated in Sec-
- tion 10.
-
- In some sophisticated applications, variant translators may
- need other capabilities that are available in the translator sys-
- tem. For example, if a function produces a string, it may be
- necessary place this string in a place that survives the function
- call. The Icon translator has a string allocation facility that
- can be used for this purpose: the free space begins at strfree
- and putident(n) installs a string of length n there. The use of
- such facilities requires more knowledge of the translator system
-
-
-
- - 9 -
-
-
-
-
-
-
-
-
- than it is practical to provide here. Persons with special needs
- should study the translator in more detail.
-
-
- 6.__Modifying_Lexical_Components_of_the_Translator
-
- The lexical analyzer for Icon is written in C rather than in
- Lex in order to make it easier to perform semicolon insertion and
- other complicated tasks that occur during lexical analysis.
- Specification files are used to build portions of the lexical
- analyzer, making it easy to modify. The three kinds of changes
- that are needed most often are the addition of new keywords,
- reserved words, and operators.
-
- The identity translator accepts any identifier as a keyword,
- leaving its resolution to subsequent processing by the Icon
- translator. Nothing need be done to add a new keyword except for
- processing it properly in the variant translator.
-
- The specification file tokens.txt contains a list of all
- reserved words and operator symbols. Each symbol has associated
- flags that indicate whether it can begin or end an expression.
- These flags are used for semicolon insertion.
-
- To add a new reserved word, insert it in proper alphabetical
- order in the list of reserved words in tokens.txt and give it a
- new token name. To add a new operator, insert it in the list of
- operators in tokens.txt (order there is not important) and give
- it a new token name. The new token names must be added to the
- grammar. See Appendix A.
-
- The addition of a new operator also requires modifying the
- specification of a finite-state automaton, optab.txt. Its struc-
- ture is straightforward.
-
-
- 7.__Modifying_Yacc
-
- Before building a variant translator, it may be necessary to
- modify Yacc, since the version of Yacc that normally is distri-
- buted with UNIX does not provide enough space to process Icon's
- grammar. To build a version of Yacc with more space, edit the
- Yacc source file dextern and change the definition of MEMSIZE in
- the HUGE section to
-
- #define MEMSIZE 22000
-
- and use
-
- #define HUGE
-
- in files. Then rebuild Yacc.
-
-
-
-
-
- - 10 -
-
-
-
-
-
-
-
-
- 8.__Building_a_Variant_Translator
-
- The steps for setting up the directory structure for a variant
- translator are:
-
- + create a directory for the translator
-
- + make that directory the current directory
-
- + execute the shell script icon_vt supplied with Version 8
- of Icon
-
- For example, if the variant translator is to be in the directory
- xtran and Icon is installed in /usr/icon/v8, the following com-
- mands will build the variant translator:
-
- mkdir xtran
- cd xtran
- /usr/icon/v8/icon_vt
-
-
- The shell script icon_vt creates a number of files in the new
- directory and in two sub-directories: itran and h. The files
- that comprise a variant translator are listed in Appendix C.
- Unless changes to the lexical analyzer are needed, at most three
- files need to be modified to produce a new translator:
-
- variant.defsvariant macro definitions (initially empty)
- variant.c parser functions (initially empty)
- itran/icon_g.cYacc grammar for Icon
-
- A non-empty variant.c usually requires #include files to provide
- needed declarations and definitions. See the example that fol-
- lows.
-
- The translator make file, itran/Makefile, is listed in Appen-
- dix D. The make file in the main translator directory just
- insures that the program define has be compiled and then does a
- make in the itran directory. Performing a make in the itran
- directory first combines variant.defs with the standard macro
- definitions (in ident.defs) and processes them to produce the
- definition file, itran/gdefs.h. The C preprocessor is then used
- to expand the macros in itran/icon_g.c using these definitions
- and the result, after some ``house keeping'', is put in
- itran/expanded.g. Next, Yacc uses the grammar in itran/expanded.g
- to build a new parser, parse.c. There are over 200 shift/reduce
- conflicts in the identity translator. All of these conflicts are
- resolved properly. More conflicts should be expected if addi-
- tions are made to the grammar. Reduce/reduce conflicts usually
- indicate errors in the grammar. Finally, all the components of
- the system are compiled, including variant.c, and linked to pro-
- duce vitran, the variant translator.
-
- Most of the errors that may occur in building a variant
-
-
-
- - 11 -
-
-
-
-
-
-
-
-
- translator are obvious and easily fixed. Erroneous changes to the
- grammar, however, may be harder to detect and fix. Error messages
- from Yacc or from compiling itran/parse.c refer to line numbers
- in itran/expanded.g. These errors must be related back to
- variant.defs or itran/icon_g.c by inspection of itran/expanded.g.
-
-
- 9.__Using_a_Variant_Translator
-
- The translator, vitran, takes an input file on the command
- line and translates it. The specification - in place of an input
- file indicates standard input. The output of vitran is written
- to standard output. For example,
-
- vitran pre.icn >post.icn
-
- translates the file pre.icn and produces the output in post.icn.
- The suffix .icn on the argument to vitran is optional; the exam-
- ple above can be written as:
-
- vitran pre >post.icn
-
- Assuming the variant translator produces Icon source language,
- post.icn can be translated into object code by
-
- icont post.icn
-
- where icont is the standard Icon command processor.
-
- Variant translators accept the same options for translation
- that the standard Icon translator does. For example, the option
- -s causes the translator to work silently. See the manual page
- for icont for details [10].
-
-
- 10.__An_Example
-
- As an example of the construction of a variant translator,
- consider the problem of monitoring string concatenation in Icon
- programs, writing out the size of each string constructed by con-
- catenation. One way to do this, of course, is to modify Icon
- itself, adding the necessary monitoring code to the C function
- that performs concatenation. An alternative approach, which does
- not require changes to Icon itself, is to produce a variant
- translator that translates concatenation operations into calls of
- an Icon procedure, but leaves everything else unchanged:
-
- expr1 || expr2 -> Cat(expr1,expr2)
-
- The procedure Cat() might have the form:
-
-
-
-
-
-
-
- - 12 -
-
-
-
-
-
-
-
-
- procedure Cat(s1,s2)
- write(&errout,"concatenation: ",*s1 + *s2," characters")
- return s1 || s2
- end
-
- Such a procedure could be added to a preprocessed program (Cat()
- is not preprocessed itself) in order to produce the desired
- information when the program is run.
-
- A single definition in variant.defs suffices:
-
- ||(x,y,z) "Cat(" x "," z ")"
-
- Note, however, that Icon also has an augmented assignment opera-
- tor for string concatenation:
-
- expr1 ||:= expr2
-
- This operation can be handled by the definition
-
- ||:=(x,y,z) x " :=