home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
The C Users' Group Library 1994 August
/
wc-cdrom-cusersgrouplibrary-1994-08.iso
/
vol_200
/
296_01
/
havener.txt
< prev
next >
Wrap
Text File
|
1989-10-03
|
16KB
|
271 lines
Rapid Prototyping as a Design Method - Building a C to C++ Migrator
by Charles D. Havener
Author: Senior Principal Engineer
GenRad Inc. MS 1A
300 Baker Avenue
Concord Mass. 01742
Masters degrees in electrical engineering from Cornell University and in
Computer Science from Boston University. Instructor in C in the
Northeastern University State of the Art program. My work at GenRad is
designing and implementing control software in C for automatic component
and printed circuit board test equipment.
INTRODUCTION
This article advocates the conscious use of rapid prototyping as a
powerful technique to be used in the software design process. A working
prototype of a C to C++ migrator tool is a concrete example of how to
build prototypes for text or computer language processing problems. The
migrator tool is intended to automate some of the tasks in porting old
style C code to the new ANSI style or C++ style that requires function
prototypes. The goal is to automatically extract prototypes for use in
header files and to edit the old style C into a new file that uses
prototypes. Any extern function declarations should be edited out by
the tool since they will presumably be in header files.
RAPID PROTOTYPING - GOALS AND PHILOSOPHY
In his paper "No Silver Bullets" Fred Brooks ( author of the software
classic, The Mythical Man-Month ) states that "one of the most promising
of the current technological efforts, and one that attacks the essence,
not the accidents, of the software problem, is the development of
approaches and tools for rapid prototyping of systems, as prototyping is
part of the iterative specification of requirements."[1] The traditional
waterfall diagram of design includes the following steps; Requirements -
Specification - Design. Most real world programs are complex enough
that some iterative steps around the requirements and specifications are
needed to do the design. Thus the following is a better model;
Requirements ->- Specification --> Design ---> Develop --->
^ v
| |
-------- Prototype <-------------
Prototyping helps us understand a complex problem in greater depth, it
permits exploring different design approaches, it can provide early
warning of unexpected difficulties, it can overcome the paralysis that
sometimes sets in when you don't know enough about a problem to do a
good design, and it raises morale by providing a working system early.
A conscious recognition that our designs will follow the prototyping
model leads us to collect software tools and code that can be used in
this process which reinforces its effectiveness over time. Prototyping
is exploratory in nature. It is important not to get sidetracked into
perfecting some particular aspect of the design. The idea is to drive
as deeply as possible, to unearth problem areas. Reuse old code,"don't
let the work of others evade your eyes, plagiarize", lash things
together with Unix shell scripts or DOS batch files. Don't bother with
elaborate error handling. Use tools.
There are several commercial tools for prototyping user interfaces such
as Dan Bricklin's Demo Program. The migrator program deals with text,
i.e. computer language program source code. Lex and YACC work-alikes
are widely available tools that are invaluable for prototypes that
involve language processing. These tools were created to help build
compilers and translators by automatically creating scanners and parsers
but they have many other uses. This article assumes the reader is
familiar with these tools or will be motivated to master them by what
follows. The migrator was originally developed on a Sun workstation and
then ported to PC-DOS using the MKS YACC (reviewed in the April 1989
issue of the C Users Journal) and the new public domain flex, a public
domain re-implementation of lex with improvements. If you obtain the
flex sources ( e.g. from the Austin Code Works ) and port them to the
PC, there is one tricky part in addition to the extra long variable
names. Be sure that the declarations for the extern variable yytext
agree in both the YACC and lex that you use. If one uses char yytext[],
and the other uses extern char* yytext, then it will not work.
THE C TO C++ MIGRATOR
Listing 1 shows sample input and output files from the migrator tool and
Figure 1 is an overview of how the migrator is put together and
controlled. The source code accompanying this article is a snapshot of
the present state of the migrator prototype. The remainder of this
article covers the rapid prototyping process phase by phase as the
migrator tool was 'grown' over a period of several days.
A search was conducted to see if a migrator tool or function prototype
extractor was available. There was some activity on the Usenet in the
C++ news group about such tools but the only one posted required access
to the lint program sources on Unix. A modified lint could be created
that would produce prototypes. Nothing was found that would do the
complete job, though it was rumored that some were under development.
FINDING A C GRAMMAR
There is a 480 line C grammar that was posted several times to the
Usenet. The migrator uses this grammar which was developed by Jeff Lee
at Georgia Tech based on April 1985 ANSI C committee drafts provided by
Arnold Robbins. Many of the commercial YACC tools come with several
example grammars, e.g PCYACC comes with a 518 line C grammar, as well as
C++ and Pascal grammars. These can be worth the price of the software
alone for use in prototyping things like C++ class browsers etc. The
public C grammar generates parsers that accept various illegal C
statements such as "Hello World"++, --1.23, and *'a' according to Jeff
Lee but for our application it doesn't matter. Presumably we will be
providing C code to the migrator that has been validated by a true
compiler. The first step in the migrator prototyping was to use the C
grammar and lex specification to create a parser that would accept a C
program. The idea was to add the semantic action routines to the
grammar in the places required to produce the migrator. The grammar
requires that the C source code has been preprocessed by the standard
cpp to remove #include etc lines and to expand all macros. For the
prototype migrator a shell script was used that first fed the C source
code through the cpp and then to the migrator. Listings 2 & 3 are the
lex input file and the relevant parts of the grammar. ( Source code
available for this article from the C Users Journal includes the full
grammar ).
SOLVING THE TYPEDEF PROBLEM
The C grammar provided will not accept C programs that use typedef.
This was the first hurdle to overcome and it was mentioned as a problem
several times on the Usenet. The difficulty arises because the lexical
analysis phase or scanner cannot distinguish a typedef identifier from
any other variable identifier without a symbol table. Thus the next
step was to add a symbol table and the appropriate code to use it. A
simple hashing symbol table module is in the the listing symtab.c. I
have used this code with minor modifications many times. A symbol table
module is something every devotee of rapid prototyping should have
handy. It does the standard things, it has an initsymtab(), storesym()
and findsym() interfaces. Actions were added ,within braces, to the
grammar to store typedef symbols into the table. In the production
"declaration : TYPEDEF ;" a global flag in_tdef was set on the
assumption that the next IDENTIFIER encountered would be a typedef name.
The "identifier: IDENTIFIER" production at the end of the grammar makes
use of the global flag to store the typedef names into the symbol table.
Later on, the lexer looks up every identifier it finds and if it is in
the table it returns TYPE_NAME to the parser rather than IDENTIFER.
(Later the global symbols su and enumflag were added to handle the
identifier names associated with structs or unions and enumerated data.)
At this point the migrator would accept without complaint most valid C
program. It didn't do anything of course, and its only complaint was
syntax error if something went wrong. The only concession to syntax
error assistance was in the count() function in the ctocxx.l listing.
This updates a global count variable so the yyerror() function called by
yacc can report which column of the input line it gave up on. If the
ECHO macro line is not commented out, the count() function also copies
the input to the standard output to provide more complete information.
EXTRACTING FUNCTION PROTOTYPES
Extracting and building function prototypes for output to a proto.out
file was the next goal. Since the lexer passed all input text through
the count() function, a call to a new function called stuff() was added
there. The idea was to use the appropriate grammar rules to set a
global variable that the stuff() function could use to build up a
function prototype from the immediate stream of text that it had saved.
The listing subs.c contains the stuff() function. It uses a ring buffer
of about 2000 characters to remember the text stream. The stuff()
function in the listing subs.c grew in an ad hoc way from a very small
thing into a monster as functionality was added. The stuff() function
was at the heart of the experimenting and learning process. It may be
possible to have the grammar do more of the work in extracting the
elements needed to build up the function prototypes. However, a
decision was made early in the rapid prototyping to minimize the grammar
actions and to see how much could be done by applying heuristic rules to
the text saved in the ring buffer.
The function prototype was built up in the func_proto[] buffer character
by character as the stuff() function outer while loop stepped through
the current token's text in yytext[]. Each little if section handles
some situation that rapid prototyping uncovered. For example, the
register keyword may appear in old style function argument declarations
but not in function prototypes so it had to be discarded. If the
argument declaration style 'int a,b,c' was used, the root e.g. int had
to be saved and prepended to each argument to make a valid function
prototype e.g. 'int func(int a,int b,int c);'. There are some
functions that don't need a function prototype, e.g. main() and any
function declared to be static. Finally, note that old style
declarations often defaulted to int but this must be added for function
prototypes. The symbol table is useful here. Just before the newly
built prototype is written out, the first word is looked up in the
table. If it isn't there, 'int' is prepended.
BUILDING THE SED SCRIPT
The next step was to provide a means to automatically edit the original
file into the desired form. One possible design would be to make the
migrator edit the input on the fly and write it out. This was
considered and rejected as too much work for a prototype. After all,
the text had already gone by before we could figure out how to alter it.
This would mean a delay buffer between input and output. At first it
seemed we could write out some edit commands for the unix 'ed' line
editor but that editor renumbered all the lines of text everytime a
delete was done. The sed stream editor worked perfectly. Sed accepts
simple delete and insert commands based on line numbers and fortunately
the cpp preprocessor inserts '# line' code into the program so we can
always tell what line we are on. The pound() function in the ctocxx.l
file listing takes care of this when the lexer matches a '#line' in the
input text. The sed edit script commands are written out to the ed.out
file for later use by the shell script or batch file that drives the
migrator. Sed has one minor problem, it holds the entire script in
memory at once and for very large complex C files it was possible to
exceed sed's command buffer. Much of the complexity in the ed_delete(),
ed_flush() and elsewhere in the subs.c listing is to minimize the size
of the sed script produced. The grammar tended to produce line delete
commands such as 5d 6d 7d 8d. By delaying the command output it was
possible to put out 5,8d commands which saved space in the sed command
buffer.
The write_proto() function in the subs.c listing is called from the
declarator2 production rule of the grammar. At this point the parser
has encountered what may be the beginning of a function declaration. It
may be an extern function which must be ignored, or a function with no
arguments, or one with several arguments. The global proto_flag is set
true so the stuff() function will add the arg declarations to the
func_proto buffer as it finds them. But first, the write_proto()
function must back up over the input text to the beginning of the
function declaration. It uses a character stack embodied by the push()
and pop() functions to save chars as it backs up through the ring
buffer. One coding style puts the function return type declaration on
the preceding line, e.g. the crazy() function in the example listing.
The migrator handles this but it tends to delete extra blank lines used
for spacing.
STATE OF THE MIGRATOR - WHAT WAS LEARNED
A few hundred thousand lines of C has been successfully passed through
the migrator in it's current state and while it is quite useful it is
not a panacea for porting old style code. Typically the converted code
has harmless type mismatches that will no longer compile. In some cases
the old style compiler would accept syntax that is illegal and the
migrator would not accept it. The port to DOS required the addition of
the non-standard keywords near,far, and decl to the ctocxx.l file.
Presently they are ignored, the migrator must be enhanced to deal with
them for code that uses them. Furthermore, while comma separated
function argument lists are accepted, comma separated typedef lists are
not. A potentially major problem when moving C code to C++ is that
declarations like 'struct foo foo' are illegal. In C++ the variable and
structure tag name space are the same so the names must be different.
This seems like something that should be fixed manually and not by an
automatic tool like the migrator.
The migrator prototype has revealed some features that are desirable to
support in a finished product. For example, the function argument names
are currently retained in the prototypes. This tends to make for very
long lines and since they are ignored by the compiler there should be an
option to produce function prototypes without argument names. Code that
contains #ifdef sections with functions inside may be elided by the cpp.
The migrator will not produce function prototypes or editing scripts for
those sections of code. It may take several passes through the migrator
with different conditions set to make all the changes. Another minor
issue is that some code is written in a style that uses #defines rather
than typedefs to make pseudo types. For example #define COUNT int. The
C preprocessor removes all instances of COUNT so the migrator will not
see it and pass it through. The prototype migrator works reasonably
well and it could be cleaned up to be a production quality tool. The
sed editing is simple enough that it could easily be folded back into
the migrator. The simple script or .bat controller functions could be
pulled into the main() function by using various command line options.
[1] "No Silver Bullets" by Frederick P. Brooks Jr. Unix Review, Nov
1987 pp 39-48, or IEEE Computer Magazine April 1987.