C++ Annotations Version 4.0.0

Chapter 15: Concrete examples of C++

We're always interested in getting feedback. E-mail us if you like this guide, if you think that important material is omitted, if you encounter errors in the code examples or in the documentation, if you find any typos, or generally just if you feel like e-mailing. Mail to Frank Brokken or use an e-mail form. Please state the concerned document version, found in the title. If you're interested in a printable PostScript copy, use the form, or better yet, pick up your own copy by ftp from ftp.icce.rug.nl/pub/http.

This chapter presents a number of concrete examples of programming in C++. Items from this document such as virtual functions, static members, etc. are rediscussed. Examples of container classes are shown.

Another example digs into the peculiarities of using a parser- and scanner-generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator for creating the code which does the actual input recognition. The example describes the usage of these tool in a C++ environment.

15.1: Storing objects: Storable and Storage

A reoccurring task of many programs is the storage of data, which are then sorted, selected, etc.. Storing data can be as simple as maintaining an array of ints, but can also be much more complex, such as maintaining file system information by the kernel of an operating system.

In this section we take a closer look at the storage of generic objects in memory (i.e., during the execution of a program). Conforming to the object-oriented recipe we shall develop two classes: a class Storage, which stores objects, and a class Storable, the prototype of objects which can be stored.

15.1.1: The global setup

As far as the functionality of the class Storage is concerned, objects can be added to the storage and objects can be obtained from the storage. Also it must be possible to obtain the number of objects in the storage.

As far as the internal data organization of the storage is concerned, we opt for an approach in which Storage maintains an array which can be reallocated, consisting of pointers to the stored objects.

The internal organization of the class Storage is illustrated in figure 11.

figure 11 is shown here.
figure 11: Internal organization of the class Storage.

15.1.1.1: Interface functions of the class Storage

The usage (interface) of the class Storage is contained in three member functions. The following list describes these member functions and mentions the class Storable, more on this later.

15.1.1.2: To copy or not to copy?

There are two distinct design alternatives for the function add(). These considerations address the choice whether the stored objects (the squares on the right side of figure 11) should be copies of the original objects, or the objects themselves.

In other words, should the function add() of the class Storage:

These considerations are not trivial. Consider the following example:

Storage store; Storable something; store.add(something); // add to storage // let's assume that Storable::modify() is defined something.modify(); // modify original object, Storable *retrieved = store.get(0); // retrieve from storage // NOW: is "*retrieved" equal to "something" ?!

If we choose to store (addresses of) the objects themselves, then at the end of the above code fragment, the object pointed to by retrieved will equal something. A manipulation of previously stored objects thereby alters the contents of the storage.

If we choose to store copies of objects, then obviously *retrieved will not equal something but will remain the original, unaltered, object. This approach has a great merit: objects can be placed into storage as a `safeguard', to be retrieved later when an original object was altered or even ceased to exist. In this implementation we therefore choose for this approach.

15.1.1.3: Who makes the copy?

The fact that copies of objects should be stored presents a small problem. If we want to keep the class Storage as universal as possible, then the making of a copy of a Storable object cannot occur here. The reason for this is that the actual type of the objects to store is not known in advance. A simplistic approach, such as the following:

void Storage::add(Storable const *obj) { Storable *to_store = new *obj; // now add to_store instead of obj . . }

shall not work. This code attempts to make a copy of obj by using the operator new, which in turn calls the copy constructor of Storable. However, if Storable is only a base class, and the class of the object to store is a derived class (say, a Person), how can the copy constructor of the class Storable create a copy of a Person?

The making of a copy therefore must lie with the actual class of the object to store, i.e., with the derived class. Such a class must have the functionality to create a duplicate of the object in question and to return a pointer to this duplicate. If we call this function duplicate() then the code of the adding function becomes:

void Storage::add(Storable const *obj) { Storable *to_store = obj->duplicate(); // now add to_store instead of obj . . }

The function duplicate() is called in this example by using a pointer to the original object (this is the pointer obj). The class Storable is in this example only a base class which defines a protocol, and not the class of the actual objects which will be stored. Ergo, the function duplicate() need not be defined in Storable, but must be concretely implemented in derived classes. In other words, duplicate() is a pure virtual function.

15.1.2: The class Storable

Using the above discussed approach we can now define the class Storable. The following questions are of importance:

The class definition and its functions are given below:

class Storable { public: virtual ~Storable(); virtual Storable *duplicate() const = 0; }; Storable::~Storable() { }

15.1.2.1: Converting an existing class to a Storable

To show how (existing) classes can be converted to derivation from a Storable, consider the below class Person from section 5. This class is re-created here, conforming to Storable's protocol (only the relevant or new code is shown):

class Person: public Storable { // copy constructor Person(Person const &other); // assignment Person const &operator=(Person const &other); // duplicator function Storable *duplicate() const; . . }

When implementing the function Person::duplicate() we can use either the copy constructor or the default constructor with the overloaded assignment operator. The implementation of duplicate() is quite simple:

// first version: Storable *Person::duplicate() const { // uses default constructor in new Person Person *dup = new Person; // uses overloaded assignment in *dup = *this *dup = *this; return (dup); } // second version: Storable *Person::duplicate() const { // uses copy constructor in new Person(*this) return (new Person(*this)); }

The above conversion from a class Person to the needs of a Storable supposes that the sources of Person are at hand and can be modified. However, even if the definition of a Person class is not available, but is e.g., contained in a run-time library, the conversion to the Storable format poses no difficulties:

class StorablePerson: public Person, public Storable { public: // duplicator function Storable *duplicate() const; }; Storable *StorablePerson::duplicate() const { return (new *(Person*)this); }

15.1.3: The class Storage

We can now implement the class Storage. The class definition is given below:

class Storage: public Storable { public: // destructors, constructor ~Storage(); Storage(); Storage(Storage const &other); // overloaded assignment Storage const &operator=(Storage const &other); // functionality to duplicate storages Storable *duplicate() const; // interface void add(Storable *newobj); int nstored() const; Storable *get(int index); private: // copy/destroy primitives void destroy(); void copy(Storage const &other); // private data int n; Storable **storage; };

Concerning the class definition we remark:

The destructor, constructors and the overloaded assignment function are listed below:

// default constructor Storage::Storage() { n = 0; storage = 0; } // copy constructor Storage::Storage(Storage const &other) { copy(other); } // destructor Storage::~Storage() { destroy(); } // overloaded assignment Storage const &Storage::operator=(Storage const &other) { if (this != &other) { destroy(); copy(other); } return (*this); }

The primitive functions copy() and destroy() unconditionally copy another Storage object, or destroy the contents of the current one. Note that copy() calls duplicate() to duplicate the other's stored objects:

void Storage::copy(Storage const &other) { n = other.n; storage = new Storable* [n]; for (int i = 0; i < n; i++) storage [i] = other.storage [i]->duplicate(); } void Storage::destroy() { for (register int i = 0; i < n; i++) delete storage [i]; delete storage; }

The function duplicate(), which is required since Storage itself should be a Storable, uses the copy constructor to duplicate the current object:

Storable *Storage::duplicate() const { return (new *this); }

Finally, here are the interface functions which add objects to the storage, return them, or determine the number of stored objects ( Note: the function realloc() that is used in this section should actually not be used. A better procedure would be to create a C++ variant for the realloc() function. A modification is in the pipeline....)

void Storage::add(Storable const *newobj) { // reallocate storage array storage = (Storable **) realloc(storage, (n + 1) * sizeof(Storable *)); // put duplicate of newobj in storage storage [n] = newobj->duplicate(); // increase number of obj in storage n++; } Storable *Storage::get(int index) { // check if index within range if (index < 0 || index >= n) return (0); // return address of stored object return (storage [index]); } int Storage::nstored() const { return (n); }

15.2: A binary tree

This section shows an implementation of a binary tree in C++. Analogously to the classes Storage and Storable (see section 15) two separate classes are used: one to represent the tree itself, and one to represent the objects which are stored in the tree. The classes will be appropriately named Tree and Node.

15.2.1: The Node class

The class Node is an abstract (pure virtual) class, which defines the protocol for the usage of derived classes with a Tree. Concerning this protocol we remark the following:

The complete definition and declaration of the class Node is given below:

class Node { public: // destructor virtual ~Node(); // duplicator virtual Node* duplicate() const = 0; // comparison of 2 objects virtual int compare(Node const *other) const = 0; // function to do whatever is needed to the node virtual void process() = 0; // called when object to add was already in the tree virtual void already_stored(); }; Node::~Node() { } void Node::already_stored() { }

15.2.2: The Tree class

The class Tree is responsible for the storage of objects which are derived from a Node. To implement the recursive tree structure, the class Tree has two private pointers as its data, pointing to subtrees: a Tree *left and Tree *right. The information which is contained in a node of the tree is represented as a private field Node *info.

To scan a binary tree, the class Tree offers three methods: preorder, inorder and postorder. When scanning in preorder first a leaf in a node is processed, then the left subtree is scanned and finally the right subtree is scanned. When scanning inorder first the left subtree is scanned, then the leaf itself is processed and finally the right subtree is scanned. When scanning in postorder first the left and right subtrees are scanned and then the leaf itself is processed.

The definition of the class Tree is given below:

class Tree { public: // destructor, constructors ~Tree(); Tree(); Tree(Tree const &other); // assignment Tree const &operator=(Tree const &other); // addition of a Node void add(Node *what); // processing order in the tree void preorder_walk(); void inorder_walk(); void postorder_walk(); private: // primitives void copy(Tree const &other); void destroy(); // data Tree *left, *right; Node *info; };

15.2.2.1: The `standard' functions

As can be seen from the class definition, Tree contains pointer fields. This means that the class will need a destructor, a copy constructor and an overloaded assignment function to ensure that no allocation problems occur.

The destructor, the copy constructor and the overloaded assignment function are implemented with two primitive operations copy() and destroy() (these functions are presented later):

// destructor: destroys the tree Tree::~Tree() { destroy(); } // default constructor: initializes to 0 Tree::Tree() { left = right = 0; info = 0; } // copy constructor: initializes to contents of other object Tree::Tree(Tree const &other) { copy(other); } // overloaded assignment Tree const &Tree::operator=(Tree const &other) { if (this != &other) { destroy(); copy(other); } return (*this); }

15.2.2.2: Adding an object to the tree

The addition of a new object to the tree is a recursive process. When the function add() is called to insert an object into the tree, there are basically only a few possibilities:

The function add() is listed below:

void Tree::add(Node *what) { if (! info) info = what->duplicate(); else { register int cmp = info->compare(what); if (cmp < 0) { if (! left) { left = new Tree; left->info = what->duplicate(); } else left->add(what); } else if (cmp > 0) { if (! right) { right = new Tree; right->info = what->duplicate(); } else right->add(what); } else info->already_stored(); } }

15.2.2.3: Scanning the tree

The class Tree offers three methods of scanning a binary tree: preorder, inorder and postorder. The three functions which define these actions are recursive:

void Tree::preorder_walk() { if (info) info->process(); if (left) left->preorder_walk(); if (right) right->preorder_walk(); } void Tree::inorder_walk() { if (left) left->inorder_walk(); if (info) info->process(); if (right) right->inorder_walk(); } void Tree::postorder_walk() { if (left) left->postorder_walk(); if (right) right->postorder_walk(); if (info) info->process(); }

15.2.2.4: The primitive operations copy() and destroy()

The functions copy() and destroy() are two private member functions which implement primitive operations of the class Tree: the copying of the contents of another Tree or the destroying of the tree.

void Tree::destroy() { delete info; if (left) delete left; if (right) delete right; } void Tree::copy(Tree const &other) { info = other.info ? other.info->duplicate() : 0; left = other.left ? new Tree(*other.left) : 0; right = other.right ? new Tree(*other.right) : 0; }

Concerning this implementation we remark the following:

15.2.3: Using Tree and Node

We illustrate the usage of the classes Tree and Node with a program that counts words in files. Words are defined as series of characters, separated by white spaces. The program shows which words are present in which file, and how many times.

Below is the listing of a class Strnode. This class is derived from Node and implements the virtual functions. Note also how this class implements the counting of words; when a given word occurs more than one time, Tree will call the member function already_stored(). This function simply increases the private counter variable times.

class Strnode: public Node { public: // destructor, constructors ~Strnode(); Strnode(); Strnode(Strnode const &other); Strnode(char const *s); // assignment Strnode const &operator=(Strnode const &other); // functions required by Node protocol Node* duplicate() const; int compare(Node const *other) const; void process(); void already_stored(); private: // data char *str; int times; }; Strnode::~Strnode() { delete str; } Strnode::Strnode() { str = 0; times = 0; } Strnode::Strnode(Strnode const &other) { str = strdup(other.str); times = other.times; } Strnode::Strnode(char const *s) { str = strdup(s); times = 1; } Strnode const &Strnode::operator=(Strnode const &other) { if (this != &other) { delete str; str = strdup(other.str); times = other.times; } return (*this); } Node *Strnode::duplicate() const { return (new Strnode(*this)); } int Strnode::compare(Node const *other) const { Strnode *otherp = (Strnode *) other; if (str && otherp->str) return (strcmp(str, otherp->str)); if (! str && ! otherp->str) return (0); return ((int) otherp->str - (int) str ); } void Strnode::process() { if (str) printf("%4d\t%s\n", times, str); } void Strnode::already_stored() { times++; } void countfile(FILE *inf, char const *name) { Tree tree; char buf [255]; while (1) { fscanf(inf, " %s", buf); if (feof(inf)) break; Strnode *word = new Strnode(buf); tree.add(word); delete word; } tree.inorder_walk(); } int main(int argc, char **argv) { register int exitstatus = 0; if (argc > 1) for (register int i = 1; i < argc; i++) { FILE *inf = fopen(argv [i], "r"); if (! inf) { fprintf(stderr, "wordc: can't open \"%s\"\n", argv [i]); exitstatus++; } else { countfile(inf, argv [i]); fclose(inf); } } else countfile(stdin, "--stdin--"); return (exitstatus); }

15.3: Using Bison and Flex

The example discussed in this section digs into the peculiarities of using a parser- and scanner-generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator for creating the code which does the actual input recognition. The example about this topic assumes that the reader knows how to use the scanner generator flex and the parser generator bison. Both bison and flex are well documented elsewhere. The original predecessors of bison and flex, called yacc and lex are described in several books, e.g. in O'Reilly's book `lex & yacc'.

However, the scanner and parser generators are also (and maybe even more commonly, nowadays) available as free software. Both bison and flex can be obtained from prep.ai.mit.edu/pub/gnu. Flex will create a C++ class when called as flex++, or when the -+ flag is used. With bison the situation is a bit more complex. Scattered over the Internet several bison++ archives can be found (e.g., in rzbsdi01.uni-trier.de). The information in these archives usually dates back to 1993, irrespective of the version number mentioned with the archive itself. (However, the given ftp-archive also contains dos-executables, for those who are interested....)

Using flex++ and bison++ a class-based scanner and parser can be generated. The advantage of this approach is that the interface to the scanner and the parser tends to become a bit cleaner than without using the class interface.

Below two examples are given. In the first example only a lexical scanner is used to monitor the production of a file from several parts. This example focuses on the lexical scanner, and on switching files while churning through the parts. The second example uses both a scanner and a parser to transform standard arithmetic expressions to their postfix notation, commonly encountered in code generated by compilers and in HP-calculators. The second example focuses on the parser.

15.3.1: Using Flex++ to create a scanner

In this example a lexical scanner is used to monitor the production of a file from several parts. This example focuses on the lexical scanner, and on switching files while churning through the parts. The setup is as follows: The input-language knows of a #include statement, which is followed by a string indicating the file which should be included at the location of the #include.

In order to avoid complexities that have nothing to do with the current example, the format of the #include statement is #include <filepath>. The file specified between the pointed brackets should be available at the location indicated by filepath. If the file is not available, the program should terminate using a proper error message.

The program is started with one or two filename arguments. If the program is started with just one filename argument, the output is written to the standard output stream cout. Otherwise, the output is written to the stream whose name is given as the program's second argument.

The program uses a maximum nesting depth. Once the maximum is exceeded, the program terminates with an appropriate error message. In that case, the filenamestack indicating where which file was included should be printed.

One minor extra feature is that comment-lines should be recognized: include directives in comment-lines should be ignored, comment being the standard C++ comment-types.

The program is created in the following steps:

15.3.1.1: The flex++ specification file

The organization of the lexical scanner specification file is similar to the one used with flex. However, flex++ now creates a class (yyFlexLexer) from which the class Scanner will be derived.

The code associated with the rules will be located inside the class yyFlexLexer. If it would be handy to access the member-functions of the derived class within that code, it must be realized that the this pointer is a base-class pointer. In order to reach the derived class, the this pointer must be cast to a pointer to the derived class.

Looking at the rules, notice that we'll need rules for the recognition of the comment, for the include directive, and for the remaining characters. This is all fairly standard practice. When an include directive is detected, the derived-class' member function switchSource() is called, which will perform the required file switching. When EOF is detected, the derived class' member function popSource() is called, which will pop the previous previously pushed file, returning 1. Once the file-stack is empty, the function will return 0, resulting in the call of yyterminate(), which will terminate the scanner.

The lexical scanner specification file has three sections: a C++ preamble, containing code which can be used in the code defining the actions to be performed once a regular expression is matched, a Flex++ symbol area, which is used for the definition of symbols, like a mini scanner, or options, like %option yylineno when the lexical scanner should keep track of the line numbers of the files it is scanning and, finally a rules section, in which the regular expressions and their actions are given. In the current example, the lexer should mainly copy information from the istream *yyin to the ostream *yyout, for which the predefined macro ECHO can be used.

Here is the complete and annotated lexical scanner specification file to be used with flex++:

%{ /* ---------------------------------------------------------------------------- C++ -preamble. Include header files, other than those generated by flex++ and bison++. E.g., include the interface to the class derived from yyFlexLexer ---------------------------------------------------------------------------- */ #include "scanner.h" // The interface of the derived class %} /* ---------------------------------------------------------------------------- Flex++ symbol area ~~~~~~~~~~~~~~~~~~ The symbols mentioned here are used for defining e.g., a miniscanner ---------------------------------------------------------------------------- */ %x comment %option yylineno eolnComment "//".* anyChar .|\n /* ---------------------------------------------------------------------------- Flex rules area: ~~~~~~~~~~~~~~~~ Regular expressions below here define what the lexer will recognize. ---------------------------------------------------------------------------- */ %% /* The comment-rules: comment lines are ignored. */ {eolnComment} "/*" BEGIN comment; <comment>{anyChar} <comment>"*/" BEGIN INITIAL; /* File switching: #include <filepath> */ "#include "[^>]*">" ((Scanner *)this)->switchSource(); /* The default rules: eating all the rest, echoing it to output */ {anyChar} ECHO; /* The <<EOF>>)rule: pop a pushed file, or terminate the lexer */ <<EOF>> { if (!((Scanner *)this)->popSource()) yyterminate(); }

Since the derived class is able to access the information stored within the lexical scanner itself (it can even access the information directly, since the data members of yyFlexLexer are protected, and thus accessible to derived classes), very much processing can be left to the derived class. This results in a very clean setup of the lexer specification file, in which hardly any code is required in the preamble.

15.3.1.2: The derived class: Scanner

The class Scanner is derived from the class yyFlexLexer, generated by flex++. The derived class has access to the data controlled by the lexical scanner. In particular, the derived class has access to the following data members:

Other members are available as well, but they are less often used in our experience. Details can be found in the file FlexLexer.h, which is part of the flex distribution.

The class Scanner has to perform two tasks: It should push file information about the current file to a filestack, and should pop the information pushed last once EOF is detected on a file.

Several member functions are needed for the accomplishment of these tasks. As they are auxiliary to the switchSource() and popSource() functions, they are private members. These private members are developed in practice once the need for them arises. In the following interface of the Scanner class the final header file is given. Note that, apart from the private member functions, several private data members are used as well. These members are initialized in the constructor Scanner() and are used in the private memberfunctions. They are discussed below, in the context of the memberfunctions using them.

#include <FlexLexer.h> // provides yyFlexLexer interface #include <fstream.h> #include <stdio.h> #include <string.h> class Scanner: public yyFlexLexer { public: Scanner(istream *yyin); void switchSource(); int popSource(); private: int const sizeof_buffer = 16384; int const stackDepth = 10; int scanYYText(); // 1: nextSource contains new name void performSwitch(); void checkCircularity(); void checkDepth(); yy_buffer_state **state; char **fileName, *srcPtr, *nextSource; int stackTop; };

The switchSource() memberfunction should interpret the information given in yytext: it is interpreted by scanYYText(). If scanYYText() can extract a filename from yytext a switch to another file can be performed. This switch is performed by performSwitch(). If the filename could not be extracted, a message is written to the outputstream. Here is the code of switchSource():

#include "scanner.h" void Scanner::switchSource() { if (scanYYText()) performSwitch(); }

The memberfunction scanYYText() performs a simple scan of the information in yytext. If a name is detected following #include " that name is stored in the private data member nextSource, and 1 is returned. Otherwise, the information in yytext is copied to yyout, and 0 is returned. Here is the source for scanYYText():

#include "scanner.h" int Scanner::scanYYText() { delete nextSource; // build new buffer nextSource = new char[yyleng]; if ( sscanf(yytext, "#include %[^ \t\n>]", nextSource) != 1 || !(srcPtr = strchr(nextSource, '<')) ) { *yyout << yytext; // copy #include to yyout return (0); // scan failed } srcPtr++; return (1); }

The function performSwitch() performs the actual file-switching. The yyFlexLexer class provides a series of memberfunctions that can be used for file switching purposes. The file-switching capability of a yyFlexLexer object is founded on the struct yy_buffer_state, containing the state of the scan-buffer of the file that is currently scanned by the lexical scanner. This buffer is pushed on a stack when an #include is encountered, to be replaced with the buffer of the file that is mentioned in the #include directive.

The switching of the file to be scanned is realized in the following steps:

The sources for the memberfunctions performSwitch(), checkDepth(), and checkCircularity() is given next:

#include "scanner.h" void Scanner::performSwitch() { ++stackTop; checkDepth(); checkCircularity(); ifstream *newStream = new ifstream(srcPtr); if (!newStream) { cerr << "Can't open " << srcPtr << endl; exit(1); } state[stackTop] = yy_current_buffer; yy_switch_to_buffer(yy_create_buffer(newStream, sizeof_buffer)); }

#include "scanner.h" void Scanner::checkDepth() { if (stackTop == stackDepth) { cerr << "Inclusion level exceeded. Maximum is " << stackDepth << endl; exit (1); } }

#include "scanner.h" void Scanner::checkCircularity() { delete fileName[stackTop]; fileName[stackTop] = new char [strlen(srcPtr) + 1]; strcpy(fileName[stackTop], srcPtr); int index; for (index = 0; strcmp(srcPtr, fileName[index]); index++) ; if (index != stackTop) { cerr << "Circular inclusion of " << srcPtr << endl; while (stackTop > index) { cerr << fileName[stackTop] << " was included in " << fileName[stackTop - 1] << endl; --stackTop; } exit (1); } }

The memberfunction popSource() is called to pop the previously pushed sourcefile from the stack, to continue its scan just beyond the just processed #include directive. The popSource() function first inspects stackTop: if the variable is at least 0, then it's an index into the yy_buffer_state array, and thus the current buffer is deleted, to be replaced by the state waiting on top of the stack. This is realized by the yyFlexLexer members yy_delete_buffer and yy_switch_to_buffer.

If a previous buffer waited on top of the stack, then 1 is returned, indicating a successful switch to the previously pushed file. If the stack was empty, 0 is returned, and the lexer will terminate.

Here is the source of the function popSource():

#include "scanner.h" int Scanner::popSource() { if (stackTop >= 0) { yy_delete_buffer(yy_current_buffer); yy_switch_to_buffer(state[stackTop]); stackTop--; return (1); } return (0); }

These functions complete the implementation of the complete lexical scanner: the lexical scanner itself is stored in the yyFlexLexer part of the Scanner object. The Scanner object itself only has two public memberfunctions: one to push a sourcefile on a stack when a switch to the next sourcefile is requested, the other one restores the previously pushed source.

Finally, the constructor will initialize the Scanner object. Note that the interface contains an overloaded assignment operator and a copy constructor. By mentioning these two functions in the interface only, without implementing them, they cannot be used in a program: the linking phase of a program using such functions would fail. In this case this is intended behavior: the Scanner object does its own job, and there simply is no need for the assignment of a Scanner object to another one, or for the duplication of a Scanner object.

The constructor itself is a simple piece of code. Here is its source:

#include "scanner.h" Scanner::Scanner(istream *yyin) { switch_streams(yyin); state = new yy_buffer_state * [stackDepth]; memset(state, 0, stackDepth * sizeof(yy_buffer_state *)); fileName = new char * [stackDepth]; memset(fileName, 0, stackDepth * sizeof(char *)); nextSource = 0; stackTop = -1; }

, but should not be used.

15.3.1.3: The main() function

The main program is a very simple one. As the program expects a filename to start the scanning process at, initially the number of arguments is checked. If at least one argument was given, then a ifstream object is created. If this object can be created, then a Scanner object is created, receiving the address of the ifstream object as its argument. Then the yylex() member function of the Scanner object is called. This function is inherited from the Scanner's base class yyFlexLexer.

Here is the source-text of the main function:

/* lexer.cc A C++ main()-frame generated by C++ for lexer.cc */ #include "lexer.h" /* program header file */ int main(int argc, char **argv) { if (argc == 1) { cerr << "Filename argument required\n"; exit (1); } ifstream yyin(argv[1]); if (!yyin) { cerr << "Can't read " << argv[1] << endl; exit(1); } Scanner scanner(&yyin); scanner.yylex(); return (0); }

15.3.1.4: Building the scanner-program

The final program is constructed in two steps. These steps are given for a unix system, on which flex++ and the Gnu C++ compiler g++ have been installed:

15.3.2: Using both bison++ and flex++

When the input language exceeds a certain level of complexity, a parser is generally needed to control the complexity of the input language. In these cases, a parser generator is used to generate the code that's required to determine the grammatical correctness of the input language. The function of the scanner is to provided chunks of the input, called tokens, for the parser to work with.

Starting point for a program using both a parser and a scanner is the grammar: the grammar is specified first. This results in a set of tokens which can be returned by the lexical scanner (commonly called the lexer. Finally, auxiliary code is provided to fill in the blanks: the actions which are performed by the parser and the lexer are not normally specified with the grammatical rules or lexical regular expressions, but are executed by functions, which are called from within the parser's rules or associated with the lexer's regular expressions.

In the previous section we've seen an example of a C++ class generated by flex++. In the current section the parser is our main concern. The parser can be generated from a grammar specified for the program bison++. The specification of bison++ is similar to the specifications required for bison, but a class is generated, rather than a single function. In the next sections we'll develop a program converting infix expressions, in which binary operators are written between their operands, to postfix expressions, in which binary operators are written following their operands. A comparable situation holds true for the unary operators - and +: We can ignore the + operator, but the - is converted to a unary minus.

Our calculator will recognize a minimal set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in the bison-like grammar specifications, but that's about it: the purpose of this section, after all, is to illustrate a C++ program, using a parser and a lexer, and not to construct a full-fledged calculator.

In the next few sections we'll start developing the grammar in a bison++ specification file. Then, the regular expressions for the scanner are specified according to the requirements of flex++. Finally the program is constructed.

15.3.2.1: The bison++ specification file

The bison specification file as used with bison is comparable to the specification file as used with bison++. Differences are related to the class nature of the resulting parser. The calculator will distinguish real numbers from ints, and will support the basic set of arithmetic operators.

The bison++ specification file contains the following sections:

15.3.2.2: The bison++ token section

The token section contains all the tokens that are used in the grammar, as well as the priority rules as used for the mathematical operators. However, several extra items can be declared here:

15.3.2.3: The bison++ grammar rules

The rules and actions of the grammar are specified as usual. The grammar for our little calculator is given below. A lot of rules, but they illustrate the use of nonterminals associated with value-types.

lines: lines line | line ; line: intExpr '\n' { cerr << "int: " << $1 << endl; } | doubleExpr '\n' { cerr << "double: " << $1 << endl; } | '\n' { cout << "Good bye\n"; YYACCEPT; } | error '\n' ; intExpr: intExpr '*' intExpr { $$ = $1 * $3; } | intExpr '+' intExpr { $$ = $1 + $3; } | '(' intExpr ')' { $$ = $2; } | '-' intExpr %prec UnaryMinus { $$ = -$2; } | INT ; doubleExpr: doubleExpr '*' doubleExpr { $$ = $1 * $3; } | doubleExpr '+' doubleExpr { $$ = $1 + $3; } | doubleExpr '*' intExpr { $$ = $1 * $3; } | doubleExpr '+' intExpr { $$ = $1 + $3; } | intExpr '*' doubleExpr { $$ = $1 * $3; } | intExpr '+' doubleExpr { $$ = $1 + $3; } | '(' doubleExpr ')' { $$ = $2; } | '-' doubleExpr %prec UnaryMinus { $$ = -$2; } | DOUBLE ; With these rules a very simple calculator is defined in which integer and real values can be negated, added, and multiplied, and in which standard priority rules can be circumvented using parentheses. The rules show the use of typed nonterminal symbols: doubleExpr is linked to real (double) values, intExpr is linked to integer values. Precedence and type association is defined in the token section of the parser specification file, which is: %name Parser %union { int i; double d; }; %token <i> INT %token <d> DOUBLE %type <i> intExpr %type <d> doubleExpr %left '+' %left '*' %right UnaryMinus %define LEX_BODY {return lexer.yylex();} %define ERROR_BODY { cerr << "error encountered\n"; } In the token section we see the use of the %type specifiers, connecting intExpr to the i-field of the semantic-value union, and connecting doubleExpr to the d-field. At first sight it looks a bit complex, since the expression rules must be included for each individual returntype. On the other hand, if the union itself would have been used, we would have had to specify somewhere in the returned semantic values what field to use: less rules, but more complex and error-prone code.

subsubsection(The flex++ specification file) The flex-specification file to be used with our little calculator is simple: blanks are skipped, single characters are returned, and numerical values are returned as either Parser::INT or Parser::DOUBLE values. Here is the complete flex++ specification file:

%{ #include <iostream.h> #include "parser.h" extern yyFlexLexer lexer; extern Parser parser; %} %% [ \t] ; [0-9]+ { parser.yylval.i = atoi(yytext); return(Parser::INT); } "."[0-9]* | [0-9]+("."[0-9]*)? { parser.yylval.d = atof(yytext); return(Parser::DOUBLE); } .|\n return (*yytext);

subsubsection(The generation of the code) The code is generated in the same way as with bison and flex. To order bison++ to generate the files parser.cc and parser.h, the command

bison++ -d -o parser.cc parser
can be given.

Flex++ will thereupon generate code on lexer.cc using the command

flex++ -I -olexer.cc lexer
Note here that flex++ expects no blanks between the -o flag and lexer.cc.

On unix, linking and compiling the generated sources and the source for the main program (listed below) is realized with the following command:

g++ -o calc -Wall *.cc -lfl -s
Note the fact that the libfl.a library is mentioned here. If it's not mentioned unresolved functions like yywrap() emerge.

A source in which the main() function, the lexical scanner and the parser objects are defined is, finally:

#include "parser.h" Parser parser; yyFlexLexer lexer; int main() { return (parser.yyparse()); }