C++ Annotations Version 4.0.0

Chapter 15: Concrete examples of C++

We're always interested in getting feedback. E-mail us if you like this guide, if you think that important material is omitted, if you encounter errors in the code examples or in the documentation, if you find any typos, or generally just if you feel like e-mailing. Mail to Frank Brokken or use an e-mail form. Please state the concerned document version, found in the title. If you're interested in a printable PostScript copy, use the form, or better yet, pick up your own copy by ftp from ftp.icce.rug.nl/pub/http.

This chapter presents a number of concrete examples of programming in C++. Items from this document such as virtual functions, static members, etc. are rediscussed. Examples of container classes are shown.

Another example digs into the peculiarities of using a parser- and scanner-generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator for creating the code which does the actual input recognition. The example describes the usage of these tool in a C++ environment.

15.1: Storing objects: Storable and Storage

A reoccurring task of many programs is the storage of data, which are then sorted, selected, etc.. Storing data can be as simple as maintaining an array of ints, but can also be much more complex, such as maintaining file system information by the kernel of an operating system.

In this section we take a closer look at the storage of generic objects in memory (i.e., during the execution of a program). Conforming to the object-oriented recipe we shall develop two classes: a class Storage, which stores objects, and a class Storable, the prototype of objects which can be stored.

15.1.1: The global setup

As far as the functionality of the class Storage is concerned, objects can be added to the storage and objects can be obtained from the storage. Also it must be possible to obtain the number of objects in the storage.

As far as the internal data organization of the storage is concerned, we opt for an approach in which Storage maintains an array which can be reallocated, consisting of pointers to the stored objects.

The internal organization of the class Storage is illustrated in figure 11.

figure 11: Internal organization of the class Storage.

15.1.1.1: Interface functions of the class Storage

The usage (interface) of the class Storage is contained in three member functions. The following list describes these member functions and mentions the class Storable, more on this later.

The function add(Storable const *newobj) adds an object to the storage. The function reallocates the array of pointers to accommodate one more and inserts the address of the object to store.

The function Storable const *get(int index) returns a pointer to the object which is stored at the index'th slot.

The function int nstored() returns the number of objects in the storage.

15.1.1.2: To copy or not to copy?

There are two distinct design alternatives for the function add(). These considerations address the choice whether the stored objects (the squares on the right side of figure 11) should be copies of the original objects, or the objects themselves.

In other words, should the function add() of the class Storage:

just store the address of the object which it receives as its argument in the array of pointers, or should it
make a copy of the object first, and store the address of the copy?

These considerations are not trivial. Consider the following example:

Storage store; Storable something; store.add(something); // add to storage // let's assume that Storable::modify() is defined something.modify(); // modify original object, Storable *retrieved = store.get(0); // retrieve from storage // NOW: is "*retrieved" equal to "something" ?!

If we choose to store (addresses of) the objects themselves, then at the end of the above code fragment, the object pointed to by retrieved will equal something. A manipulation of previously stored objects thereby alters the contents of the storage.

If we choose to store copies of objects, then obviously *retrieved will not equal something but will remain the original, unaltered, object. This approach has a great merit: objects can be placed into storage as a `safeguard', to be retrieved later when an original object was altered or even ceased to exist. In this implementation we therefore choose for this approach.

15.1.1.3: Who makes the copy?

The fact that copies of objects should be stored presents a small problem. If we want to keep the class Storage as universal as possible, then the making of a copy of a Storable object cannot occur here. The reason for this is that the actual type of the objects to store is not known in advance. A simplistic approach, such as the following:

void Storage::add(Storable const *obj) { Storable *to_store = new *obj; // now add to_store instead of obj . . }

shall not work. This code attempts to make a copy of obj by using the operator new, which in turn calls the copy constructor of Storable. However, if Storable is only a base class, and the class of the object to store is a derived class (say, a Person), how can the copy constructor of the class Storable create a copy of a Person?

The making of a copy therefore must lie with the actual class of the object to store, i.e., with the derived class. Such a class must have the functionality to create a duplicate of the object in question and to return a pointer to this duplicate. If we call this function duplicate() then the code of the adding function becomes:

void Storage::add(Storable const *obj) { Storable *to_store = obj->duplicate(); // now add to_store instead of obj . . }

The function duplicate() is called in this example by using a pointer to the original object (this is the pointer obj). The class Storable is in this example only a base class which defines a protocol, and not the class of the actual objects which will be stored. Ergo, the function duplicate() need not be defined in Storable, but must be concretely implemented in derived classes. In other words, duplicate() is a pure virtual function.

15.1.2: The class Storable

Using the above discussed approach we can now define the class Storable. The following questions are of importance:

Does the class Storable need a default constructor, or possibly other constructors such as a copy constructor?
The answer is no. Storable will be a bare prototype, from which other classes will be derived.
Does the class Storable need a destructor? Should this destructor be (pure) virtual?
Yes. The destructor will be called when, e.g., a Storage object ceases to exist. It is quite possible that classes which will be derived from Storable will have their own destructors: we should therefore define a virtual destructor, to ensure that when an object pointed to by a Storable* is deleted, the actual destructor of the derived class is called.
The destructor however should not be pure virtual. It is quite possible that the classes which will be derived from Storable will not need a destructor; in that case, an empty destructor function should be supplied.

The class definition and its functions are given below:

class Storable { public: virtual ~Storable(); virtual Storable *duplicate() const = 0; }; Storable::~Storable() { }

15.1.2.1: Converting an existing class to a Storable

To show how (existing) classes can be converted to derivation from a Storable, consider the below class Person from section 5. This class is re-created here, conforming to Storable's protocol (only the relevant or new code is shown):

class Person: public Storable { // copy constructor Person(Person const &other); // assignment Person const &operator=(Person const &other); // duplicator function Storable *duplicate() const; . . }

When implementing the function Person::duplicate() we can use either the copy constructor or the default constructor with the overloaded assignment operator. The implementation of duplicate() is quite simple:

// first version: Storable *Person::duplicate() const { // uses default constructor in new Person Person *dup = new Person; // uses overloaded assignment in *dup = *this *dup = *this; return (dup); } // second version: Storable *Person::duplicate() const { // uses copy constructor in new Person(*this) return (new Person(*this)); }

The above conversion from a class Person to the needs of a Storable supposes that the sources of Person are at hand and can be modified. However, even if the definition of a Person class is not available, but is e.g., contained in a run-time library, the conversion to the Storable format poses no difficulties:

class StorablePerson: public Person, public Storable { public: // duplicator function Storable *duplicate() const; }; Storable *StorablePerson::duplicate() const { return (new *(Person*)this); }

15.1.3: The class Storage

We can now implement the class Storage. The class definition is given below:

class Storage: public Storable { public: // destructors, constructor ~Storage(); Storage(); Storage(Storage const &other); // overloaded assignment Storage const &operator=(Storage const &other); // functionality to duplicate storages Storable *duplicate() const; // interface void add(Storable *newobj); int nstored() const; Storable *get(int index); private: // copy/destroy primitives void destroy(); void copy(Storage const &other); // private data int n; Storable **storage; };

Concerning the class definition we remark:

As its interface the class has the functions add(), get() and nstored(). These functions were previously discussed (see section 11).
The class has a copy constructor and an overloaded assignment function. These functions are needed because Storage contains a pointer, which addresses allocated memory.
Storage itself is derived from Storable, as can be seen in the classname definition and in the presence of the function duplicate(). This means that Storage objects can themselves be placed in a Storage, thereby creating `super-storages': say, a list of groups of Persons.
Internally, Storage defines two private functions copy() and destroy(). The purpose of these primitive functions is discussed in section 5.4.

The destructor, constructors and the overloaded assignment function are listed below:

// default constructor Storage::Storage() { n = 0; storage = 0; } // copy constructor Storage::Storage(Storage const &other) { copy(other); } // destructor Storage::~Storage() { destroy(); } // overloaded assignment Storage const &Storage::operator=(Storage const &other) { if (this != &other) { destroy(); copy(other); } return (*this); }

The primitive functions copy() and destroy() unconditionally copy another Storage object, or destroy the contents of the current one. Note that copy() calls duplicate() to duplicate the other's stored objects:

void Storage::copy(Storage const &other) { n = other.n; storage = new Storable* [n]; for (int i = 0; i < n; i++) storage [i] = other.storage [i]->duplicate(); } void Storage::destroy() { for (register int i = 0; i < n; i++) delete storage [i]; delete storage; }

The function duplicate(), which is required since Storage itself should be a Storable, uses the copy constructor to duplicate the current object:

Storable *Storage::duplicate() const { return (new *this); }

Finally, here are the interface functions which add objects to the storage, return them, or determine the number of stored objects ( Note: the function realloc() that is used in this section should actually not be used. A better procedure would be to create a C++ variant for the realloc() function. A modification is in the pipeline....)

void Storage::add(Storable const *newobj) { // reallocate storage array storage = (Storable **) realloc(storage, (n + 1) * sizeof(Storable *)); // put duplicate of newobj in storage storage [n] = newobj->duplicate(); // increase number of obj in storage n++; } Storable *Storage::get(int index) { // check if index within range if (index < 0 || index >= n) return (0); // return address of stored object return (storage [index]); } int Storage::nstored() const { return (n); }

15.2: A binary tree

This section shows an implementation of a binary tree in C++. Analogously to the classes Storage and Storable (see section 15) two separate classes are used: one to represent the tree itself, and one to represent the objects which are stored in the tree. The classes will be appropriately named Tree and Node.

15.2.1: The Node class

The class Node is an abstract (pure virtual) class, which defines the protocol for the usage of derived classes with a Tree. Concerning this protocol we remark the following:

When data are stored in a binary tree, the place of the data is determined by some order: it is necessary to determine how the objects should be sorted. This requires a comparison between objects. This comparison must inform the caller (i.e., the function which places objects in a tree) whether one object is `smaller' or `greater' than another object.

This comparison must lie with Nodes: a Tree itself cannot know how objects should be compared. Part of the procotol which is required by Node is therefore:

virtual int compare(Node const *other) const = 0;

The comparing function will have to be implemented in each derived class.

Similar to the storage of objects in the class Storage (see section 15), a binary tree will contain copies of objects. The responsibility to duplicate an object therefore also lies with Node, as defined in a pure virtual function:
virtual Node *duplicate() const = 0;
When processing a binary tree containing objects, the tree is recursively descended and a given operation is performed for each object. The operation depends of course on the actual type of the stored object. By declaring a pure virtual function
virtual void process() = 0;
in the class Node, the responsibility to process an object is placed with the derived class.
When an object is placed into storage in a binary tree, it can occur that the object has previously been stored. In that case the object will not be stored for a second time.
For these cases we define a virtual function already_stored(), which is however not pure virtual. The default implementation will take no action. The function can however be redefined in a derived class:
virtual void already_stored();

The complete definition and declaration of the class Node is given below:

class Node { public: // destructor virtual ~Node(); // duplicator virtual Node* duplicate() const = 0; // comparison of 2 objects virtual int compare(Node const *other) const = 0; // function to do whatever is needed to the node virtual void process() = 0; // called when object to add was already in the tree virtual void already_stored(); }; Node::~Node() { } void Node::already_stored() { }

15.2.2: The Tree class

The class Tree is responsible for the storage of objects which are derived from a Node. To implement the recursive tree structure, the class Tree has two private pointers as its data, pointing to subtrees: a Tree *left and Tree *right. The information which is contained in a node of the tree is represented as a private field Node *info.

To scan a binary tree, the class Tree offers three methods: preorder, inorder and postorder. When scanning in preorder first a leaf in a node is processed, then the left subtree is scanned and finally the right subtree is scanned. When scanning inorder first the left subtree is scanned, then the leaf itself is processed and finally the right subtree is scanned. When scanning in postorder first the left and right subtrees are scanned and then the leaf itself is processed.

The definition of the class Tree is given below:

class Tree { public: // destructor, constructors ~Tree(); Tree(); Tree(Tree const &other); // assignment Tree const &operator=(Tree const &other); // addition of a Node void add(Node *what); // processing order in the tree void preorder_walk(); void inorder_walk(); void postorder_walk(); private: // primitives void copy(Tree const &other); void destroy(); // data Tree *left, *right; Node *info; };

15.2.2.1: The `standard' functions

As can be seen from the class definition, Tree contains pointer fields. This means that the class will need a destructor, a copy constructor and an overloaded assignment function to ensure that no allocation problems occur.

The destructor, the copy constructor and the overloaded assignment function are implemented with two primitive operations copy() and destroy() (these functions are presented later):

// destructor: destroys the tree Tree::~Tree() { destroy(); } // default constructor: initializes to 0 Tree::Tree() { left = right = 0; info = 0; } // copy constructor: initializes to contents of other object Tree::Tree(Tree const &other) { copy(other); } // overloaded assignment Tree const &Tree::operator=(Tree const &other) { if (this != &other) { destroy(); copy(other); } return (*this); }

15.2.2.2: Adding an object to the tree

The addition of a new object to the tree is a recursive process. When the function add() is called to insert an object into the tree, there are basically only a few possibilities:

The info field of the current node can be a 0-pointer. In that case, a duplicate of the object to add is inserted in the current node.

When the tree is already partially filled, then it is necessary to determine whether the object to add should come `before' or `after' the object of the current node. This comparison is performed by compare(), a pure virtual function whose implementation is required by Node. Depending on the order the new object must be inserted in the left or in the right subtree. These subtrees may first have to be allocated.

When the comparison of the new object and the object of the current node yields `equality', then the new object should not be stored again in the tree. In this case, already_stored() is called.

The function add() is listed below:

void Tree::add(Node *what) { if (! info) info = what->duplicate(); else { register int cmp = info->compare(what); if (cmp < 0) { if (! left) { left = new Tree; left->info = what->duplicate(); } else left->add(what); } else if (cmp > 0) { if (! right) { right = new Tree; right->info = what->duplicate(); } else right->add(what); } else info->already_stored(); } }

15.2.2.3: Scanning the tree

The class Tree offers three methods of scanning a binary tree: preorder, inorder and postorder. The three functions which define these actions are recursive:

void Tree::preorder_walk() { if (info) info->process(); if (left) left->preorder_walk(); if (right) right->preorder_walk(); } void Tree::inorder_walk() { if (left) left->inorder_walk(); if (info) info->process(); if (right) right->inorder_walk(); } void Tree::postorder_walk() { if (left) left->postorder_walk(); if (right) right->postorder_walk(); if (info) info->process(); }

15.2.2.4: The primitive operations copy() and destroy()

The functions copy() and destroy() are two private member functions which implement primitive operations of the class Tree: the copying of the contents of another Tree or the destroying of the tree.

void Tree::destroy() { delete info; if (left) delete left; if (right) delete right; } void Tree::copy(Tree const &other) { info = other.info ? other.info->duplicate() : 0; left = other.left ? new Tree(*other.left) : 0; right = other.right ? new Tree(*other.right) : 0; }

Concerning this implementation we remark the following:

The function destroy() is recursive, even though this is not at once visible. A statement like delete left will activate the destructor for the Tree object which is pointed to by left; this in turn will call destroy() etc..
Similarly, the function copy() is recursive. The code <tt/left = new Tree(*other.left)/ activates the copy constructor, which in turn calls copy() for the left branch of the tree.
As is the case with the function add(), nodes themselves are duplicated with the function duplicate(). This function is supplied by a concrete implementation of a derived class of Node.

15.2.3: Using Tree and Node

We illustrate the usage of the classes Tree and Node with a program that counts words in files. Words are defined as series of characters, separated by white spaces. The program shows which words are present in which file, and how many times.

Below is the listing of a class Strnode. This class is derived from Node and implements the virtual functions. Note also how this class implements the counting of words; when a given word occurs more than one time, Tree will call the member function already_stored(). This function simply increases the private counter variable times.

class Strnode: public Node { public: // destructor, constructors ~Strnode(); Strnode(); Strnode(Strnode const &other); Strnode(char const *s); // assignment Strnode const &operator=(Strnode const &other); // functions required by Node protocol Node* duplicate() const; int compare(Node const *other) const; void process(); void already_stored(); private: // data char *str; int times; }; Strnode::~Strnode() { delete str; } Strnode::Strnode() { str = 0; times = 0; } Strnode::Strnode(Strnode const &other) { str = strdup(other.str); times = other.times; } Strnode::Strnode(char const *s) { str = strdup(s); times = 1; } Strnode const &Strnode::operator=(Strnode const &other) { if (this != &other) { delete str; str = strdup(other.str); times = other.times; } return (*this); } Node *Strnode::duplicate() const { return (new Strnode(*this)); } int Strnode::compare(Node const *other) const { Strnode *otherp = (Strnode *) other; if (str && otherp->str) return (strcmp(str, otherp->str)); if (! str && ! otherp->str) return (0); return ((int) otherp->str - (int) str ); } void Strnode::process() { if (str) printf("%4d\t%s\n", times, str); } void Strnode::already_stored() { times++; } void countfile(FILE *inf, char const *name) { Tree tree; char buf [255]; while (1) { fscanf(inf, " %s", buf); if (feof(inf)) break; Strnode *word = new Strnode(buf); tree.add(word); delete word; } tree.inorder_walk(); } int main(int argc, char **argv) { register int exitstatus = 0; if (argc > 1) for (register int i = 1; i < argc; i++) { FILE *inf = fopen(argv [i], "r"); if (! inf) { fprintf(stderr, "wordc: can't open \"%s\"\n", argv [i]); exitstatus++; } else { countfile(inf, argv [i]); fclose(inf); } } else countfile(stdin, "--stdin--"); return (exitstatus); }

15.3: Using Bison and Flex

The example discussed in this section digs into the peculiarities of using a parser- and scanner-generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator for creating the code which does the actual input recognition. The example about this topic assumes that the reader knows how to use the scanner generator flex and the parser generator bison. Both bison and flex are well documented elsewhere. The original predecessors of bison and flex, called yacc and lex are described in several books, e.g. in O'Reilly's book `lex & yacc'.

However, the scanner and parser generators are also (and maybe even more commonly, nowadays) available as free software. Both bison and flex can be obtained from prep.ai.mit.edu/pub/gnu. Flex will create a C++ class when called as flex++, or when the -+ flag is used. With bison the situation is a bit more complex. Scattered over the Internet several bison++ archives can be found (e.g., in rzbsdi01.uni-trier.de). The information in these archives usually dates back to 1993, irrespective of the version number mentioned with the archive itself. (However, the given ftp-archive also contains dos-executables, for those who are interested....)

Using flex++ and bison++ a class-based scanner and parser can be generated. The advantage of this approach is that the interface to the scanner and the parser tends to become a bit cleaner than without using the class interface.

Below two examples are given. In the first example only a lexical scanner is used to monitor the production of a file from several parts. This example focuses on the lexical scanner, and on switching files while churning through the parts. The second example uses both a scanner and a parser to transform standard arithmetic expressions to their postfix notation, commonly encountered in code generated by compilers and in HP-calculators. The second example focuses on the parser.

15.3.1: Using Flex++ to create a scanner

In this example a lexical scanner is used to monitor the production of a file from several parts. This example focuses on the lexical scanner, and on switching files while churning through the parts. The setup is as follows: The input-language knows of a #include statement, which is followed by a string indicating the file which should be included at the location of the #include.

In order to avoid complexities that have nothing to do with the current example, the format of the #include statement is #include <filepath>. The file specified between the pointed brackets should be available at the location indicated by filepath. If the file is not available, the program should terminate using a proper error message.

The program is started with one or two filename arguments. If the program is started with just one filename argument, the output is written to the standard output stream cout. Otherwise, the output is written to the stream whose name is given as the program's second argument.

The program uses a maximum nesting depth. Once the maximum is exceeded, the program terminates with an appropriate error message. In that case, the filenamestack indicating where which file was included should be printed.

One minor extra feature is that comment-lines should be recognized: include directives in comment-lines should be ignored, comment being the standard C++ comment-types.

The program is created in the following steps:

First, the file lexer is constructed, containing the specifications of the input-language.
From the specifications in lexer the requirements for the class Scanner evolve. The Scanner class is a wrapper around the class yyFlexLexer generated by flex++. The requirements results in the specification of the interface for the class Scanner.
Next, the main() function is constructed. A Startup object is created to inspect the commandline arguments. If successful, the scanner's member yylex() is called to construct the output file.
Now that the global setup of the program has been specified, the memberfunctions of the different classes are constructed.
Finally, the program is compiled and linked.

15.3.1.1: The flex++ specification file

The organization of the lexical scanner specification file is similar to the one used with flex. However, flex++ now creates a class (yyFlexLexer) from which the class Scanner will be derived.

The code associated with the rules will be located inside the class yyFlexLexer. If it would be handy to access the member-functions of the derived class within that code, it must be realized that the this pointer is a base-class pointer. In order to reach the derived class, the this pointer must be cast to a pointer to the derived class.

Looking at the rules, notice that we'll need rules for the recognition of the comment, for the include directive, and for the remaining characters. This is all fairly standard practice. When an include directive is detected, the derived-class' member function switchSource() is called, which will perform the required file switching. When EOF is detected, the derived class' member function popSource() is called, which will pop the previous previously pushed file, returning 1. Once the file-stack is empty, the function will return 0, resulting in the call of yyterminate(), which will terminate the scanner.

The lexical scanner specification file has three sections: a C++ preamble, containing code which can be used in the code defining the actions to be performed once a regular expression is matched, a Flex++ symbol area, which is used for the definition of symbols, like a mini scanner, or options, like %option yylineno when the lexical scanner should keep track of the line numbers of the files it is scanning and, finally a rules section, in which the regular expressions and their actions are given. In the current example, the lexer should mainly copy information from the istream *yyin to the ostream *yyout, for which the predefined macro ECHO can be used.

Here is the complete and annotated lexical scanner specification file to be used with flex++:

Since the derived class is able to access the information stored within the lexical scanner itself (it can even access the information directly, since the data members of yyFlexLexer are protected, and thus accessible to derived classes), very much processing can be left to the derived class. This results in a very clean setup of the lexer specification file, in which hardly any code is required in the preamble.

15.3.1.2: The derived class: Scanner

The class Scanner is derived from the class yyFlexLexer, generated by flex++. The derived class has access to the data controlled by the lexical scanner. In particular, the derived class has access to the following data members:

char *yytext: contains the text matched by a regular expression

int yyleng: the length of the text in yytext

int yylineno: the current line number (only if %option yylineo was specified in the lexer specfication file)

Other members are available as well, but they are less often used in our experience. Details can be found in the file FlexLexer.h, which is part of the flex distribution.

The class Scanner has to perform two tasks: It should push file information about the current file to a filestack, and should pop the information pushed last once EOF is detected on a file.

Several member functions are needed for the accomplishment of these tasks. As they are auxiliary to the switchSource() and popSource() functions, they are private members. These private members are developed in practice once the need for them arises. In the following interface of the Scanner class the final header file is given. Note that, apart from the private member functions, several private data members are used as well. These members are initialized in the constructor Scanner() and are used in the private memberfunctions. They are discussed below, in the context of the memberfunctions using them.

#include <FlexLexer.h> // provides yyFlexLexer interface #include <fstream.h> #include <stdio.h> #include <string.h> class Scanner: public yyFlexLexer { public: Scanner(istream *yyin); void switchSource(); int popSource(); private: int const sizeof_buffer = 16384; int const stackDepth = 10; int scanYYText(); // 1: nextSource contains new name void performSwitch(); void checkCircularity(); void checkDepth(); yy_buffer_state **state; char **fileName, *srcPtr, *nextSource; int stackTop; };

The switchSource() memberfunction should interpret the information given in yytext: it is interpreted by scanYYText(). If scanYYText() can extract a filename from yytext a switch to another file can be performed. This switch is performed by performSwitch(). If the filename could not be extracted, a message is written to the outputstream. Here is the code of switchSource():

#include "scanner.h" void Scanner::switchSource() { if (scanYYText()) performSwitch(); }

The memberfunction scanYYText() performs a simple scan of the information in yytext. If a name is detected following #include " that name is stored in the private data member nextSource, and 1 is returned. Otherwise, the information in yytext is copied to yyout, and 0 is returned. Here is the source for scanYYText():

#include "scanner.h" int Scanner::scanYYText() { delete nextSource; // build new buffer nextSource = new char[yyleng]; if ( sscanf(yytext, "#include %[^ \t\n>]", nextSource) != 1 || !(srcPtr = strchr(nextSource, '<')) ) { *yyout << yytext; // copy #include to yyout return (0); // scan failed } srcPtr++; return (1); }

The function performSwitch() performs the actual file-switching. The yyFlexLexer class provides a series of memberfunctions that can be used for file switching purposes. The file-switching capability of a yyFlexLexer object is founded on the struct yy_buffer_state, containing the state of the scan-buffer of the file that is currently scanned by the lexical scanner. This buffer is pushed on a stack when an #include is encountered, to be replaced with the buffer of the file that is mentioned in the #include directive.

The switching of the file to be scanned is realized in the following steps:

First, the current depth of the include-nesting is inspected. If the stackDetph is reached, the stack is full, and the program aborts with an appropriate message. For this the memberfunction checkDepth() is called.
Next, the fileName stack is inspected, to avoid circular inclusions. If nextSource is encountered in the fileName array, the inclusion is refused, and the program terminates with an appropriate message. The memberfunction checkCircularity() is called for this task.
Then, a new ifstream object is created, assigned to nextSource. If this fails, the program terminates with an appropriate message.
Finally, a new yy_buffer_state is created for the newly opened stream, and the lexical scanner is instructed to switch to that stream using yyFlexLexer's memberfunction yy_switch_to_buffer.

The sources for the memberfunctions performSwitch(), checkDepth(), and checkCircularity() is given next:

#include "scanner.h" void Scanner::performSwitch() { ++stackTop; checkDepth(); checkCircularity(); ifstream *newStream = new ifstream(srcPtr); if (!newStream) { cerr << "Can't open " << srcPtr << endl; exit(1); } state[stackTop] = yy_current_buffer; yy_switch_to_buffer(yy_create_buffer(newStream, sizeof_buffer)); }

#include "scanner.h" void Scanner::checkDepth() { if (stackTop == stackDepth) { cerr << "Inclusion level exceeded. Maximum is " << stackDepth << endl; exit (1); } }

#include "scanner.h" void Scanner::checkCircularity() { delete fileName[stackTop]; fileName[stackTop] = new char [strlen(srcPtr) + 1]; strcpy(fileName[stackTop], srcPtr); int index; for (index = 0; strcmp(srcPtr, fileName[index]); index++) ; if (index != stackTop) { cerr << "Circular inclusion of " << srcPtr << endl; while (stackTop > index) { cerr << fileName[stackTop] << " was included in " << fileName[stackTop - 1] << endl; --stackTop; } exit (1); } }

The memberfunction popSource() is called to pop the previously pushed sourcefile from the stack, to continue its scan just beyond the just processed #include directive. The popSource() function first inspects stackTop: if the variable is at least 0, then it's an index into the yy_buffer_state array, and thus the current buffer is deleted, to be replaced by the state waiting on top of the stack. This is realized by the yyFlexLexer members yy_delete_buffer and yy_switch_to_buffer.

If a previous buffer waited on top of the stack, then 1 is returned, indicating a successful switch to the previously pushed file. If the stack was empty, 0 is returned, and the lexer will terminate.

Here is the source of the function popSource():

#include "scanner.h" int Scanner::popSource() { if (stackTop >= 0) { yy_delete_buffer(yy_current_buffer); yy_switch_to_buffer(state[stackTop]); stackTop--; return (1); } return (0); }

These functions complete the implementation of the complete lexical scanner: the lexical scanner itself is stored in the yyFlexLexer part of the Scanner object. The Scanner object itself only has two public memberfunctions: one to push a sourcefile on a stack when a switch to the next sourcefile is requested, the other one restores the previously pushed source.

Finally, the constructor will initialize the Scanner object. Note that the interface contains an overloaded assignment operator and a copy constructor. By mentioning these two functions in the interface only, without implementing them, they cannot be used in a program: the linking phase of a program using such functions would fail. In this case this is intended behavior: the Scanner object does its own job, and there simply is no need for the assignment of a Scanner object to another one, or for the duplication of a Scanner object.

The constructor itself is a simple piece of code. Here is its source:

#include "scanner.h" Scanner::Scanner(istream *yyin) { switch_streams(yyin); state = new yy_buffer_state * [stackDepth]; memset(state, 0, stackDepth * sizeof(yy_buffer_state *)); fileName = new char * [stackDepth]; memset(fileName, 0, stackDepth * sizeof(char *)); nextSource = 0; stackTop = -1; }

, but should not be used.

15.3.1.3: The main() function

The main program is a very simple one. As the program expects a filename to start the scanning process at, initially the number of arguments is checked. If at least one argument was given, then a ifstream object is created. If this object can be created, then a Scanner object is created, receiving the address of the ifstream object as its argument. Then the yylex() member function of the Scanner object is called. This function is inherited from the Scanner's base class yyFlexLexer.

Here is the source-text of the main function:

/* lexer.cc A C++ main()-frame generated by C++ for lexer.cc */ #include "lexer.h" /* program header file */ int main(int argc, char **argv) { if (argc == 1) { cerr << "Filename argument required\n"; exit (1); } ifstream yyin(argv[1]); if (!yyin) { cerr << "Can't read " << argv[1] << endl; exit(1); } Scanner scanner(&yyin); scanner.yylex(); return (0); }

15.3.1.4: Building the scanner-program

The final program is constructed in two steps. These steps are given for a unix system, on which flex++ and the Gnu C++ compiler g++ have been installed:

First, the lexical scanner's source is created using flex++. For this the command flex++ lexer can be given.

Next, all sources are compiled and linked, using the libfl.a library. The appropriate command here is g++ -o scanner *.cc -lfl

15.3.2: Using both bison++ and flex++

When the input language exceeds a certain level of complexity, a parser is generally needed to control the complexity of the input language. In these cases, a parser generator is used to generate the code that's required to determine the grammatical correctness of the input language. The function of the scanner is to provided chunks of the input, called tokens, for the parser to work with.

Starting point for a program using both a parser and a scanner is the grammar: the grammar is specified first. This results in a set of tokens which can be returned by the lexical scanner (commonly called the lexer. Finally, auxiliary code is provided to fill in the blanks: the actions which are performed by the parser and the lexer are not normally specified with the grammatical rules or lexical regular expressions, but are executed by functions, which are called from within the parser's rules or associated with the lexer's regular expressions.

In the previous section we've seen an example of a C++ class generated by flex++. In the current section the parser is our main concern. The parser can be generated from a grammar specified for the program bison++. The specification of bison++ is similar to the specifications required for bison, but a class is generated, rather than a single function. In the next sections we'll develop a program converting infix expressions, in which binary operators are written between their operands, to postfix expressions, in which binary operators are written following their operands. A comparable situation holds true for the unary operators - and +: We can ignore the + operator, but the - is converted to a unary minus.

Our calculator will recognize a minimal set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in the bison-like grammar specifications, but that's about it: the purpose of this section, after all, is to illustrate a C++ program, using a parser and a lexer, and not to construct a full-fledged calculator.

In the next few sections we'll start developing the grammar in a bison++ specification file. Then, the regular expressions for the scanner are specified according to the requirements of flex++. Finally the program is constructed.

15.3.2.1: The bison++ specification file

The bison specification file as used with bison is comparable to the specification file as used with bison++. Differences are related to the class nature of the resulting parser. The calculator will distinguish real numbers from ints, and will support the basic set of arithmetic operators.

The bison++ specification file contains the following sections:

The header section. This section is comparable to the C specification section used with bison. The difference being the %header{ opening. In this section we'll encounter mainly declarations: header files are included, and the yyFlexLexer object is declared.

The token section. In this section the bison tokens, and the priority rules for the operators are declared. However, bison++ has several extra items that can be declared here. They are important and warrant a section of their own.

The rules. The grammatical rules define the grammar. This section has not changed since the bison program.

15.3.2.2: The bison++ token section

The token section contains all the tokens that are used in the grammar, as well as the priority rules as used for the mathematical operators. However, several extra items can be declared here:

%name ParserName. The name ParserName will be the name of the parser's class. This entry should be the first entry of the token-section. It is used in cases where multiple grammars are used, to make sure that the different parser-classes use unique identifiers. By default the name parse is used.

%define name content. The %define has the same function as the #define statement for the C++ preprocessor. It can be used to define, e.g., a macro. Internally, the defined symbol will be the concatenation of YY_, the parser's classname, and the name of the macro. E.g., YY_ParserName_name Several symbols will normally be defined here. Normally the definition of the body of the lexer-macro as called by the parser must be defined, and normally the body of the error-macro must be defined. Specifically, the following symbols are recognized by bison++, and can be redefined in this section:

%define DEBUG 1: if non-0 debugging code will be included in the parser's source.
%define ERROR_VERBOSE: if defined, the parser's stack will be dumped when an error occurs.
%define LVAL yylval: the default variable name is shown here: the variable name containing the parser's semantic value is by default yylval, but its name may be redefined here.
%define INHERIT :public ClassA, public ClassB: the inheritance list for the parser's class. Note that it starts with the ':' character. The define should be left out if the parser's class isn't derived from another class.
%define MEMBERS member-prototypes: if the parser should contain extra members, they must be declared here. Note that there is only one %define MEMBERS definition allowed. So, if multiple members are to be declared, they must all be declared at this point. To prevent very long lines in the specification file, the \ can be used at the end of a line, to indicate that it continues on the next line of the source-text. E.g., %define MEMBERS void lookup(); void lookdown();
%define LEX_BODY inline-code: here the body of the call to the lexer is defined. It can be defined as = 0 for an abstract parser-class, but otherwise it will contain the code representing the call to the lexer. For example, if the lexer object generated by flex++ is called lexer, this declaration should be %define LEX_BODY {return lexer.yylex();}
%define ERROR_BODY inline-code: similarly, the body of the code of the call to the error-function can be defined here. It can be defined as = 0, in which case the parser's class will again become abstract. Otherwise, it can be used to specify the inner workings of the error function. E.g., %define ERROR_BODY { cerr << "syntax Error\n"; }
Constructor-related defines: When a special parser constructor is needed, then three %defines can be used:
- %define CONSTRUCTOR_PARAM (parameterlist): this defines the parameterlist for the parser's constructor.
- %define CONSTRUCTOR_INIT :initializer(s): this defines the base-class and member initializers for the constructor.
- %define CONSTRUCTOR_CODE { code }: this defines the code of the parser's constructor.
When the parser doesn't need special effects, a constructor will not be needed. In those cases the parser can be created as follows (using the default parser-name): parse parser;

%union. This starts the definition of the semantical value union. It replaces the #define YYSTYPE definition seen with bison. An example of a %union declaration is %union { int i; double d; };

Associating tokens and unionfields. Tokens can be associated with unionfields. By doing so, the parser's actions-code becomes much cleaner than if the tokens aren't associated with fields. Moreover, nonterminals can also be associated with unionfields. In these cases the generic returnvariable $$ or the generic returnvalues

$1,
$2

, etc, that are associated with components of rules can be used, rather than $$.i, $3.d, etc. In order to associate a nonterminal or a token with a unionfield, the <fieldname> specification is used. E.g., %token INT %token <d> DOUBLE %type intExpr In this example, note that both the tokens and the nonterminals can be associated with a field of the union. This will be further illustrated in the upcoming description of the rules of the grammar.

In the %union discussion the %token and %type specifications should be noted. They are used for the specficiation of the tokens (terminal symbols) that can be returned by the lexical scanner, and for the specification of the returntypes of nonterminals. Apart from %token the token-indicators %left, %right and %nonassoc may be used to specify the associativity of operators. The token(s) mentioned at these indicators are interpreted as tokens indicating operators, associating in the indicated direction. The precedence of operators is given by their order: the first specification has the lowest precedence. To overrule a certain precedence in a certain context, %prec can be used. As all this is standard bison practice, it isn't further discussed in this context. The documentation provided with the bison distribution should be consulted for further reference.

15.3.2.3: The bison++ grammar rules

The rules and actions of the grammar are specified as usual. The grammar for our little calculator is given below. A lot of rules, but they illustrate the use of nonterminals associated with value-types.

lines: lines line | line ; line: intExpr '\n' { cerr << "int: " << $1 << endl; } | doubleExpr '\n' { cerr << "double: " << $1 << endl; } | '\n' { cout << "Good bye\n"; YYACCEPT; } | error '\n' ; intExpr: intExpr '*' intExpr { $$ = $1 * $3; } | intExpr '+' intExpr { $$ = $1 + $3; } | '(' intExpr ')' { $$ = $2; } | '-' intExpr %prec UnaryMinus { $$ = -$2; } | INT ; doubleExpr: doubleExpr '*' doubleExpr { $$ = $1 * $3; } | doubleExpr '+' doubleExpr { $$ = $1 + $3; } | doubleExpr '*' intExpr { $$ = $1 * $3; } | doubleExpr '+' intExpr { $$ = $1 + $3; } | intExpr '*' doubleExpr { $$ = $1 * $3; } | intExpr '+' doubleExpr { $$ = $1 + $3; } | '(' doubleExpr ')' { $$ = $2; } | '-' doubleExpr %prec UnaryMinus { $$ = -$2; } | DOUBLE ; With these rules a very simple calculator is defined in which integer and real values can be negated, added, and multiplied, and in which standard priority rules can be circumvented using parentheses. The rules show the use of typed nonterminal symbols: doubleExpr is linked to real (double) values, intExpr is linked to integer values. Precedence and type association is defined in the token section of the parser specification file, which is: %name Parser %union { int i; double d; }; %token INT %token <d> DOUBLE %type intExpr %type <d> doubleExpr %left '+' %left '*' %right UnaryMinus %define LEX_BODY {return lexer.yylex();} %define ERROR_BODY { cerr << "error encountered\n"; } In the token section we see the use of the %type specifiers, connecting intExpr to the i-field of the semantic-value union, and connecting doubleExpr to the d-field. At first sight it looks a bit complex, since the expression rules must be included for each individual returntype. On the other hand, if the union itself would have been used, we would have had to specify somewhere in the returned semantic values what field to use: less rules, but more complex and error-prone code.

subsubsection(The flex++ specification file) The flex-specification file to be used with our little calculator is simple: blanks are skipped, single characters are returned, and numerical values are returned as either Parser::INT or Parser::DOUBLE values. Here is the complete flex++ specification file:

%{ #include <iostream.h> #include "parser.h" extern yyFlexLexer lexer; extern Parser parser; %} %% [ \t] ; [0-9]+ { parser.yylval.i = atoi(yytext); return(Parser::INT); } "."[0-9]* | [0-9]+("."[0-9]*)? { parser.yylval.d = atof(yytext); return(Parser::DOUBLE); } .|\n return (*yytext);

subsubsection(The generation of the code) The code is generated in the same way as with bison and flex. To order bison++ to generate the files parser.cc and parser.h, the command

bison++ -d -o parser.cc parser can be given.

Flex++ will thereupon generate code on lexer.cc using the command

flex++ -I -olexer.cc lexer Note here that flex++ expects no blanks between the -o flag and lexer.cc.

On unix, linking and compiling the generated sources and the source for the main program (listed below) is realized with the following command:

g++ -o calc -Wall *.cc -lfl -s Note the fact that the libfl.a library is mentioned here. If it's not mentioned unresolved functions like yywrap() emerge.

A source in which the main() function, the lexical scanner and the parser objects are defined is, finally:

#include "parser.h" Parser parser; yyFlexLexer lexer; int main() { return (parser.yyparse()); }