Day 7: Complex Data Structures

In previous days, you saw how to create sorted lists and how to sort lists of data. ROBIN and many other programs require the capability to quickly find data based on a requested value. Although sorting the data is a good start, there are efficient ways to store your records so that retrieval time is optimized. Today you will learn

What hash tables are and how to use them
What binary trees are and how to use them
What the issues are with binary trees

Laying the Groundwork

This chapter explores a number of lists and other complex data structures. To avoid retyping the same code repeatedly, and to provide a good example of how professional programs are laid out, I'll start by defining some header files that will be used for the entire chapter.

Note: The programs in this book were designed with portability in mind. Nonetheless, your compiler may be a bit more of a stickler than mine, and you may get warnings (or perhaps errors!) when you compile some of the code in this book.

There are a few things to try if you encounter errors or warnings in this case. First, if your compiler complains about the use of the BOOL type, comment out the enumerated BOOL shown in the standard definition file that follows, and replace it with these lines:

    1:     typedef int BOOL;
    2:     const int FALSE = 0;
    3:     const int TRUE = 1;

Be sure to contact me on Interchange or CompuServe if the code simply doesn't work for you. It may be that a typo slipped in despite our careful scrutiny, and we may be able to straighten it out for you. With this code, and every other line of code you read or create, don't assume that it is 100 percent correct as it stands.

Standard Definitions

Listing 7.1 shows stdef.hpp, the file in which some common definitions will be kept, along with a few #include statements that will be used by all subsequent programs. Virtually every listing will include stdef.hpp (standard definition).

Listing 7.1 Using Stdef.hpp to Store Standard Definitions

    1:     // **************************************************
    2:     // PROGRAM:  Standard Definitions
    3:     // FILE:     stdef.hpp
    4:     // PURPOSE:  provide standard inclusions and definitions
    5:     // NOTES:
    6:     // AUTHOR:   Jesse Liberty (jl)
    7:     // REVISIONS: 10/23/94 1.0 jl  initial release
    8:     // **************************************************
    9:
    10:
    11:    #ifndef STDDEF_HPP                 // inclusion guards
    12:    #define STDDEF_HPP
    13:
    14:    enum BOOL { FALSE, TRUE };
    15:
    16:    #include <iostream.h>
    17:
    18:    #endif

Output:

None.

Analysis:

Stdef.hpp provides the standard definitions and inclusions that will be used in various other programs. Note the inclusions guards in lines 11 and 12 and ending in line 18. This is a standard way to prevent header files from being included twice in your program. On line 11, the definition STDEF_HPP is tested. If this file has not yet been included, this line will test true and the rest of the file will be included. If the file has not already been included, this line will test false and nothing until the endif in line 18 will be included; the file therefore will be skipped.

In line 12, STDEF_HPP is defined so that the next time through, this check will fail and the file will not be included a second time.

The List Class

Much of this chapter will use the sorted list template developed on day 3 and reproduced in listing 7.2. This template will be stored in a file named linklist.hpp.

Listing 7.2 The Sorted Linked List Template

    1:     // **************************************************
    2:     // PROGRAM:  Linked List Header
    3:     // FILE:     linklist.hpp
    4:     // PURPOSE:  provide sorted linked list template
    5:     // NOTES:
    6:     // AUTHOR:   Jesse Liberty (jl)
    7:     // REVISIONS: 10/23/94 1.0 jl  initial release
    8:     // **************************************************
    9:
    10:
    11:    #ifndef LINKLIST_HPP
    12:    #define LINKLIST_HPP
    13:
    14:     // **************** Node class ************
    15:      template <class T>
    16:      class Node
    17:      {
    18:      public:
    19:         Node (T*);
    20:         ~Node();
    21:         void InsertAfter(Node *);
    22:         Node * GetNext() const            { return itsNext; }
    23:         void SetNext(Node * next)          { itsNext = next; }
    24:         T & GetObject() const            { return *itsObject; }
    25:         int operator<(const Node &rhs) const;
    26:         BOOL operator==(const T& rhs) const;
    27:
    28:      private:
    29:         T * itsObject;
    30:         Node * itsNext;
    31:      };
    32:
    33:      // **************** Object List ************
    34:      template <class T>
    35:      class List
    36:      {
    37:      public:
    38:         List();
    39:         ~List();
    40:         long      GetCount() const { return itsCount; }
    41:         void       Insert(T &);
    42:         void       Iterate(void (T::*f)()const);
    43:         T &         operator[](long);
    44:         T *         FindObject(const T& target );
    45:
    46:      private:
    47:         Node<T>  itsHead;
    48:         long itsCount;
    49:     };
    50:
    51:      // *** node implementations ****
    52:
    53:        template <class T>
    54:        Node<T>::Node(T * pObject):
    55:        itsObject(pObject),
    56:        itsNext(0)
    57:        {}
    58:
    59:         template <class T>
    60:         Node<T>::~Node()
    61:        {
    62:           delete itsObject;
    63:           itsObject = 0;
    64:           delete itsNext;
    65:           itsNext = 0;
    66:        }
    67:
    68:         template <class T>
    69:         void Node<T>::InsertAfter(Node* newNode)
    70:        {
    71:         newNode->SetNext(itsNext);
    72:         itsNext=newNode;
    73:        }
    74:
    75:        template <class T>
    76:        int Node<T>::operator<(const Node &rhs) const
    77:        {
    78:        return (*itsObject < rhs.GetObject());
    79:        }
    80:
    81:        template <class T>
    82:        BOOL Node<T>::operator==(const T& target) const
    83:        {
    84:           return (*itsObject == target);
    85:        }
    86:
    87:         // Implementations for Lists...
    88:
    89:        template<class T>
    90:        List <T>::List():
    91:           itsCount(0),
    92:           itsHead(0)  // initialize head node to have no Object
    93:           {}
    94:
    95:        template<class T>
    96:        List <T>::~List()
    97:        {
    98:        }
    99:
    100:       template<class T>
    101:       T &  List<T>::operator[](long offSet)
    102:       {
    103:          if (offSet+1 > itsCount)
    104:             return itsHead.GetObject(); // error
    105:
    106:          Node<T>* pNode = itsHead.GetNext();
    107:
    108:          for (long i=0;i<offSet; i++)
    109:             pNode = pNode->GetNext();
    110:
    111:         return   pNode->GetObject();
    112:       }
    113:
    114:       template<class T>
    115:       T*  List<T>::FindObject(const T& target )
    116:       {
    117:          for (Node<T> * pNode = itsHead.GetNext();
    118:                pNode!=NULL;
    119:                pNode = pNode->GetNext()
    120:               )
    121:          {
    122:             if ( *pNode == target)
    123:                break;
    124:          }
    125:          if (pNode == NULL)
    126:             return 0;
    127:          else
    128:             return &(pNode->GetObject());
    129:       }
    130:
    131:       template<class T>
    132:       void List<T>::Insert(T & Object)
    133:       {
    134:           Node<T> * NewNode = new Node<T>(&Object);
    135:
    136:           for (Node<T> * pNode = &itsHead;;pNode = pNode->GetNext())
    137:           {
    138:              if (pNode->GetNext() == NULL || *NewNode < *(pNode->GetNext() ))
    139:              {
    140:                 pNode->InsertAfter(NewNode);
    141:                 itsCount++;
    142:                 break;
    143:              }
    144:           }
    145:       }
    146:
    147:      template<class T>
    148:      void List<T>::Iterate(void (T::*func)()const)
    149:      {
    150:         for (Node<T>* pNode = itsHead.GetNext();
    151:              pNode;
    152:              pNode=pNode->GetNext()
    153:             )
    154:               (pNode->GetObject().*func)();
    155:      }
    156:
    157:   #endif

Output:

None.

Analysis:

This code is very similar to the code used on day 3. The significant change is that it has been made into a header file, complete with a heading of its own and inclusion guards. Save this file to your disk as linklist.hpp.

Strings and Notes

The minimal rNote object as described on day 3 also will be used, along with the String class previously described. The rNote declaration is provided in note.hpp, and the String declaration is provided in string.hpp. The String implementation is provided in string.cpp. Listing 7.3 illustrates note.hpp, listing 7.4 shows string.hpp, and listing 7.5 has string.cpp.

Listing 7.3 Note.hpp

    1:     // **************************************************
    2:     // PROGRAM:  Basic Note object
    3:     // FILE:     note.hpp
    4:     // PURPOSE:  provide simple note object
    5:     // NOTES:
    6:     // AUTHOR:   Jesse Liberty (jl)
    7:     // REVISIONS: 10/23/94 1.0 jl  initial release
    8:     // **************************************************
    9:
    10:    #ifndef NOTE_HPP
    11:    #define NOTE_HPP
    12:
    13:    #include "stdef.hpp"
    14:
    15:     class rNote
    16:    {
    17:     public:
    18:        rNote(const String& text):
    19:        itsText(text), itsDate(0L)
    20:           {itsNoteNumber = theNoteNumber++;}
    21:        ~rNote(){}
    22:
    23:        const String& GetText()const { return itsText; }
    24:        long GetDate() const { return itsDate; }
    25:        long GetNoteNumber() const { return itsNoteNumber; }
    26:
    27:        int operator<(const rNote& rhs)
    28:         { return itsNoteNumber < rhs.GetNoteNumber(); }
    29:        BOOL operator==(const rNote& rhs)
    30:         { return itsText == rhs.GetText(); }
    31:
    32:        operator long() { return itsNoteNumber; }
    33:
    34:        void Display() const
    35:           { cout << "Note #: " << itsNoteNumber;
    36:           cout <<   "  Text: " << itsText << endl; }
    37:
    38:        static long theNoteNumber;
    39:     private:
    40:        String itsText;
    41:        long itsDate;
    42:        long itsNoteNumber;
    43:     };
    44:
    45:    #endif

Output:

None

Analysis:

This rNote object will be used in subsequent chapters as the basis for exploring complex data-manipulation objects. Store this listing in the file note.hpp.

Listing 7.4 String.hpp

    1:     // **************************************************
    2:     // PROGRAM:  String declaration
    3:     // FILE:     string.hpp
    4:     // PURPOSE:  provide fundamental string functionality
    5:     // NOTES:
    6:     // AUTHOR:   Jesse Liberty (jl)
    7:     // REVISIONS: 10/23/94 1.0 jl  initial release
    8:     // **************************************************
    9:
    10:    #ifndef STRING_HPP
    11:    #define STRING_HPP
    12:    #include <string.h>
    13:    #include "stdef.hpp"
    14:
    15:
    16:    class xOutOfBounds {};
    17:
    18:    class String
    19:    {
    20:    public:
    21:
    22:             // constructors
    23:             String();
    24:             String(const char *);
    25:             String (const char *, int length);
    26:             String (const String&);
    27:             ~String();
    28:
    29:             // helpers and manipulators
    30:             int   GetLength() const { return itsLen; }
    31:             BOOL IsEmpty() const { return (BOOL) (itsLen == 0); }
    32:             void Clear();                // set string to 0 length
    33:
    34:             // accessors
    35:             char operator[](int offset) const;
    36:             char& operator[](int offset);
    37:             const char * GetString()const  { return itsCString; }
    38:
    39:             // casting operators
    40:              operator const char* () const { return itsCString; }
    41:              operator char* () { return itsCString;}
    42:
    43:             // operators
    44:             const String& operator=(const String&);
    45:             const String& operator=(const char *);
    46:
    47:             void operator+=(const String&);
    48:             void operator+=(char);
    49:             void operator+=(const char*);
    50:
    51:             int operator<(const String& rhs)const;
    52:             int operator>(const String& rhs)const;
    53:             BOOL operator<=(const String& rhs)const;
    54:             BOOL operator>=(const String& rhs)const;
    55:             BOOL operator==(const String& rhs)const;
    56:             BOOL operator!=(const String& rhs)const;
    57:
    58:
    59:             // friend functions
    60:             String operator+(const String&);
    61:             String operator+(const char*);
    62:             String operator+(char);
    63:
    64:             void Display()const { cout << *this << " "; }
    65:             friend ostream& operator<< (ostream&, const String&);
    66:
    67:    private:
    68:             // returns 0 if same, -1 if this is less than argument,
    69:             // 1 if this is greater than argument
    70:             int StringCompare(const String&) const;  // used by Boolean
                    operators
    71:
    72:             char * itsCString;
    73:             int itsLen;
    74:    };
    75:
    76:    #endif // end inclusion guard

Listing 7.5 String.cpp

    1:     // **************************************************
    2:     // PROGRAM:  String definition
    3:     // FILE:     string.cpp
    4:     // PURPOSE:  provide fundamental string functionality
    5:     // NOTES:
    6:     // AUTHOR:   Jesse Liberty (jl)
    7:     // REVISIONS: 10/23/94 1.0 jl  initial release
    8:     // **************************************************
    9:
    10:
    11:    #include "string.hpp"
    12:
    13:    // default constructor creates string of 0 bytes
    14:    String::String()
    15:    {
    16:       itsCString = new char[1];
    17:       itsCString[0] = '\0';
    18:       itsLen=0;
    19:    }
    20:
    21:    String::String(const char *rhs)
    22:    {
    23:       itsLen = strlen(rhs);
    24:       itsCString = new char[itsLen+1];
    25:       strcpy(itsCString,rhs);
    26:    }
    27:
    28:    String::String (const char *rhs, int length)
    29:    {
    30:       itsLen = strlen(rhs);
    31:       if (length < itsLen)
    32:          itsLen = length;  // max size = length
    33:       itsCString = new char[itsLen+1];
    34:       memcpy(itsCString,rhs,itsLen);
    35:       itsCString[itsLen] = '\0';
    36:    }
    37:
    38:    // copy constructor
    39:    String::String (const String & rhs)
    40:    {
    41:       itsLen=rhs.GetLength();
    42:       itsCString = new char[itsLen+1];
    43:       memcpy(itsCString,rhs.GetString(),itsLen);
    44:       itsCString[rhs.itsLen]='\0';
    45:    }
    46:
    47:    String::~String ()
    48:    {
    49:       Clear();
    50:    }
    51:
    52:    void String::Clear()
    53:    {
    54:       delete [] itsCString;
    55:       itsLen = 0;
    56:    }
    57:
    58:    //non constant offset operator
    59:    char & String::operator[](int offset)
    60:    {
    61:       if (offset > itsLen)
    62:       {
    63:          throw xOutOfBounds();
    64:          return itsCString[itsLen-1];
    65:       }
    66:       else
    67:          return itsCString[offset];
    68:    }
    69:
    70:    // constant offset operator
    71:    char String::operator[](int offset) const
    72:    {
    73:       if (offset > itsLen)
    74:       {
    75:          throw xOutOfBounds();
    76:          return itsCString[itsLen-1];
    77:       }
    78:       else
    79:          return itsCString[offset];
    80:    }
    81:
    82:    // operator equals
    83:    const String& String::operator=(const String & rhs)
    84:    {
    85:       if (this == &rhs)
    86:          return *this;
    87:       delete [] itsCString;
    88:       itsLen=rhs.GetLength();
    89:       itsCString = new char[itsLen+1];
    90:       memcpy(itsCString,rhs.GetString(),itsLen);
    91:       itsCString[rhs.itsLen]='\0';
    92:       return *this;
    93:    }
    94:
    95:    const String& String::operator=(const char * rhs)
    96:    {
    97:       delete [] itsCString;
    98:       itsLen=strlen(rhs);
    99:       itsCString = new char[itsLen+1];
    100:      memcpy(itsCString,rhs,itsLen);
    101:      itsCString[itsLen]='\0';
    102:      return *this;
    103:   }
    104:
    105:
    106:   // changes current string, returns nothing
    107:   void String::operator+=(const String& rhs)
    108:   {
    109:      unsigned short rhsLen = rhs.GetLength();
    110:      unsigned short totalLen = itsLen + rhsLen;
    111:      char *temp = new char[totalLen+1];
    112:      for (int i = 0; i<itsLen; i++)
    113:         temp[i] = itsCString[i];
    114:      for (int j = 0; j<rhsLen; j++, i++)
    115:         temp[i] = rhs[j];
    116:      temp[totalLen]='\0';
    117:      *this = temp;
    118:   }
    119:
    120:   int String::StringCompare(const String& rhs) const
    121:   {
    122:         return strcmp(itsCString, rhs.GetString());
    123:   }
    124:
    125:   String String::operator+(const String& rhs)
    126:   {
    127:
    128:      char * newCString = new char[GetLength() + rhs.GetLength() + 1];
    129:      strcpy(newCString,GetString());
    130:      strcat(newCString,rhs.GetString());
    131:      String newString(newCString);
    132:      return newString;
    133:   }
    134:
    135:
    136:   String String::operator+(const char* rhs)
    137:   {
    138:
    139:      char * newCString = new char[GetLength() + strlen(rhs)+ 1];
    140:      strcpy(newCString,GetString());
    141:      strcat(newCString,rhs);
    142:      String newString(newCString);
    143:      return newString;
    144:   }
    145:
    146:
    147:   String String::operator+(char rhs)
    148:   {
    149:      int oldLen = GetLength();
    150:      char * newCString = new char[oldLen + 2];
    151:      strcpy(newCString,GetString());
    152:      newCString[oldLen] = rhs;
    153:      newCString[oldLen+1] = '\0';
    154:      String newString(newCString);
    155:      return newString;
    156:   }
    157:
    158:
    159:
    160:   BOOL String::operator==(const String& rhs) const
    161:   { return (BOOL) (StringCompare(rhs) == 0); }
    162:   BOOL String::operator!=(const String& rhs)const
    163:      { return (BOOL) (StringCompare(rhs) != 0); }
    164:   int String::operator<(const String& rhs)const
    165:      { return (BOOL) (StringCompare(rhs) < 0); }
    166:   int String::operator>(const String& rhs)const
    167:      { return (BOOL) (StringCompare(rhs) > 0); }
    168:   BOOL String::operator<=(const String& rhs)const
    169:      { return (BOOL) (StringCompare(rhs) <= 0); }
    170:   BOOL String::operator>=(const String& rhs)const
    171:      { return (BOOL) (StringCompare(rhs) >= 0); }
    172:
    173:   ostream& operator<< (ostream& ostr, const String& str)
    174:   {
    175:      ostr << str.itsCString;
    176:      return ostr;
    177:   }

Output:

None.

Analysis:

The string implementation provided here is the same as seen previously, except that it has been changed to work with the other included files. Save this as string.cpp.

String.hpp, string.cpp, note.hpp, linklist.hpp, and stdef.hpp now provide a set of functionalities with which you can explore the complex data structures seen in the rest of this chapter. Save these all in one directory, because you will be using them and extending them over the coming days.

Testing the Building Blocks

Before moving on, examine listing 7.6, which provides a small test program that exercises the listings provided earlier in this chapter.

Listing 7.6 Testing the Building Blocks

    1:     // **************************************************
    2:     // PROGRAM:  Test driver
    3:     // FILE:     0706.cpp
    4:     // PURPOSE:  Test String, Note and Linked List
    5:     // NOTES:    Same functionality as 0307.cpps
    6:     // AUTHOR:   Jesse Liberty (jl)
    7:     // REVISIONS: 10/23/94 1.0 jl  initial release
    8:     // **************************************************
    9:
    10:      #include "String.hpp"  // string class
    11:      #include "Note.hpp"    // notes
    12:      #include "LinkList.hpp" // linked lists
    13:      #include "Note.hpp"
    14:      #include "stdef.hpp"  // standard definitions
    15:
    16:    long rNote::theNoteNumber = 0;
    17:    void main()
    18:     {
    19:        List<rNote> pl;
    20:        rNote * pNote = 0;
    21:        long choice;
    22:        char buffer[256];
    23:
    24:        while (1)
    25:        {
    26:           cout << "\n(0)Quit (1)Add Note ";
    27:           cin >> choice;
    28:           if (!choice)
    29:              break;
    30:
    31:           cin.ignore(255,'\n');
    32:           cout << "\nText: ";
    33:           cin.getline(buffer,255);
    34:           pNote = new rNote(buffer);
    35:           pl.Insert(*pNote);
    36:        }
    37:
    38:        cout << "\n\nResults: \n" << endl;
    39:        void (rNote::*pFunc)()const = rNote::Display;
    40:        pl.Iterate(pFunc);
    41:
    42:        cin.ignore(255,'\n');
    43:        cout << "\nFind Note with text: ";
    44:        cin.getline(buffer,255);
    45:        rNote target(buffer);
    46:        pNote=0;
    47:        pNote =  pl.FindObject(target);
    48:        if (pNote)
    49:        {
    50:         cout << "\nFound! " << endl;
    51:         pNote->Display();
    52:        }
    53:        else
    54:           cout << "Note not found." << endl;
    55:    }

Output:

    (0)Quit (1)Add Note 1
    Text: Walk the walk
    (0)Quit (1)Add Note 1
    Text: Talk the talk
    (0)Quit (1)Add Note 0

    Results:

    Note #: 0  Text: Walk the walk
    Note #: 1  Text: Talk the talk

    Find Note with text: Talk the talk
    Found!
    Note #: 1  Text: Talk the talk

Analysis:

Listing 7.6 reproduces the functionality of listing 3.7, "Using a Parameterized List Class," using the new .hpp files shown in the previous listings. This confirms that these listings are correct and will compile and link. Although it took a few pages to get here, you now know that you have solid, reusable code.

Using Hash Tables

A hash table is a convenient way to manage large amounts of data using a small number of storage locations. You can imagine a series of "buckets" with data in each. Each bucket is numbered, and all the data you want to store must go in one or another of the buckets. You create a hashing algorithm to take all your data and reduce it to one of the available bucket values.

A simple (and simplistic!) approach to storing terms for ROBIN might be to hash the data based on the first letter of each word. Words beginning with A would be in bucket 0, words beginning with B would be in bucket 1, and so on. This method would enable you to create an array of 26 words, and each word would have its specific and easily identified location.

The problem with this approach, of course, is that more than one word might evaluate to the same bucket. Both boy and bottle begin with b, for example. You could disallow this condition, but then ROBIN would become pretty unusable, pretty quickly.

A modest alternative would be for each array location to hold a linked list of words. Thus, when two words evaluate to the same bucket (such as boy and bottle), they would each be put in the same list. Listing 7.7 illustrates this approach.

New term: A bucket is a numbered location for your data.

New term: A hashing algorithm is a formula applied to the data that renders a hash bucket number.

Listing 7.7 Hashing the Words in ROBIN

    1:     // **************************************************
    2:     // PROGRAM:  Hash Table
    3:     // FILE:     0707.cpp
    4:     // PURPOSE:  Provide simple hash table for words
    5:     // NOTES:
    6:     // AUTHOR:   Jesse Liberty (jl)
    7:     // REVISIONS: 10/23/94 1.0 jl  initial release
    8:     // **************************************************
    9:
    10:    #include "String.hpp"  // string class
    11:    #include "LinkList.hpp" // linked lists
    12:    #include "stdef.hpp"  // standard definitions
    13:    #include <ctype.h>    //  toupper()
    14:
    15:    void main(int argc, char ** argv)
    16:    {
    17:        List<String> myArray[26];
    18:
    19:        for (int i=1; i<argc; i++)
    20:        {
    21:          int offset = toupper(argv[i][0]) - 'A';
    22:          String* ps = new String(argv[i]);
    23:          myArray[offset].Insert(*ps);
    24:        }
    25:
    26:        for (i=0; i<26; i++)
    27:          myArray[i].Iterate(String::Display);
    28:
    29:    }

Output:

    d:\>0707 Eternal Vigilance Is The Price Of Liberty
    Eternal Is Liberty Of Price The Vigilance

Analysis:

This small program develops a very complicated and interesting hash table. In line 17, myArray is declared to be an array of 26 String lists. In lines 19 through 24, each of the arguments to the program is analyzed, looking for the first letter. The letters are forced to uppercase and then the value of 'A' is subtracted, leaving a number between 0 and 25. That hashed value is the offset into the array of lists of strings at which this word will be kept.

A string is created in line 22 from the argument at argv[i], and that string is passed by reference into the array of lists, calling the Insert function.

In line 26, the entire array is canvassed, with Iterate() called on each list. The address of the String member function Display is passed into Iterate(), and each String of each member of each list is called.

Deciding When To Use Hash Tables

In truth, you almost never would use a hash table as was done here. The hallmark usage of a hash table is when you have a small number of items, which you need to access wicked fast. Typically, you would hope to have more buckets than items, keeping data collisions down to a very small number.

You still would implement the linked list to handle the occasional collision, but almost all these lists would have at most one item in them.

Using Trees and Other Powerful Data Structures

For a first version of ROBIN, the hash algorithm shown in this chapter might be sufficient. It is a fundamental design principle to get something working, and then to worry about optimizing it. Who knows? You might find that accessing the words is not the slow part of your program.

It is true, however, that there are more efficient ways to get at a great deal of data, and that when you have tens of thousands of words you can expect this approach to bog down. After all, accessing the word Thunder means finding the bucket of T words, and then walking the linked list until you find Thunder. There must be a better way, and there is.

Using Trees

A tree is a linked list in which each node points to two or more next nodes. Figure 7.1 illustrates a basic tree structure.

Figure 7.1. Basic tree structure.

The topmost node in a tree is called the root node, and nodes that don't point to any other nodes are called leaf nodes.

New term: The root node is the top of a tree.

New term: A leaf node is a node that points to no other nodes.

Note that in a tree, nodes point only to other nodes farther "down" the tree. The nodes they point to are called child nodes, and thus the pointing node is called the parent node. In figure 7.1, the root node is parent to two children, each of which is a parent to two more children.

Using Binary Trees

The simplest form of tree is a binary tree, in which each node has no more than two child nodes. In a binary tree, each node is associated with a value. The left child node has values less than the current value, and the right child has values greater than the current value. Figure 7.2 illustrates this idea.

Figure 7.2. A simple binary tree.

Note that each node either points to more nodes, or points to a null pointer.

Finding a Node

You search for a node by beginning at the root, and then moving down to the left for lesser values, and down to the right for greater values. In figure 7.3, the word client is searched for. The root is Computer, and client comes earlier in the alphabet, so the node to the left is examined. The word client comes later than the word cat, so the node to the right is searched, and the target word, client, is found.

Figure 7.3. Searching for client.

In figure 7.4, the word drugs is searched for. Again, you start at the root, computer. Because drugs comes later in the alphabet, the node to the right is examined. Drugs is larger than dog, so again the node to the right is examined, obtaining eat. Because drugs is less than eat, the node to the left would be examined, but it is null. The word is not found.

Figure 7.4. Searching for drugs.

Inserting Nodes

If you wanted to insert a word into the list, you would search for it, and add it when you hit the first null. In the example in figure 7.4, drugs would be added as the left node of eat. Adding the word drag would work as shown in figure 7.5.

Figure 7.5. Adding drag.

You start your search for drag at computer, go right to dog, go right to eat, go left to drugs, and then go left and get back null.

Deleting a Node

When you delete a node from the tree, it has child nodes or it is a leaf. Deleting a leaf is simple; deleting a node with child nodes, however, is a bit more complex.

Deleting a Leaf

To delete a leaf node from the tree, just tell its parent node not to point to it anymore, setting the parent node's pointer to null. Then delete the node, returning its memory to the free store, and removing whatever data it holds. This process is illustrated in figure 7.6, in which the leaf node boy is removed. The parent, Cat, now has its left pointer point to null; and the node, boy, is deleted.

Figure 7.6. Deleting boy--a leaf node.

Deleting a Node with One Child

To delete a node with one child node, have the parent point to the child and then delete the node. Figure 7.7 illustrates this process by deleting drugs. To do this, the parent, eat, now points to drag, and drugs is deleted.

Figure 7.7. Deleting drugs--a node with one child.

Deleting a Node with Two Children

Deleting a node with two children is somewhat more complicated. Examine figure 7.8, which has a different set of nodes; these are assembled from the words four score and seven years ago our fore fathers brought forth on this continent a new nation.

Figure 7.8. A different set of nodes.

Notice that the word and has two child nodes. It is imperative that when and is removed, the tree remains in the right order, with each left child node having a lower value than the current node, and each right child node having a greater value. Here's how it is done:

Find the node with the next higher value (the successor). In this case, the successor is brought. Note that the successor will never have a left node, because that then would be the successor to the target.
Put the successor into the current position, maintaining the original links. Therefore, brought now points to ago and fore.
Promote the right child of the successor if there is one. In this case, brought points to continent, which becomes the right child of ago.

Deleting and in figure 7.8 leaves the binary tree, as shown in figure 7.9.

Creating an rWord Object

The binary tree will sort rWord objects in alphabetical order based on their strings. The rWord object shown in listing 7.8 is only a first approximation of the final rWord object, it will be used in ROBIN and is used here only for demonstration purposes.

Figure 7.9. Deleting and.

Listing 7.8 Using an rWord Declaration

    1:     // **************************************************
    2:     // PROGRAM:  Basic word object
    3:     // FILE:     word.hpp
    4:     // PURPOSE:  provide simple word object
    5:     // NOTES:
    6:     // AUTHOR:   Jesse Liberty (jl)
    7:     // REVISIONS: 10/23/94 1.0 jl  initial release
    8:     // **************************************************
    9:
    10:    #ifndef WORD_HPP
    11:    #define WORD_HPP
    12:
    13:    #include "stdef.hpp"
    14:    #include "string.hpp"
    15:
    16:    class rWord
    17:    {
    18:     public:
    19:        rWord(const String& text):
    20:        itsText(text), reserved1(0L), reserved2(0L) {}
    21:        ~rWord(){}
    22:
    23:        const String& GetText()const { return itsText; }
    24:
    25:        int operator<(const rWord& rhs)
    26:         { return itsText < rhs.GetText(); }
    27:
    28:        int operator>(const rWord& rhs)
    29:         { return itsText > rhs.GetText(); }
    30:
    31:        int operator<=(const rWord& rhs)
    32:         { return itsText <= rhs.GetText(); }
    33:
    34:        int operator=>(const rWord& rhs)
    35:         { return itsText => rhs.GetText(); }
    36:
    31:        BOOL operator==(const rWord& rhs)
    37:         { return itsText == rhs.GetText(); }
    38:
    39:        void Display() const
    40:           { cout <<   "  Text: " << itsText << endl; }
    41:
    42:     private:
    43:        String itsText;
    44:        long reserved1;
    45:        long reserved2;
    46:     };
    47:
    48:    #endif

Output:

None.

Analysis:

The rWord object holds a string and reserves space for two long values. These may hold record numbers or other information later, when rWord is fleshed out. For now, rWord objects serve as string holders that can be placed in the binary tree.

Using the Demonstration Program

The demonstration program will read the command line and create an rWord object for each word in the command line. The rWord then will be added to the tree. When all the command-line words have been read, the tree will be printed. Indentation is used to show the relationship among the words, using a very simple display function provided by most compilers. Listing 7.9 provides the driver program to illustrate the binary tree.

Note: Most compilers support the ANSI function gotoxy(). If your compiler does not, however, you will need to modify this program so that the words are printed to the screen one after another.

Listing 7.9 Using the Driver Program

    1:     // Using the Driver Program
    2:
    3:     #include "word.hpp"
    4:     #include <conio.h>
    5:
    6:     class BinaryNode;     // forward declaration
    7:
    8:     class BinaryTree
    9:     {
    10:    public:
    11:      BinaryTree():myHead(0),myCount(0){}
    12:      ~BinaryTree(){}
    13:      long GetCount() const { return myCount; }
    14:      void Insert (rWord &);
    15:      void Delete (rWord &);
    16:      void Iterate(void (rWord::*f)() const);
    17:      BinaryNode* FindObject(rWord& target);
    18:      void PrintTree(int, char**);
    19:      static long x;
    20:      static long y;
    21:
    22:    private:
    23:       BinaryNode* myHead;
    24:       long myCount;
    25:    };
    26:
    27:    class BinaryNode
    28:    {
    29:    public:
    30:       BinaryNode(rWord* word);
    31:       BinaryNode(rWord& word);
    32:       BinaryNode(const BinaryNode&);
    33:       ~BinaryNode() {}
    34:       rWord* GetWord()const{ return myWord; }
    35:       void SetWord(rWord* target) { myWord = target;}
    36:
    37:       void InsertSmaller(BinaryNode* NewNode);
    38:       void InsertBigger(BinaryNode* NewNode);
    39:
    40:       BinaryNode* GetSmaller() const; // { return mySmaller;  }
    41:       BinaryNode* GetBigger() const; // { return myBigger; }
    42:       BinaryNode* GetParent() const; //{ return myParent; }
    43:
    44:       void SetSmaller(BinaryNode* target)  {mySmaller = target; }
    45:       void SetBigger(BinaryNode* target)  {myBigger = target; }
    46:       void SetParent(BinaryNode* target)  {myParent = target; }
    47:
    48:       int operator<(const BinaryNode &rhs) const
    49:          {return *myWord < *(rhs.GetWord());}
    50:       int operator>(const BinaryNode &rhs) const
    51:          {return *myWord > *(rhs.GetWord());}
    52:       BOOL operator==(const BinaryNode &rhs) const
    53:          { return *myWord ==*(rhs.GetWord());}
    54:       BOOL operator==(const rWord& target) const
    55:          {return *myWord == target.GetText();} //
    56:
    57:    private:
    58:
    59:       BinaryNode * mySmaller;
    60:       BinaryNode * myBigger;
    61:       BinaryNode * myParent;
    62:       rWord * myWord;
    63:    };
    64:
    65:    BinaryNode* BinaryNode::GetSmaller() const
    66:    {
    67:       BinaryTree::y++;
    68:       BinaryTree::x-= 14-BinaryTree::y*2;
    69:       return mySmaller;
    70:    }
    71:
    72:    BinaryNode* BinaryNode::GetBigger() const
    73:    {
    74:       BinaryTree::y++;
    75:       BinaryTree::x+= 14-BinaryTree::y*2;
    76:       return myBigger;
    77:    }
    78:
    79:    BinaryNode* BinaryNode::GetParent() const
    80:    {
    81:       BinaryTree::y--;
    82:       return myParent;
    83:    }
    84:
    85:
    86:    BinaryNode::BinaryNode(rWord* word):
    87:       mySmaller(0),
    88:       myBigger(0),
    89:       myParent(0),
    90:       myWord(word)
    91:    {  }
    92:
    93:    BinaryNode::BinaryNode(rWord &word):
    94:       mySmaller(0),
    95:       myBigger(0),
    96:       myParent(0),
    97:       myWord(&word)
    98:    {  }
    99:
    100:   BinaryNode::BinaryNode(const BinaryNode& rhs)
    101:   {
    102:      mySmaller=rhs.GetSmaller();
    103:      myBigger=rhs.GetBigger();
    104:      myParent=rhs.GetParent();
    105:      myWord = rhs.GetWord();
    106:   }
    107:
    108:
    109:   void BinaryNode::InsertSmaller(BinaryNode* newNode)
    110:   {
    111:        newNode->SetSmaller(mySmaller);
    112:        newNode->SetParent(this);
    113:        mySmaller=newNode;
    114:   }
    115:
    116:
    117:   void BinaryNode::InsertBigger(BinaryNode* newNode)
    118:   {
    119:        newNode->SetBigger(myBigger);
    120:        newNode->SetParent(this);
    121:        myBigger=newNode;
    122:   }
    123:
    124:
    125:   void BinaryTree::Insert(rWord& rhs)
    126:   {
    127:      BinaryNode* newNode = new BinaryNode(&rhs);
    128:
    129:      if (myHead == 0)
    130:      {
    131:         myHead = newNode;
    132:         return;
    133:      }
    134:
    135:         BinaryNode* node = myHead;
    136:         while (node)
    137:         {
    138:            // duplicate? replace
    139:            if (newNode == node)
    140:            {
    141:               newNode->SetSmaller(node->GetSmaller());
    142:               newNode->SetBigger(node->GetBigger());
    143:               newNode->SetParent(node->GetParent());
    144:               if (node == myHead)
    145:                  myHead = newNode;
    146:               else
    147:               {
    148:                  if ( node->GetParent()->GetSmaller() == node)
    149:                     node->GetParent()->SetSmaller(newNode);
    150:                  else
    151:                     node->GetParent()->SetBigger(newNode);
    152:               }
    153:               delete node;
    154:               break;
    155:            }
    156:            if (*newNode < *node)
    157:            {
    158:               if (!node->GetSmaller())
    159:               {
    160:                  node->SetSmaller(newNode);
    161:                  newNode->SetParent(node);
    162:                  break;
    163:               }
    164:               else
    165:                  node = node->GetSmaller();
    166:            }
    167:            else
    168:            {
    169:               if (!node->GetBigger())
    170:               {
    171:                  node->SetBigger(newNode);
    172:                  newNode->SetParent(node);
    173:                  break;
    174:               }
    175:               else
    176:                  node = node->GetBigger();
    177:            }
    178:         }   // end while
    179:   }          // end function
    180:
    181:
    182:   BinaryNode* BinaryTree::FindObject(rWord& rhs)
    183:   {
    184:      BinaryNode* node = myHead;
    185:      x = 40;
    186:      y = 1;
    187:
    188:      while (node)
    189:      {
    190:         if (*node == rhs)
    191:            break;
    192:
    193:         if (*node < rhs)
    194:            node = node->GetBigger();
    195:         else
    196:            node = node->GetSmaller();
    197:      }
    198:      return node;
    199:   }
    200:
    201:   void BinaryTree::Delete (rWord & target)
    202:   {
    203:      BinaryNode* node = FindObject(target);
    204:
    205:      if (node)
    206:      {
    207:         if (!node->GetBigger())
    208:         {
    209:            if (!node->GetSmaller())
    210:            {
    211:               if (node == myHead)
    212:                  myHead = 0;
    213:               else
    214:               {
    215:                  if (node->GetParent()->GetSmaller() == node)
    216:                     node->GetParent()->SetSmaller(0);
    217:                  else
    218:                     node->GetParent()->SetBigger(0);
    219:               }
    220:            }
    221:            else // has a smaller
    222:            {
    223:               if (node == myHead)
    224:                  myHead = node->GetSmaller();
    225:               else
    226:               {
    227:                  if (node->GetParent()->GetSmaller() == node)
    228:                     node->GetParent()->SetSmaller(node->GetSmaller());
    229:                  else
    230:                     node->GetParent()->SetBigger(node->GetSmaller());
    231:               }
    232:            }
    233:         }
    234:         else // node does have a bigger
    235:         {
    236:            if (!node->GetSmaller())
    237:            {
    238:               if (node == myHead)
    239:                  myHead = node->GetBigger();
    240:               else
    241:               {
    242:                if (node->GetParent()->GetSmaller() == node)
    243:                  node->GetParent()->SetSmaller(node->GetBigger());
    244:               else
    245:                  node->GetParent()->SetBigger(node->GetBigger());
    246:               }
    247:            }
    248:            else // node has both!!
    249:            {
    250:               BinaryNode * next = node->GetBigger();
    251:
    252:               while (next->GetSmaller())
    253:                  next=next->GetSmaller();
    254:
    255:               if (next->GetParent()->GetSmaller() == next)
    256:                  next->GetParent()->SetSmaller(next->GetBigger());
    257:               else
    258:                  next->GetParent()->SetBigger(next->GetBigger());
    259:
    260:               node->SetWord(next->GetWord());
    261:
    262:               if (next->GetBigger())
    263:                  next->GetBigger()->SetParent(node);
    264:
    265:               node = next;
    266:            }
    267:         } // end else does have bigger
    268:      } // end if node
    269:      delete node;
    270:   } // end function
    271:
    272:   void BinaryTree::PrintTree(int argc, char **argv)
    273:   {
    274:      for (int i = 1; i<argc; i++)
    275:      {
    276:         rWord* word = new rWord(argv[i]);
    277:         BinaryNode* pbn = FindObject(*word);
    278:         delete word;
    279:         // BinaryTree::x*=3; // 3 spaces per unit
    280:         BinaryTree::x = BinaryTree::x < 0 ? 0 : BinaryTree::x;
    281:         BinaryTree::x = BinaryTree::x > 80 ? 80 : BinaryTree::x;
    282:         BinaryTree::y = BinaryTree::y < 0 ? 0 : BinaryTree::y;
    283:         BinaryTree::y = BinaryTree::y > 25 ? 25 :BinaryTree::y;
    284:
    285:         gotoxy(BinaryTree::x, BinaryTree::y+10);
    286:         cout << pbn->GetWord()->GetText();
    287:      }
    288:   }
    289:
    290:   long BinaryTree::x
    291:   long BinaryTree::y;
    292:   int main(int argc, char **argv)
    293:   {
    294:      BinaryTree tree;
    295:      for (int i = 1; i< argc; i++)
    296:      {
    297:         rWord* word = new rWord(argv[i]);
    298:         tree.Insert(*word);
    299:      }
    300:
    301:      tree.PrintTree(argc,argv);
    302:      cout << "\n\n\n" << endl;
    303:
    304:      return 0;
    305:   }

Output:

    d:\>0708 now is the time for all good men to come to the aid



                                           now
                                 is                  the
                         for             men                 time
                   all         good                                to
               aid     come

Analysis:

In line 3, the new word.hpp file is included, along with conio.h in line 4, which is the header file that Borland C++ 4.0 requires for the gotoxy() function. In line 6, the BinaryNode class is forward declared because BinaryTree will make reference to BinaryNode pointers, and the compiler must know that a BinaryNode is a type.

Note that in a real program, these declarations (of both BinaryNode and BinaryTree) would be moved off to a header file. The BinaryTree declares a constructor and destructor and a GetCount() method that returns the number of nodes in the tree (and that is not used in the demonstration program).

The Insert() and Delete() member functions will work as described earlier. The Iterate() function is a carryover from the linked List class, and again is not used in the demonstration program.

In line 18, PrintTree() is declared, and below it two public, static variables, x and y. These are used for demonstration purposes only, in order to print out a graph of the tree. This mechanism is rather crude and is included only to prove that the binary tree is working.

In lines 23 and 24, the private member variables are declared. Note that binary tree (unlike linked list) includes a pointer to the first link in the tree.

In lines 27 through 63, the BinaryNode class is declared. As described earlier, each node has four pointers, shown in lines 59 through 62. The node points to each of its two children (mySmaller and mybigger) as well as its parent and the rword object for which it is the node.

In addition to the usual constructors (lines 30 through 32), a variety of accessor functions is provided (lines 40 through 46) and comparison operators (lines 48 through 55). Note that the comparisons are immediately passed on to the rWord, which in turn does a comparison of its strings. Thus, comparing two nodes is the same as comparing the strings of their rWord objects.

The implementations of the accessors GetSmaller() and GetBigger() are made somewhat more complex because they keep track of the relative x and y coordinates of the node. That is, each time you go down a level, the static member BinaryTree::y is incremented, and each time you go right or left, the static member BinaryTree::x is incremented or decremented. Note that a simple but effective algorithm is used to decide how much to move right or left. As you go farther down the tree, you want to shift fewer spaces right or left before writing the word. This still does not prevent all overlaps, so it is possible for two words to print to the same space, overwriting each other. Keep this in mind as you experiment with different command-line arguments.

BinaryNode::InsertSmaller() in lines 109 through 114 and BinaryNode::InsertBigger() in lines 117 through 122 are fairly straightforward. When a node is inserted into the smaller position, it is told to set its smaller pointer to the current node's smaller pointer, and to set its parent node pointer to the current node. Then the current node sets its own smaller pointer to the new node.

The more interesting method, BinaryTree::Insert(), is shown in lines 125 through 179. Here, a word object is provided to the tree. The first thing the tree does is to create a new node for the word, in line 127. It then checks to see whether the root of the tree, myHead, already is occupied. If not, the new node is made the head and the insertion is complete.

Assuming that the head already exists, the trick now is to find the right position for the new node. A node pointer, node, is declared in line 135, and set to point to the head.

Each time through the while loop, the new node is compared to node. In the first case, this looks for a duplicate of the head node. Assuming that it fails the test in line 139, the new node is compared to the node to see whether it is larger or smaller. Assuming that it is smaller, it will pass the test in line 156; otherwise, it will fall through to the else statement in line 167.

If the new node is smaller, the next question is whether the current node already has a smaller pointer. If so, that becomes the current node, and the while loop starts again. If the current node does not have a smaller node, however, the new node is inserted as the current node's smaller node.

If the new node is a bigger node than the current node, the same test is applied: Does the current node have a bigger node? If not, the new node becomes the current node's bigger node; if there is already a bigger node, it becomes the current node and the while loop continues.

FindObject() also takes a word object. It initializes the x and y coordinates for display and then searches for the node in the tree.

Delete() follows the logic described earlier. Take the time to compare the code with the description and make sure that you follow how this works.

PrintTree(), shown in lines 272 through 277, is a quick hack to display the words in a simple graphical manner. It iterates through the command line and asks the binary tree to find each word in turn. The act of finding the word sets the x and y coordinates, which then are checked to make sure they are in range. Then gotoxy() is called, which moves the cursor to the appropriate screen coordinate, where the word is written.

In lines 290 and 291, the static member variables BinaryNode::x and BinaryNode::y are defined. Remember that declaring these in the class does not set aside memory; you must define them at global scope.

Rewriting the Binary Tree

The version of the binary tree shown in listing 7.9 works quite well but brings with it some unnecessary complexity. The printing algorithm is inefficient and not guaranteed to produce good results, and the insertion method is complex and potentially confusing.

The next example rewrites the binary tree to take advantage of the inherent recursion in trees: each node can be conceived, in some sense, as the top of its own tree. This makes insertion and deletion simpler.

It is important to be able to walk the tree when you need to print or search through the tree, although the particular order in which you walk the tree depends on what you are trying to accomplish.

The next program walks the tree in the following order: looking at the smaller value, looking at the current value, and then looking at the larger value. Recursing into this routine enables you to print the entire tree sideways. I'll discuss this in detail on day 8, but for now it enables the tree to show its contents quickly and easily. Listing 7.10 demonstrates these approaches.

Listing 7.10 Another Way to Write a Binary Tree

    1:     // **************************************************
    2:     // PROGRAM:  Binary tree - Second version
    3:     // FILE:     bintree.hpp
    4:     // PURPOSE:  btree with recursive insertion
    5:     // NOTES:    Special thanks to Stephen Zagieboylo
    6:     // AUTHOR:   Jesse Liberty (jl)
    7:     // REVISIONS: 11/3/94 1.0 jl  initial release
    8:     // **************************************************
    9:
    10:    #include "stdef.hpp"
    11:    #include "string.hpp"
    12:    #include "word.hpp"
    13:
    14:    class BinaryNode;   // Forward reference
    15:
    16:    class BinaryTree
    17:    {
    18:    public:
    19:       BinaryTree():myHead(0),myCount(0){}
    20:       ~BinaryTree(){}
    21:       long GetCount() const { return myCount; }
    22:
    23:       void Insert (const rWord& );
    24:       BOOL Delete (const rWord& );      // Returns TRUE if found and deleted.
    25:       void Iterate(void (*f)(const rWord&, int depth));
    26:
    27:    private:
    28:       BinaryNode * myHead;
    29:       long myCount;
    30:    };
    31:
    32:    class BinaryNode
    33:    {
    34:    public:
    35:       BinaryNode(const rWord &);
    36:       ~BinaryNode() {}
    37:       const rWord & GetValue()  { return myValue; }
    38:       void SetValue(const rWord& val) { myValue = val;}
    39:
    40:       BinaryNode* GetSmaller() const   { return mySmaller;  }
    41:       BinaryNode* GetBigger() const      { return myBigger; }
    42:
    43:       BinaryNode * Insert(const rWord&);
    44:       static void ProcessDuplicateValue(const rWord& newValue, rWord&
              existingValue);
    45:
    46:       // Returns the new top of this subtree.
    47:       // Sets the BOOL if it really deleted anything.
    48:       BinaryNode * Delete(const rWord&, BOOL & DidDelete);
    49:
    50:       void SetSmaller(BinaryNode* target) { mySmaller = target; }
    51:       void SetBigger(BinaryNode* target)  { myBigger = target; }
    52:
    53:       BOOL operator>(const rWord& rhs) const;
    54:       BOOL operator==(const rWord& rhs) const;
    55:
    56:       void Iterate(void (*f)(const rWord&, int depth), int depth);
    57:
    58:    private:
    59:       BinaryNode * mySmaller;
    60:       BinaryNode * myBigger;
    61:       rWord myValue;
    62:    };
    63:
    64:    BinaryNode::BinaryNode(const rWord& word):
    65:       mySmaller(0),
    66:       myBigger(0),
    67:       myValue(word)
    68:    {  }
    69:
    70:
    71:    BinaryNode * BinaryNode::Insert(const rWord& word)
    72:    {
    73:       if (*this == word)
    74:          ProcessDuplicateValue(word, myValue);
    75:       else if (*this > word)
    76:       {
    77:          if (mySmaller != 0)
    78:             mySmaller = mySmaller->Insert(word);
    79:          else
    80:             mySmaller = new BinaryNode(word);
    81:       }
    82:       else
    83:       {
    84:          if (myBigger != 0)
    85:             myBigger = myBigger->Insert(word);
    86:          else
    87:             myBigger = new BinaryNode(word);
    88:       }
    89:
    90:       return this;
    91:    }
    92:
    93:    void BinaryNode::ProcessDuplicateValue(const rWord& word , rWord&
           otherWord )
    94:    {
    95:       cout << otherWord.GetText() << " is a duplicate of ";
    96:       cout << word.GetText() << endl;
    97:    }
    98:
    99:    BinaryNode * BinaryNode::Delete(const rWord& word, BOOL & DidDelete)
    100:   {
    101:      if (*this == word)
    102:      {
    103:         // This is the one to remove.  It might be a leaf,
    104:         // a single parent, or a double parent.
    105:         if (mySmaller == 0)      // leaf or one type of single parent
    106:         {
    107:            // if myBigger == 0, return 0 anyway.
    108:            BinaryNode * retval = myBigger;
    109:            DidDelete = TRUE;
    110:            delete this;   // Dangerous!  Must return immediately.
    111:            return retval;
    112:         }
    113:         else if (myBigger == 0)      // other type of single parent
    114:         {
    115:            BinaryNode * retval = mySmaller;
    116:            DidDelete = TRUE;
    117:            delete this;   // Dangerous!  Must return immediately.
    118:            return retval;
    119:         }
    120:         else         // Double parent
    121:         {
    122:            // Find the Node with the lowest value on
    123:            // my Bigger subtree.  Remove him and put him in my place
    124:            BinaryNode * smallest = myBigger;
    125:            BinaryNode * hisparent = 0;
    126:            while (smallest->GetSmaller() != 0)
    127:            {
    128:               hisparent = smallest;
    129:               smallest = smallest->GetSmaller();
    130:            }
    131:
    132:            // Remove him gracefully and put him in our place.
    133:            // Watch out for the case where he is our child.
    134:            if (hisparent != 0)   // not our immediate child.
    135:            {
    136:               hisparent->SetSmaller(smallest->GetBigger());
    137:               smallest->SetBigger(myBigger);
    138:            }
    139:
    140:            smallest->SetSmaller(mySmaller);
    141:
    142:            DidDelete = TRUE;
    143:            delete this;   // Dangerous!  Must return immediately.
    144:            return smallest;
    145:         }
    146:      }
    147:      else if (*this > word)
    148:      {
    149:         if (mySmaller != 0)
    150:            mySmaller = mySmaller->Delete(word, DidDelete);
    151:         return this;
    152:      }
    153:      else
    154:      {
    155:         if (myBigger != 0)
    156:            myBigger = myBigger->Delete(word, DidDelete);
    157:         return this;
    158:      }
    159:   }
    160:
    161:
    162:   BOOL BinaryNode::operator>(const rWord &rhs) const
    163:   {
    164:      return BOOL(myValue.GetText() > rhs.GetText());
    165:   }
    166:
    167:
    168:   BOOL BinaryNode::operator==(const rWord &rhs) const
    169:   {
    170:      return BOOL(myValue.GetText = rhs.GetText());
    171:   }
    172:
    173:
    174:
    175:   void BinaryNode::Iterate(void (*f)(const rWord&, int depth), int depth)
    176:   {
    177:      if (mySmaller != 0)
    178:         mySmaller->Iterate(f, depth+1);
    179:
    180:      f(myValue, depth);
    181:
    182:      if (myBigger != 0)
    183:         myBigger->Iterate(f, depth+1);
    184:   }
    185:
    186:
    187:   void BinaryTree::Insert(const rWord& word)
    188:   {
    189:      if (myHead == 0)
    190:         myHead = new BinaryNode(word);
    191:      else
    192:         myHead = myHead->Insert(word);
    193:
    194:      myCount++;
    195:   }
    196:
    197:
    198:   BOOL BinaryTree::Delete(const rWord& word)
    199:   {
    200:      BOOL DidDelete = FALSE;
    201:
    202:      if (myHead != 0)
    203:         myHead = myHead->Delete(word, DidDelete);
    204:
    205:      if (DidDelete)
    206:         myCount--;
    207:
    208:      return DidDelete;
    209:   }
    210:
    211:   void BinaryTree::Iterate(void (*f)(const rWord&, int depth))
    212:   {
    213:      if (myHead != 0)
    214:         myHead->Iterate(f, 1);
    215:   }
    216:
    217:
    218:   void Display(const rWord & word, int depth)
    219:   {
    220:      for ( ;depth > 0; depth--)
    221:         cout << "    ";
    222:      cout << word.GetText() << endl;
    223:   }
    224:
    225:
    226:   // Utility function used below.
    227:   void ShowTree( BinaryTree & tree )
    228:   {
    229:        cout << "\n\nResults: \n" << endl;
    230:        cout << "Count is: " << tree.GetCount() << endl;
    231:        tree.Iterate(Display);
    232:        cout << "\n\n" ;
    233:   }

Listing 7.11 Using the Driver Program

    1:     // 7.11 - Using the Driver Program
    2:     // second version
    3:
    4:     #include "btree.hpp"
    5:
    6:        int main()
    7:      {
    8:          char buffer[256];
    9:          BinaryTree tree;
    10:         static String ShowString("show");
    11:
    12:         while (1)
    13:         {
    14:            cout << "New String: ";
    15:            cin.getline(buffer,255);
    16:            rWord* word = new rWord(buffer);
    17:
    18:            if (buffer[0] == '\0')
    19:                break;
    20:
    21:            if (*word == ShowString)
    22:                {
    23:                ShowTree(tree);
    24:                continue;
    25:                }
    26:
    27:            tree.Insert(*word);
    28:         }
    29:
    30:         ShowTree(tree);
    31:
    32:         while (1)
    33:         {
    34:            cout << "Word to Delete: ";
    35:            cin.getline(buffer,255);
    36:            rWord* word = new rWord(buffer);
    37:
    38:            if (buffer[0] == '\0')
    39:                break;
    40:
    41:            if (*word == ShowString)
    42:                {
    43:                ShowTree(tree);
    44:                continue;
    45:                }
    46:
    47:            if (!tree.Delete(*word))
    48:                cout << "Not Found\n\n";
    49:         }
    50:
    51:         tree.Iterate(Display);
    52:      }

Output:

    d:\112\day7>0711
    New String: now
    New String: is
    New String: the
    New String: time
    New String: for
    New String: all
    New String: good
    New String: men
    New String: to
    New String: come
    New String: to
    to is a duplicate of to
    New String: the
    the is a duplicate of the
    New String: aid
    New String: of
    New String: the
    the is a duplicate of the
    New String: party

    Results:

    Count is: 16
                        aid
                    all
                        come
                for
                    good
            is
                men
        now
                of
                    party
            the
                time
                    to

    Word to Delete: good
    Word to Delete: time
    Word to Delete:
                        aid
                    all
                        come
                for
            is
                men
        now
                of
                    party
            the
                to

Analysis:

The BinaryNode and BinaryTree classes declared in listing 7.10 store the same rWord objects as the previous examples and utilize the same stdef inclusion file.

One addition to this tree is that it detects and reports on duplicates, without adding them to the tree. The BinaryNode class assumes much more of the work in this approach.

BinaryTree::Insert() is in line 187. It takes an rWord reference and passes it to its topmost node. The tree's only other job is to increment its count.

The real work is done in BinaryNode::Insert(), which is shown in lines 71 through 91. The word is examined to see whether it is a duplicate of the current node, and processed accordingly if it is. If it is not a duplicate, the word is either smaller or larger than the current node.

If the word is smaller and if there is no pointer to a smaller node, this new node becomes the smaller node. If there is already a smaller node, the new node is passed recursively to the insert method of the smaller node.

If the new node is larger than the current node, the same process occurs: If there is no larger node, the new node becomes the larger; if there already is a larger node, the new node is passed recursively down to the larger.

In this way, the new node is passed down the tree until its appropriate insertion point is found.

Delete works in much the same way. The problem facing the tree's Delete() function is that it must get back two values: the new head pointer (in case the current one is deleted) and a Boolean as to whether the deletion was completed. Deletion can fail if the item is not found in the tree.

This problem is solved by passing a Boolean to the BinaryNode::Delete() function by reference. The node then can set the value of Delete, while returning the new head pointer.

BinaryNode::Delete() is shown in lines 99 through 159. If the target node is the current node, the current node must be examined to see whether it has children. If it has no smaller child, Delete() returns its bigger child after deleting itself.

If it has a smaller child, but no bigger child, it returns the smaller child after deleting itself. The only other alternative is that it has two children. In that case, the logic in lines 120 through 145 is used, in which the "next value" is placed into the position of the removed node.

Summary

Today you saw how to create a hash table, which depends on a hashing algorithm to put values into discrete buckets. You saw how to resolve collisions when two objects hash to the same value, and you learned what the trade-offs are in using a hash table. You also learned two ways to implement a binary tree, and how to print the contents of the tree.

Q&A

Q: When would you use a hash table in the real world?

A: Although it is fairly memory-expensive, a hash table is the fastest way to retrieve data, as long as your table is big enough. You would use this technique only in times when speed is of the essence, typically because the access to the data is needed many times a second.

One example where a hash table often is used is for a software virtual memory system. You would keep in a hash table the records of which blocks of data are paged in. In a virtual memory system, the most important point of optimization is the speed of locating a block that is already in memory. If you have these in a hash table, this will be very fast, and the number of pages that can be in memory at one time is relatively low (compared to the number that might be out on disk).

Q: What are the characteristics of a good hashing algorithm?

A: The hashing algorithm determines the right bucket for a particular key. First, it should be very fast, because the whole point of using a hash table is speed. Second, given a typical set of keys, it should spread them out fairly evenly over the range, so that the buckets are equally full.

Q: Give an example of a good hashing algorithm.

A: For alphanumeric strings, XORing the first byte with the last one shifted left three bits is pretty good. Follow this by ANDing the number with an appropriate bit mask to cut it down to the right number of buckets (which has to be a power of two). This example will give you a number from 0 to 63, which you would use to index to a bucket:

    inline int HashingAlgorithm(const char * s)
         {return (s[0] ^ (s[strlen(s)] << 3)) & 0x3f;}

Q: In a binary tree, how many compares will it take to reach any one node in the tree? What is the average for all nodes as a function of the number of nodes in the tree?

A: The number of compares it will take to reach any node is the depth or level of the tree. This will average less than log2 N, where N is the total number of nodes in the tree.

Q: How much of a risk is it that binary trees will be badly unbalanced?

A: If the data being inserted truly is random, then the binary tree will stay reasonably well balanced. However, real data sometimes already is sorted, depending on the source of the data. If sorted data is inserted into a binary tree, it creates the very worst case, a long chain of only right (or only left) branches. A solution to this problem is provided on day 8.

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered, and exercises to provide you with experience in using what you have learned. Try to answer the quiz and exercise questions before checking the answers in Appendix A, and make sure that you understand the answers before continuing to the next chapter.

Quiz

What is the hashing algorithm?
In a binary tree, what is a leaf node?
In a fully filled-out, four-level deep binary tree, what is the maximum number of nodes? Five levels deep? All values from 1 to 10?
In your fully filled-out binary trees, how many nodes are leaf nodes?
In a binary tree, if you delete a node with two children, what node ends up in the place the deleted node is vacating?

[Click here for Answers]

Exercises

What is wrong with this hashing algorithm?

    inline int HashingAlgorithm(const char * s)
    return (s[0] ^ (s[1] << 3) ^ (s[2] >> 3)) & 0x3f;}

Suppose that you are writing part of a virtual memory system. Your part of the system is a hash table that will keep track of data pages swapped in from disk. The pages are identified by a long, which is their position in the file. The important information about each swapped-in page is its address in memory, which is a void *.
What classes would be involved in a hash table in order to maintain this information?
Suppose that you expect to be able to hold approximately 100 pages in memory at any one time, and so you decide to use a hash with 128 buckets. Write a good hashing algorithm that will distribute typical values evenly across this range. (The important thing here is to identify the expected values and show that your algorithm will end up with the values well distributed across the range.)
Parameterize the binary tree template shown in listing 7.10.
Write a test program that will exercise your binary tree template by making a tree of strings.

Go to: Table of Contents | Next Page