Day 16: Synonyms

ROBIN is now almost a full-fledged application. It is time to turn your attention to making the code more robust and ensuring that it is maintainable. In the next few chapters, I'll discuss profiling programs to enhance performance, using a debugger to track down serious problems, and building tough maintainable code. Today, however, I'll try to prove that ROBIN is extensible, that the architecture allows for growth and enhancements.

There are a number of features that you might want to add to ROBIN, even at this early date. The most pressing, in my opinion, is the capability to designate synonyms, so that when searching for computer, for example, ROBIN will match records with PC and MAC in the title. Today you will learn

Maintainable and Extensible Code

One of the hallmarks of maintainable code is that you can explain how your program works to an intelligent programmer in a matter of hours. If your program is too complicated to explain, no matter how clever it is, it will not be maintainable over the long haul.

The second hallmark of maintainable code, however, is that it can be extended to incorporate new features. This extensibility is crucial. Many programmers spend much of their career building on legacy code programs they didn't write, but that they must maintain. A good program allows for the addition of new features without breaking.

ROBIN was not originally designed to include synonyms, but the architecture is sufficiently nimble and the complexity is sufficiently isolated that adding this feature should not be an overwhelming task. A good programmer should be able to get the fundamental functionality in place in less than one day.

How Synonyms Work

ROBIN's current design works like this: When the user asks to find a term, the B-tree is passed the string for which to search. The B-tree starts at its first page and asks each page for the first index on that page that "begins with" the text passed in.

When the index is found, the page is examined to see whether it is a node or a leaf. If it is a node, the index points to the next page in the chain. After a leaf is found, the index points to a WNJ (Word-Node-Join) record. The WNJ record includes a list of offsets into the data file where the actual record resides.

When you insert a synonym (PC, for example) for a term (Computer, for example) you want the tree to find the same WNJFile record for the synonym (PC) as it does for the term (COMPUTER).

The algorithm for inserting the synonym is straightforward: Find the WNJ record for the original term and then create a new index for the new term pointing to that WNJ record.

Remember that when a new index is added, it originally points to the data in the file but then is adjusted to point to a new WNJ record. In this case, you want to fill the index with the WNJ record index of the original term, and not allow it to be adjusted.

Two significant changes must be made to the code, in addition to adding a function to handle the Add Synonym request. First, there must be logic to distinguish this special case so that the index is handled properly. In addition, there must be logic to ensure that only an exact match is found for the original term; if you are adding a synonym for phone (telephone, for example) you do not want to add that synonym to phonetics!

In both cases, a flag is required: Are you adding a synonym? There are two ways to handle this: I've chosen to pass these flags to every function that needs them, defaulting their value to FALSE (the more common case). An alternative would be to set a static Boolean variable in the tree, and to get this value each time the test needs to be tried.

In retrospect, I suspect the code would be cleaner if this and the related flags (FindOnly, for example) were moved into static variables in BTREE. This is left, as they say, as an exercise for the reader.

Listing 16.1 shows the modified BTree.hpp, including the new flags on the node insertion and find methods, as discussed in the analysis that follows. Listing 16.2 is Btree.cpp, with a modified AddKey() method. Listing 16.3 is Index.cpp, which changed to accommodate the new flags. Listing 16.4 is Page.cpp, where the bulk of the changes are located. Listing 16.5 (Datafile.cpp), listing 16.6 (IdxManager.cpp), listing 16.7 (Iter.cpp), and listing 16.8 (WNJFile.cpp) are included for completeness. Finally, listing 16.9 is the driver program, which enables you to enter synonyms using the syntax R2 -Synonym OldTerm NewTerm.

Note: The listings in this chapter are for illustrative purposes only. Your compiler might issue a warning if you use the listings exactly as shown in this chapter.

Listing 16.1 BTree.hpp

    1:      // **************************************************
    2:      // PROGRAM:  Btree, Page and Index declarations
    3:      // FILE:     btree.hpp
    4:      // PURPOSE:  provide fundamental btree functionality
    5:      // NOTES:
    6:      // AUTHOR:   Jesse Liberty (jl)
    7:      // REVISIONS: 11/1/94 1.0 jl  initial release
    8:      // **************************************************
    9:      #ifndef BTREE_HPP                 // inclusion guards
    10:     #define BTREE_HPP
    11:
    12:     #include <time.h>
    13:     #include <string.h>
    14:     #include <fstream.h>
    15:
    16:     #include "stdef.hpp"
    17:
    18:     const int Order =      4; //31;  // 31 indexes and 1 header
    19:     const int dataLen =    11; // length of a key
    20:     const int MaxPages =   500; // more than we need
    21:     const int SizeItem =   16;  // key + offset
    22:     const int SizePointer = 2; // size of offset
    23:     const int PageSize = (Order+1) * SizeItem;
    24:     const int offInt = 2;
    25:
    26:     // forward declarations
    27:     class Page;
    28:     class Index;
    29:
    30:     // all members are public
    31:     // used only by Iterator
    32:     struct History
    33:     {
    34:        History(int, int,BOOL);
    35:        int PageNo;
    36:        int OffSet;
    37:        BOOL IsNode;
    38:     };
    39:
    40:
    41:     // Iterator class keeps track
    42:     // of current (and next) index
    43:     class Iterator
    44:     {
    45:     public:
    46:        Iterator();
    47:        ~Iterator();
    48:        void RecordNode(int, int);
    49:        void RecordLeaf(int, int);
    50:        History * GetLast();
    51:        History * GetFirst();
    52:        int GetNext(const Index&);
    53:        void Reset();
    54:
    55:     private:
    56:      History* myHistory[MaxPages];
    57:      int myNext;
    58:     };
    59:
    60:
    61:     class DataFile
    62:     {
    63:     public:
    64:        // constructors & destructor
    65:        DataFile();
    66:        ~DataFile(){}
    67:
    68:        // management of files
    69:        void Close();
    70:        void Create();
    71:        void GetRecord(int, char*, int&, time_t&);
    72:
    73:        int Insert(char *);
    74:
    75:     private:
    76:     fstream myFile;
    77:     };
    78:
    79:     class WNJFile
    80:     {
    81:     public:
    82:        // constructors & destructor
    83:        WNJFile();
    84:        ~WNJFile(){}
    85:
    86:        // management of files
    87:        void Close();
    88:        void Create();
    89:        int* Find(int NextWNJ);
    90:        int Insert(int, int);
    91:        int Append(int);
    92:
    93:     private:
    94:     static int myCount;
    95:     fstream myFile;
    96:     union
    97:     {
    98:        int myints[5];
    99:        char  myArray[5*4];
    100:    };
    101:    };
    102:
    103:    // IDXManager - in memory keeps track of what pages are
    104:    // already in memory and swaps to disk as required
    105:    class IDXManager
    106:    {
    107:    public:
    108:       // constructors & destructor
    109:       IDXManager();
    110:       ~IDXManager(){}
    111:
    112:       // management of files
    113:       void Close(int);
    114:       int Create();
    115:
    116:       // Page manipulation
    117:       Page* GetPage(int);
    118:       void SetPage(Page*);
    119:       void Insert(Page * newPage);
    120:       void Save(Page* pg);
    121:       int NewPage(Index&, BOOL);
    122:       int NewPage(Index *array, int offset, BOOL leaf, int count);
    123:
    124:    private:
    125:    Page* myPages[MaxPages];
    126:    fstream myFile;
    127:    int myCount;
    128:    };
    129:
    130:    // the btree itself - has a pointer to first page
    131:    class BTree
    132:    {
    133:    public:
    134:       // constructors and destructor
    135:       BTree();
    136:       ~BTree();
    137:
    138:       // utility functions
    139:       void AddKey(char* data, int offset,BOOL synonym = FALSE);
    140:       void Insert(char* buffer, int, char**);
    141:       void Insert(char*);
    142:       void PrintTree();
    143:       int Find(char* data);
    144:       int FindExact(char* data);
    145:       int GetNext(char* data);
    146:
    147:       // page methods
    148:       Page* GetPage(int page)
    149:          { return theIDXManager.GetPage(page); }
    150:       int NewPage(Index& pIndex, BOOL IsLeaf)
    151:          { return theIDXManager.NewPage(pIndex,FALSE); }
    152:
    153:       static int myAlNum(char ch);
    154:       // public static member!
    155:       static  IDXManager theIDXManager;
    156:       static WNJFile   theWNJFile;
    157:       static DataFile theDataFile;
    158:       static Iterator theIter;
    159:
    160:       static BOOL GetWord(char*, char*, int&);
    161:       static void GetStats();
    162:       static int NodeIndexCtr;
    163:       static int LeafIndexCtr;
    164:       static int NodePageCtr;
    165:       static int LeafPageCtr;
    166:       static int NodeIndexPerPage[Order+1];
    167:       static int LeafIndexPerPage[Order+1];
    168:
    169:    private:
    170:       int myRoot;
    171:    };
    172:
    173:    // index objects point to pages or real data
    174:    class Index
    175:    {
    176:    public:
    177:       // constructors & destructor
    178:       Index();
    179:       Index(char *);
    180:       Index (char*, int);
    181:       Index(Index&);
    182:       ~Index(){}
    183:
    184:       // accessors
    185:       const char * GetData() const  { return myData; }
    186:       void SetData(const Index& rhs)
    187:          { strcpy(myData,rhs.GetData()); }
    188:       void SetData(const char * rhs)
    189:          { strcpy(myData,rhs); }
    190:       int GetPointer()const { return myPointer; }
    191:       void SetPointer (int pg) { myPointer = pg; }
    192:
    193:       // utility functions
    194:       void PrintKey();
    195:       void PrintPage();
    196:       int Insert(Index& ref,BOOL findOnly = FALSE, BOOL findExac = FALSE);
    197:
    198:       int Begins(const Index& rhs) const;
    199:
    200:       // overloaded operators
    201:       int operator==(const Index& rhs);
    202:
    203:       int operator < (const Index& rhs)
    204:        {return strcmp(myData,rhs.GetData())<0;}
    205:
    206:       int operator <= (const Index& rhs)
    207:        {return strcmp(myData,rhs.GetData())<=0;}
    208:
    209:       int operator > (const Index& rhs)
    210:        {return strcmp(myData,rhs.GetData())>0;}
    211:
    212:       int operator >= (const Index& rhs)
    213:        {return strcmp(myData,rhs.GetData())>=0;}
    214:
    215:    public:
    216:       int myPointer;
    217:       char myData[SizeItem - SizePointer];
    218:    };
    219:
    220:
    221:    // pages - consist of header and array of indexes
    222:    class Page
    223:    {
    224:    public:
    225:       // constructors and destructor
    226:       Page();
    227:       Page(char*);
    228:       Page(Index&,BOOL);
    229:       Page(Index*, int, BOOL, int);
    230:       ~Page(){}
    231:
    232:       // insertion and searchoperations
    233:       int Insert(Index&, BOOL findOnly = FALSE, BOOL synonym = FALSE);
    234:       int Find(Index& idx) { return Insert(idx,TRUE); }
    235:       int InsertLeaf(Index&);
    236:       int FindLeaf(Index&,BOOL findOnly, BOOL synonym = FALSE);
    237:       int InsertNode(Index&,BOOL findOnly = FALSE, BOOL synonym = FALSE);
    238:       void Push(Index&,int offset=0, BOOL=TRUE);
    239:
    240:       // accessors
    241:       Index GetFirstIndex() { return myKeys[0]; }
    242:       Index GetIndex(int offset) { return myKeys[offset]; }
    243:       BOOL GetIsLeaf() const { return myVars.IsLeaf; }
    244:       int GetCount() const { return myVars.myCount; }
    245:       void SetCount(int cnt) { myVars.myCount=cnt; }
    246:       time_t GetTime() { return myTime; }
    247:       BOOL GetLocked() { return myVars.IsLocked; }
    248:       void SetLocked (BOOL state) { myVars.IsLocked = state; }
    249:
    250:       // page manipulation
    251:       int GetPageNumber() const { return myVars.myPageNumber; }
    252:       Page* GetPage(int page)
    253:          { return BTree::theIDXManager.GetPage(page); }
    254:       int NewPage(Index& rIndex, BOOL IsLeaf)
    255:          { return BTree::theIDXManager.NewPage(rIndex,FALSE); }
    256:
    257:       int NewPage(Index* arr, int off,BOOL isL, int cnt)
    258:          { return   BTree::theIDXManager.NewPage(arr,off,isL, cnt); }
    259:
    260:       // utility functions
    261:       void Nullify(int offset);
    262:       void Print();
    263:       fstream& Write(fstream&);
    264:       void ReCount();
    265:
    266:       static int GetgPageNumber(){ return gPage; }
    267:       static void SetgPageNumber(int pg) { gPage = pg; }
    268:
    269:    private:
    270:       Index * const myKeys; // will point to myVars.mk
    271:       union
    272:       {
    273:          char myBlock[PageSize];   // a page from disk
    274:          struct
    275:          {
    276:             int myCount;
    277:             BOOL IsLeaf;
    278:             int myPageNumber;
    279:             BOOL IsLocked;
    280:             char filler[8]; // we want 16 bytes of overhead
    281:             char mk[Order*sizeof(Index)];  // array of indexes
    282:          }myVars;
    283:       };
    284:
    285:       // memory only
    286:       static int gPage;
    287:       time_t myTime; // for lifo queue
    288:    };
    289:
    290:    #endif

Listing 16.2 BTree.cpp

    1:      // **************************************************
    2:      // PROGRAM:  Btree
    3:      // FILE:     btree.cpp
    4:      // PURPOSE:  provide fundamental btree functionality
    5:      // NOTES:
    6:      // AUTHOR:   Jesse Liberty (jl)
    7:      // REVISIONS: 11/1/94 1.0 jl  initial release
    8:      // **************************************************
    9:
    10:     #include "btree.hpp"
    11:     #include "stdef.hpp"
    12:     #include <ctype.h>
    13:     #include <stdlib.h>
    14:
    15:     // construct the tree
    16:     // initialize myRoot pointer to nothing
    17:     // either create or read the index file
    18:     BTree::BTree():myRoot(0)
    19:     {
    20:        myRoot = theIDXManager.Create();
    21:        theDataFile.Create();
    22:        theWNJFile.Create();
    23:
    24:     }
    25:
    26:     // write out the index file
    27:     BTree::~BTree()
    28:     {
    29:        theIDXManager.Close(myRoot);
    30:        theDataFile.Close();
    31:        theWNJFile.Close();
    32:     }
    33:
    34:     // find an existing record
    35:     int BTree::Find(char * str)
    36:     {
    37:        Index index(str);
    38:        BTree::theIter.Reset();
    39:        if (!myRoot)
    40:           return 0L;
    41:        else
    42:           return GetPage(myRoot)->Find(index);
    43:     }
    44:
    45:    // find an existing record
    46:     int BTree::FindExact(char * str)
    47:     {
    48:        Index index(str);
    49:        BTree::theIter.Reset();
    50:        if (!myRoot)
    51:           return 0L;
    52:        else
    53:           return GetPage(myRoot)->Insert(index,TRUE, TRUE);
    54:     }
    55:
    56:
    57:     int BTree::GetNext(char * str)
    58:     {
    59:        Index index(str);
    60:        return BTree::theIter.GetNext(index);
    61:     }
    62:
    63:
    64:     void BTree::Insert(char * buffer, int argc, char** argv)
    65:     {
    66:        // get each word,
    67:        int offset = theDataFile.Insert(buffer);
    68:        for (int i = 1; i<argc; i++)
    69:        {
    70:           AddKey(argv[i],offset);
    71:        }
    72:
    73:     }
    74:
    75:     void BTree::Insert(char* buffer)
    76:     {
    77:
    78:        if (strlen(buffer) < 3)
    79:           return;
    80:
    81:         char *buff = buffer;
    82:         char word[PageSize];
    83:         int wordOffset = 0;
    84:         int offset;
    85:
    86:        if (GetWord(buff,word,wordOffset))
    87:        {
    88:           offset = theDataFile.Insert(buffer);
    89:           AddKey(word,offset);
    90:        }
    91:
    92:
    93:        while (GetWord(buff,word,wordOffset))
    94:        {
    95:           AddKey(word,offset);
    96:        }
    97:
    98:     }
    99:
    100:    void BTree::AddKey(char * str, int offset, BOOL synonym )
    101:    {
    102:
    103:       if (strlen(str) < 3)
    104:          return;
    105:
    106:       int retVal =0;
    107:       Index index(str,offset);
    108:       if (!myRoot)
    109:       {
    110:          myRoot = theIDXManager.NewPage (index,TRUE);
    111:       }
    112:       else
    113:       {
    114:          retVal = GetPage(myRoot)->Insert(index,FALSE, synonym);
    115:          if (retVal) // our root split
    116:          {
    117:             // create a pointer to the old top
    118:             Index index(GetPage(myRoot)->GetFirstIndex());
    119:             index.SetPointer(myRoot);
    120:             // make the new page & insert the index
    121:             int PageNumber = NewPage(index,FALSE);
    122:
    123:             Page* pg = GetPage(PageNumber);
    124:
    125:             //get a pointer to the new (sibling) page
    126:             Index Sib(GetPage(retVal)->GetFirstIndex());
    127:             Sib.SetPointer(retVal);
    128:             // put it into the page
    129:             pg->InsertLeaf(Sib);
    130:
    131:             // reset myRoot to point to the new top
    132:             myRoot = PageNumber;
    133:          }
    134:       }
    135:    }
    136:
    137:    void BTree::PrintTree()
    138:    {
    139:       GetPage(myRoot)->Print();
    140:
    141:       cout << "\n\nStats:" << endl;
    142:       cout << "Node pages:   " << NodePageCtr << endl;
    143:       cout << "Leaf pages:   " << LeafPageCtr << endl;
    144:       cout << "Node indexes: " << NodeIndexCtr << endl;
    145:       cout << "Leaf indexes: " << LeafIndexCtr << endl;
    146:       for (int i = 0; i < Order +2; i++)
    147:       {
    148:          if (NodeIndexPerPage[i])
    149:          {
    150:             cout << "Pages with " << i << " nodes:  ";
    151:             cout << NodeIndexPerPage[i] << endl;
    152:          }
    153:          if (LeafIndexPerPage[i])
    154:          {
    155:             cout << "Pages with " << i << " leaves: ";
    156:             cout << LeafIndexPerPage[i] << endl;
    157:          }
    158:       }
    159:
    160:    }
    161:
    162:    BOOL BTree::GetWord(char* string, char* word, int& wordOffset)
    163:    {
    164:
    165:       if (!string[wordOffset])
    166:          return FALSE;
    167:
    168:       char *p1, *p2;
    169:       p1 = p2 = string+wordOffset;
    170:
    171:       // eat leading spaces
    172:       for (int i = 0; i<(int)strlen(p1) && !BTree::myAlNum(p1[0]); i++)
    173:          p1++;
    174:
    175:       // see if you have a word
    176:       if (!BTree::myAlNum(p1[0]))
    177:          return FALSE;
    178:
    179:        p2 = p1; // point to start of word
    180:
    181:       // march p2 to end of word
    182:       while (BTree::myAlNum(p2[0]))
    183:          p2++;
    184:
    185:       int len = int(p2-p1);
    186: #if defined(_MSVC_16BIT) || defined(_MSVC_32BIT)
       : {
       :    len = __min(len,(int)PageSize);
       : }
       : #else
       : {
       :    len = min(len,(int)PageSize);
       : }
       : #endif
    187:
    188:       strncpy (word,p1,len);
    189:       word[len]='\0';
    190:
    191:       for (i = int (p2-string); i<(int)strlen(string) && !BTree::myAlNum(p2[0]); i++)
    192:          p2++;
    193:
    194:       wordOffset = int (p2-string);
    195:
    196:       return TRUE;
    197:    }
    198:
    199:    int BTree::myAlNum(char ch)
    200:    {
    201:       return isalnum(ch) ||
    202:          ch == '-' ||
    203:          ch == '\'' ||
    204:          ch == '(' ||
    205:          ch == ')';
    206:    }

Listing 16.3 Index.cpp

    1:       // **************************************************
    2:       // PROGRAM:  index
    3:       // FILE:     index.cpp
    4:       // PURPOSE:  provide fundamental btree functionality
    5:       // NOTES:
    6:       // AUTHOR:   Jesse Liberty (jl)
    7:       // REVISIONS: 11/1/94 1.0 jl  initial release
    8:       // **************************************************
    9:
    10:      #include "btree.hpp"
    11:      #include <ctype.h>
    12:
    13:      Index::Index(char* str):myPointer(1)
    14:      {
    15:         strncpy(myData,str,dataLen);
    16:         myData[dataLen]='\0';
    17:         for (int i = 0; i< (int)strlen(myData); i++)
    18:            myData[i] = toupper(myData[i]);
    19:      }
    20:
    21:      Index::Index(char* str, int ptr):myPointer(ptr)
    22:      {
    23:
    24:         strncpy(myData,str,dataLen);
    25:         myData[dataLen]='\0';
    26: for (int i = 0; i< (int)strlen(myData); i++)
    27:            myData[i] = toupper(myData[i]);
    28:
    29:      }
    30:
    31:      Index::Index(Index& rhs):
    32:         myPointer(rhs.GetPointer())
    33:      {
    34:         strcpy(myData, rhs.GetData());
    35:         for (int i = 0; i< (int)strlen(myData); i++)
    36:            myData[i] = toupper(myData[i]);
    37:
    38:      }
    39:
    40:      Index::Index():myPointer(0)
    41:      {
    42:         myData[0]='\0';
    43:      }
    44:
    45:      void Index::PrintKey()
    46:      {
    47:         cout << "  " << myData;
    48:      }
    49:
    50:      void Index::PrintPage()
    51:      {
    52:         cout << "\n" << myData << ": " ;
    53:         BTree::theIDXManager.GetPage(myPointer)->Print();
    54:      }
    55:
    56:      int Index::Insert(Index& ref, BOOL findOnly, BOOL findExact)
    57:      {
    58:         return BTree::theIDXManager.GetPage(myPointer)->Insert(ref,findOnly,findExact);
    59:      }
    60:
    61:      int Index::operator==(const Index& rhs)
    62:      {
    63:         return (strcmp(myData,rhs.GetData()) == 0); // case insensitive
    64:      }
    65:
    66:      int Index::Begins(const Index& rhs) const
    67:      {
    68:         int len = strlen(myData);
    69:         if (!len)
    70:            return TRUE;
    71:         return (strncmp(myData,rhs.GetData(),len) == 0);
    72:      }

Listing 16.4 Page.cpp

    1:      // **************************************************
    2:      // PROGRAM:  Page
    3:      // FILE:     Page.cpp
    4:      // PURPOSE:  provide fundamental btree functionality
    5:      // NOTES:
    6:      // AUTHOR:   Jesse Liberty (jl)
    7:      // REVISIONS: 11/1/94 1.0 jl  initial release
    8:      // **************************************************
    9:
    10:     #include "btree.hpp"
    11:     #include <assert.h>
    12:
    13:     // constructors
    14:
    15:     // default constructor
    16:     Page::Page()
    17:     {
    18:     }
    19:
    20:     // create a page from a buffer read in from disk
    21:     Page::Page(char *buffer):
    22:        myKeys((Index*)myVars.mk)
    23:     {
    24:        assert(sizeof(myBlock) == PageSize);
    25:        assert(sizeof(myVars) == PageSize);
    26:        memcpy(&myBlock,buffer,PageSize);
    27:        SetLocked(FALSE);
    28:        myTime = time(NULL);
    29:     }
    30:
    31:     // create a Page from the first index
    32:     Page::Page(Index& index, BOOL bLeaf):
    33:        myKeys((Index*)myVars.mk)
    34:     {
    35:
    36:        myVars.myCount=1;
    37:        myVars.IsLeaf = bLeaf;
    38:        SetLocked(FALSE);
    39:        // if this is a leaf, this is the first
    40:        // index on the first page, set its pointer
    41:        // based on creating a new wnj. otherwise
    42:        // you are here creating a new node, do not
    43:        // set the pointer, it is already set.
    44:        if (bLeaf)
    45:           index.SetPointer(BTree::theWNJFile.Append(index.GetPointer()));
    46:        myKeys[0]=index;
    47:        myVars.myPageNumber = gPage++;
    48:        myTime = time(NULL);
    49:     }
    50:
    51:     // create a page by splitting another page
    52:     Page::Page(Index *array, int offset, BOOL bLeaf,int count):
    53:        myKeys((Index*)myVars.mk)
    54:     {
    55:        myVars.IsLeaf = bLeaf;
    56:        myVars.myCount = 0;
    57:        for (int i=0, j = offset; j<Order && i < count; i++, j++)
    58:        {
    59:           myKeys[i]= array[j];
    60:           myVars.myCount++;
    61:        }
    62:        myVars.myPageNumber = gPage++;
    63:        SetLocked(FALSE);
    64:        myTime = time(NULL);
    65:     }
    66:
    67:     void Page::Nullify(int offset)
    68:     {
    69:        for (int i = offset; i<Order; i++)
    70:        {
    71:              myKeys[i].SetPointer(0);
    72:              myVars.myCount--;
    73:        }
    74:     }
    75:
    76:     // decide whether I'm a leaf or a node
    77:     // and pass this index to the right
    78:     // function. If findOnly is true, don't insert
    79:     // just return the page number (for now)
    80:     int Page::Insert(Index& rIndex, BOOL findOnly, BOOL synonym)
    81:     {
    82:        int result;
    83:        if (myVars.IsLeaf)
    84:        {
    85:           SetLocked(TRUE);
    86:           result = FindLeaf(rIndex,findOnly,synonym);
    87:           SetLocked(FALSE);
    88:           return result;
    89:        }
    90:        else
    91:        {
    92:           SetLocked(TRUE);
    93:           result = InsertNode(rIndex,findOnly,synonym);
    94:           SetLocked(FALSE);
    95:           return result;
    96:        }
    97:     }
    98:
    99:
    100:    // find the right page for this index
    101:    int Page::InsertNode(Index& rIndex, BOOL findOnly,BOOL synonym)
    102:    {
    103:       int retVal =0;
    104:       BOOL inserted = FALSE;
    105:       int i;
    106:
    107:       assert(myVars.myCount>0); // nodes have at least 1
    108:       assert(myKeys[0].GetPointer()); // must be valid
    109:
    110:       if (findOnly)
    111:       {
    112:          for (i = 0; i< myVars.myCount; i++)
    113:          {
    114:            if (!synonym)
    115:            {
    116:               if (rIndex.Begins(myKeys[i]))
    117:               {
    118:                  BTree::theIter.RecordNode(myVars.myPageNumber,i);
    119:                  return myKeys[i].Insert(rIndex,findOnly,synonym);
    120:               }
    121:            }
    122:            else
    123:            {
    124:                if (rIndex == (myKeys[i]))
    125:               {
    126:                  BTree::theIter.RecordNode(myVars.myPageNumber,i);
    127:                  return myKeys[i].Insert(rIndex,findOnly,synonym);
    128:               }
    129:
    130:
    131:            }
    132:          }
    133:       }
    134:
    135:
    136:       // does it go before my first entry?
    137:       if (!inserted && rIndex < myKeys[0])
    138:       {
    139:
    140:          if (findOnly)
    141:             return 0L;
    142:
    143:          myKeys[0].SetData(rIndex);
    144:          BTree::theIter.RecordNode(myVars.myPageNumber,0);
    145:          retVal=myKeys[0].Insert(rIndex,findOnly,synonym);
    146:          inserted = TRUE;
    147:       }
    148:
    149:       // find insertion point
    150:       if (!inserted)
    151:       {
    152:          for (i = myVars.myCount-1; i>=0; i--)
    153:          {
    154:             assert(myKeys[i].GetPointer());
    155:             if ( (rIndex >= myKeys[i]))
    156:             {
    157:                 BTree::theIter.RecordNode(myVars.myPageNumber,i);
    158:                 retVal=myKeys[i].Insert(rIndex,findOnly,synonym);
    159:                 inserted = TRUE;
    160:                 break;
    161:             }
    162:          }
    163:       }
    164:       assert(inserted);  // change to exception if not!
    165:
    166:       // if you had to split
    167:       if (retVal && !findOnly) // got back a pointer to a new page
    168:       {
    169:          Index * pIndex = new Index(GetPage(retVal)->GetFirstIndex());
    170:          pIndex->SetPointer(retVal);
    171:          retVal = InsertLeaf(*pIndex);
    172:       }
    173:       return retVal;
    174:    }
    175:
    176:    // called if current page is a leaf
    177:    int Page::FindLeaf(Index& rIndex, BOOL findOnly, BOOL synonym)
    178:    {
    179:       int result = 0;
    180:
    181:       // no duplicates!
    182:       for (int i=0; i < myVars.myCount; i++)
    183:       {
    184:          if (findOnly)
    185:          {
    186:             if (!synonym)
    187:             {
    188:               if (rIndex.Begins(myKeys[i]))
    189:               {
    190:                   BTree::theIter.RecordLeaf(myVars.myPageNumber,i);
    191:                   return myKeys[i].GetPointer();
    192:               }
    193:             }
    194:             else
    195:             {
    196:               if (rIndex ==(myKeys[i]))
    197:               {
    198:                   BTree::theIter.RecordLeaf(myVars.myPageNumber,i);
    199:                   return myKeys[i].GetPointer();
    200:               }
    201:
    202:
    203:             }
    204:           }
    205:           else
    206:             if (rIndex == myKeys[i])
    207:                   return BTree::theWNJFile.Insert(
    208:                               rIndex.GetPointer(),
    209:                                  myKeys[i].GetPointer());
    210:       }
    211:
    212:       if (findOnly)         // not found
    213:          return result;
    214:
    215:       // this index item does not yet exist
    216:       // before you push it into the index
    217:       // push an entry into the wnj.idx
    218:       // and set the index to point to that entry
    219:       if (!synonym)
    220:          rIndex.SetPointer(BTree::theWNJFile.Append(rIndex.GetPointer())); //
                  new!
    221:       return InsertLeaf(rIndex);
    222:    }
    223:
    224:    int Page::InsertLeaf(Index& rIndex)
    225:    {
    226:       int result = 0;
    227:       if (myVars.myCount < Order)
    228:          Push(rIndex);
    229:       else      // overflow the page
    230:       {
    231:             // make sibling
    232:             int NewPg =
    233:                NewPage(myKeys,Order/2,myVars.IsLeaf,myVars.myCount);
    234:             Page* Sibling = GetPage(NewPg);
    235:             Nullify(Order/2);                     // nullify my right half
    236:
    237:             // does it fit in this side?
    238:             if (myVars.myCount>Order/2-1 && rIndex <= myKeys[Order/2-1])
    239:                Push(rIndex);
    240:             else  // push it into the new sibling
    241:                Sibling->Push(rIndex);
    242:
    243:             result = NewPg; // we split, pass it up
    244:          }
    245:        return result;
    246:    }
    247:
    248:    // put the new index into this page (in order)
    249:    void Page::Push(Index& rIndex,int offset,BOOL first)
    250:    {
    251:       BOOL inserted = FALSE;
    252:       assert(myVars.myCount < Order);
    253:       for (int i=offset; i<Order && i<myVars.myCount; i++)
    254:       {
    255:          assert(myKeys[i].GetPointer());
    256:          if (rIndex <= myKeys[i])
    257:          {
    258:                Push(myKeys[i],offset+1,FALSE);
    259:                myKeys[i]=rIndex;
    260:                inserted = TRUE;
    261:                break;
    262:          }
    263:       }
    264:       if (!inserted)
    265:          myKeys[myVars.myCount] = rIndex;
    266:
    267:       if (first)
    268:          myVars.myCount++;
    269:    }
    270:
    271:
    272:    void Page::Print()
    273:    {
    274:          if (!myVars.IsLeaf)
    275:          {
    276:             BTree::NodePageCtr++;
    277:             BTree::NodeIndexPerPage[myVars.myCount]++;
    278:             BTree::NodeIndexCtr+=myVars.myCount;
    279:          }
    280:          else
    281:          {
    282:             BTree::LeafPageCtr++;
    283:             BTree::LeafIndexPerPage[myVars.myCount]++;
    284:             BTree::LeafIndexCtr+=myVars.myCount;
    285:          }
    286:
    287:          for (int i = 0; i<Order && i < myVars.myCount; i++)
    288:          {
    289:             assert(myKeys[i].GetPointer());
    290:             if (myVars.IsLeaf)
    291:                myKeys[i].PrintKey();
    292:             else
    293:                myKeys[i].PrintPage();
    294:          }
    295:    }
    296:
    297:    // write out the entire page as a block
    298:    fstream& Page::Write(fstream& file)
    299:    {
    300:       char buffer[PageSize];
    301:       memcpy(buffer,&myBlock,PageSize);
    302:       file.flush();
    303:       file.clear();
    304:       file.seekp(myVars.myPageNumber*PageSize);
    305:       file.write(buffer,PageSize);
    306:       return file;
    307:    }

Listing 16.5 Datafile.cpp

    1:       // **************************************************
    2:       // PROGRAM:  Data file
    3:       // FILE:     datafile.cpp
    4:       // PURPOSE:
    5:       // NOTES:
    6:       // AUTHOR:   Jesse Liberty (jl)
    7:       // REVISIONS: 11/5/94 1.0 jl  initial release
    8:       // **************************************************
    9:
    10:      #include "btree.hpp"
    11:      #include <assert.h>
    12:
    13:      // on construction, try to open the file if it exists
    14:      DataFile::DataFile():
    15:         myFile("ROBIN.DAT",
    16:         ios::binary | ios::in | ios::out | ios::nocreate | ios::app)
    17:      {
    18:      }
    19:
    20:      void DataFile::Create()
    21:      {
    22:         if (!myFile) // nocreate failed, first creation
    23:         {
    24:
    25:            // open the file, create it this time
    26:            myFile.clear();
    27:            myFile.open
    28:               ("ROBIN.DAT",ios::binary | ios::in | ios::out | ios::app);
    29:
    30:            char Header[offInt];
    31:            int MagicNumber = 1234; // a number we can check for
    32:            memcpy(Header,&MagicNumber,offInt);
    33:            myFile.clear();
    34:            myFile.flush();
    35:            myFile.seekp(0);
    36:            myFile.write(Header,offInt);
    37:            return;
    38:         }
    39:
    40:         // we did open the file, it already existed
    41:         // get the numbers we need
    42:         int MagicNumber;
    43:         myFile.seekg(0);
    44:         myFile.read((char *) &MagicNumber,offInt);
    45:
    46:         // check the magic number. If it is wrong the file is
    47:         // corrupt or this isn't the index file
    48:         if (MagicNumber != 1234)
    49:         {
    50:               // change to an exception!!
    51:               cout << "DataFile Magic number failed!";
    52:         }
    53:         return;
    54:      }
    55:
    56:      // write out the numbers we'll need next time
    57:      void DataFile::Close()
    58:      {
    59:         myFile.close();
    60:      }
    61:
    62:
    63:      int DataFile::Insert(char * newNote)
    64:      {
    65:         int len = strlen(newNote);
    66:
    67:
    68:         time_t theTime;
    69:         theTime = time(NULL);
    70:
    71:         int offTime = sizeof(time_t);
    72:         int fullLen = len + offInt + offTime;
    73:
    74:         char buffer[PageSize];
    75:         memcpy(buffer,&len,offInt);
    76:         memcpy(buffer+sizeof(len),&theTime,offTime);
    77:         memcpy(buffer+offInt+offTime,newNote,len);
    78:
    79:         myFile.clear();
    80:         myFile.flush();
    81:         myFile.seekp(0,ios::end);
    82:         int offset = (int) myFile.tellp();
    83:         myFile.write(buffer,fullLen);
    84:         myFile.flush();
    85:         return offset;
    86:      }
    87:
    88:      void DataFile::GetRecord(int offset, char* buffer, int& len, time_t&
               theTime)
    89:      {
    90:
    91:           int offLen = sizeof(int);
    92:         int offTime = sizeof(time_t);
    93:         char tmpBuff[PageSize];
    94:         myFile.flush();
    95:         myFile.clear();
    96:         myFile.seekg(offset);
    97:         myFile.read(tmpBuff,PageSize);
    98:         memcpy(&len,tmpBuff,offLen);
    99:         memcpy(&theTime,tmpBuff+offLen,offTime);
    100:        strncpy(buffer,tmpBuff+offLen+offTime,len);
    101:        buffer[len] = '\0';
    102:     }

Listing 16.6 IdxManager.cpp

    1:       // **************************************************
    2:       // PROGRAM:  Idxmr
    3:       // FILE:     Idxmgr.cpp
    4:       // PURPOSE:  provide i/o for btree
    5:       // NOTES:
    6:       // AUTHOR:   Jesse Liberty (jl)
    7:       // REVISIONS: 11/5/94 1.0 jl  initial release
    8:       // **************************************************
    9:
    10:      #include "btree.hpp"
    11:      #include <assert.h>
    12:
    13:      // on construction, try to open the file if it exists
    14:      IDXManager::IDXManager():
    15:         myFile("ROBIN.IDX",ios::binary | ios::in | ios::out | ios::nocreate)
    16:      {
    17:         // initialize the pointers to null
    18:         for (int i = 0; i< MaxPages; i++)
    19:            myPages[i] = 0;
    20:         myCount = 0;
    21:
    22:      }
    23:
    24:      // called by btree constructor
    25:      // if we opened the file, read in the numbers we need
    26:      // otherwise create the file
    27:      int IDXManager::Create()
    28:      {
    29:         if (!myFile) // nocreate failed, first creation
    30:         {
    31:            // open the file, create it this time
    32:            myFile.open("ROBIN.IDX",ios::binary | ios::in | ios::out);
    33:            char Header[PageSize];
    34:            int MagicNumber = 1234; // a number we can check for
    35:            memcpy(Header,&MagicNumber,offInt);
    36:            int NextPage = 1;
    37:            memcpy(Header+offInt,&NextPage,offInt);
    38:            memcpy(Header+2*offInt,&NextPage,offInt);
    39:            Page::SetgPageNumber(NextPage);
    40:            char title[]="ROBIN.IDX. Ver 1.00";
    41:            memcpy(Header+3*offInt,title,strlen(title));
    42:            myFile.flush();
    43:            myFile.clear();
    44:            myFile.seekp(0);
    45:            myFile.write(Header,PageSize);
    46:            return 0;
    47:         }
    48:
    49:         // we did open the file, it already existed
    50:         // get the numbers we need
    51:         int MagicNumber;
    52:         myFile.seekg(0);
    53:         myFile.read((char *) &MagicNumber,offInt);
    54:
    55:         // check the magic number. If it is wrong the file is
    56:         // corrupt or this isn't the index file
    57:         if (MagicNumber != 1234)
    58:         {
    59:               // change to an exception!!
    60:               cout << "Index Magic number failed!";
    61:               return 0;
    62:         }
    63:
    64:         int NextPage;
    65:         myFile.seekg(offInt);
    66:         myFile.read((char*) &NextPage,offInt);
    67:         Page::SetgPageNumber(NextPage);
    68:         int FirstPage;
    69:         myFile.seekg(2*offInt);
    70:         myFile.read((char*) &FirstPage,offInt);
    71:         const int room =  PageSize - 3*offInt;
    72:         char buffer[room];
    73:         myFile.read(buffer,room);
    74:         buffer[20]='\0';
    75:         // cout << buffer << endl;
    76:         // read in all the pages
    77:         for (int i = 1; i < NextPage; i++)
    78:         {
    79:               myFile.seekg(i * PageSize);
    80:               char buffer[PageSize];
    81:               myFile.read( buffer, PageSize);
    82:               Page * pg = new Page(buffer);
    83:               Insert(pg);
    84:         }
    85:
    86:         return FirstPage;
    87:      }
    88:
    89:      // write out the numbers we'll need next time
    90:      void IDXManager::Close(int theRoot)
    91:      {
    92:
    93:         for (int i = 0; i< MaxPages; i++)
    94:            if (myPages[i])
    95:               Save(myPages[i]);
    96:         int NextPage = Page::GetgPageNumber();
    97:         if (!myFile)
    98:            cout << "Error opening myFile!" << endl;
    99:         myFile.flush();
    100:        myFile.clear();
    101:        myFile.seekp(offInt);
    102:        myFile.write ( (char *) &NextPage,offInt);
    103:        myFile.seekp(2*offInt);
    104:        myFile.write((char*) &theRoot,offInt);
    105:        myFile.close();
    106:     }
    107:
    108:     // wrapper function
    109:     int IDXManager::NewPage(Index& index, BOOL bLeaf)
    110:     {
    111:        Page * newPage = new Page(index, bLeaf);
    112:        Insert(newPage);
    113:        Save(newPage);
    114:        return newPage->GetPageNumber();
    115:     }
    116:
    117:     int IDXManager::NewPage(
    118:        Index *array,
    119:        int offset,
    120:        BOOL leaf,
    121:        int count)
    122:     {
    123:        Page * newPage = new Page(array, offset, leaf,count);
    124:        Insert(newPage);
    125:        Save(newPage);
    126:        return newPage->GetPageNumber();
    127:     }
    128:
    129:     void IDXManager::Insert(Page * newPage)
    130:     {
    131:
    132:        // add new page into array of page managers
    133:        if (myCount < MaxPages)
    134:        {
    135:           assert(!myPages[myCount]);
    136:           myPages[myCount++] = newPage;
    137:        }
    138:        else  // no room, time to page out to disk
    139:        {
    140:           int lowest = -1;
    141:
    142:           for (int i = 0; i< MaxPages; i++)
    143:           {
    144:              if (myPages[i]->GetLocked() == FALSE )
    145:                 lowest = i;
    146:           }
    147:           if (lowest == -1)
    148:              assert(lowest != -1); // change to exception if -1 (no page to
                      kill)
    149:
    150:           for (i = 0; i< MaxPages; i++)
    151:              if (myPages[i]->GetTime() < myPages[lowest]->GetTime() &&
                      myPages[i]->GetLocked() == FALSE)
    152:                 lowest = i;
    153:
    154:           assert(myPages[lowest]);
    155:           Save(myPages[lowest]);
    156:           delete myPages[lowest];
    157:           myPages[lowest] = newPage;
    158:
    159:        }
    160:     }
    161:
    162:     // tell the page to write itself out
    163:     void IDXManager::Save(Page* pg)
    164:     {
    165:       pg->Write(myFile);
    166:     }
    167:
    168:     // see if the page is in memory, if so return it
    169:     // otherwise get it from disk
    170:     // note: this won't scale, with lots of page managers
    171:     // you'd need a more efficient search. 10 levels of page
    172:     // managers, with 31 indexes per page gives you room for
    173:     // 800 trillion words. Even if each page is only 1/2 full
    174:     // on average, 10 levels of depth would represent 64 million
    175:     // keys alone, not to mention the actual records.
    176:
    177:     Page * IDXManager::GetPage(int target)
    178:     {
    179:
    180:        for (int i = 0; i< MaxPages; i++)
    181:        {
    182:           if (myPages[i]->GetPageNumber() == target)
    183:              return  myPages[i];
    184:        }
    185:        myFile.seekg(target * PageSize);
    186:        char buffer[PageSize];
    187:        myFile.read( buffer, PageSize);
    188:        Page * pg = new Page(buffer);
    189:        Insert(pg);
    190:        return pg;
    191:     }

Listing 16.7 Iter.cpp

    1:      // **************************************************
    2:      // PROGRAM:  Iterator
    3:      // FILE:     iter.cpp
    4:      // PURPOSE:  iterator for BTree
    5:      // NOTES:
    6:      // AUTHOR:   Jesse Liberty (jl)
    7:      // REVISIONS: 12/1/94 1.0 jl  initial release
    8:      // **************************************************
    9:      #include "btree.hpp"
    10:
    11:     // history class implementations
    12:     History::History(int pno, int off, BOOL node):
    13:        PageNo(pno),
    14:        OffSet(off),
    15:        IsNode(node)
    16:     {}
    17:
    18:     // Iterator implementations
    19:
    20:     // start off with blank iterator
    21:     Iterator::Iterator()
    22:     {
    23:        Reset();
    24:     }
    25:
    26:     Iterator::~Iterator()
    27:     {
    28:        for (int i = 0; i<MaxPages; i++)
    29:           delete myHistory[i];
    30:     }
    31:
    32:     void Iterator::RecordNode(int PageNo, int OffSet)
    33:     {
    34:       myHistory[myNext++] = new History(PageNo,OffSet,TRUE);
    35:     }
    36:
    37:     void Iterator::RecordLeaf(int PageNo, int OffSet)
    38:     {
    39:       myHistory[myNext++] = new History(PageNo,OffSet,FALSE);
    40:     }
    41:
    42:     History* Iterator::GetLast()
    43:     {
    44:        return myHistory[myNext-1];
    45:     }
    46:
    47:     History* Iterator::GetFirst()
    48:     {
    49:        return myHistory[0];
    50:     }
    51:
    52:     void Iterator::Reset()
    53:     {
    54:        for (int i = 0; i<MaxPages; i++)
    55:           myHistory[i] = 0;
    56:        myNext = 0;
    57:     }
    58:
    59:
    60:     int Iterator::GetNext(const Index& theIndex)
    61:     {
    62:
    63:        for (;;)
    64:        {
    65:           History * pHist = GetLast();
    66:           int pgNo = pHist-> PageNo;
    67:           int newOffSet = pHist->OffSet+1;
    68:           Page * pg = BTree::theIDXManager.GetPage(pgNo);
    69:
    70:           if (newOffSet < pg->GetCount())
    71:           {
    72:              Index Idx = pg->GetIndex(newOffSet);
    73:              pHist->OffSet = newOffSet;
    74:              for (;;)
    75:              {
    76:                 if (pg->GetIsLeaf())
    77:                 {
    78:                    // cout << "Key: " << Idx.GetData() << endl;
    79:                    if (theIndex.Begins(Idx))
    80:                       return Idx.GetPointer();
    81:                    else
    82:                       return 0;
    83:                 }
    84:                 else
    85:                 {
    86:                 pg = BTree::theIDXManager.GetPage(Idx.GetPointer());
    87:                 Idx = pg->GetFirstIndex();
    88:                 pHist = new History(pg->GetPageNumber(),0, (BOOL)!pg->GetIsLeaf());
    89:                 myHistory[myNext++] = pHist;
    90:                 }
    91:              }  // end inner for loop
    92:           }
    93:           else
    94:           {
    95:              delete myHistory[myNext-1];
    96:              myHistory[myNext-1] = 0;
    97:              myNext--;
    98:              if (!myNext)
    99:                 return 0;
    100:          }
    101:       }  // end outer for loop
    102:    }

Listing 16.8 WNJFile.cpp

    1:       // **************************************************
    2:       // PROGRAM:  Word-Node-Join
    3:       // FILE:     WNJFile.cpp
    4:       // PURPOSE:
    5:       // NOTES:
    6:       // AUTHOR:   Jesse Liberty (jl)
    7:       // REVISIONS: 11/5/94 1.0 jl  initial release
    8:       // **************************************************
    9:
    10:      #include "btree.hpp"
    11:      #include <assert.h>
    12:
    13:      // on construction, try to open the file if it exists
    14:      WNJFile::WNJFile():
    15:         myFile("ROBINWNJ.IDX",
    16:         ios::binary | ios::in | ios::out | ios::nocreate)
    17:      {
    18:         for (int i = 0; i<5; i++)
    19:            myints[i]=0L;
    20:      }
    21:
    22:      void WNJFile::Create()
    23:      {
    24:
    25:         char Header[2*offInt];
    26:         int MagicNumber=0; // a number we can check for
    27:         int zero = 0;
    28:
    29:         if (!myFile) // nocreate failed, first creation
    30:         {
    31:            // open the file, create it this time
    32:            myFile.clear();
    33:            myFile.open("ROBINWNJ.IDX",
    34:               ios::binary | ios::in | ios::out);
    35:
    36:            MagicNumber = 1234;
    37:            memcpy(Header,&MagicNumber,offInt);
    38:            memcpy(Header+offInt,&zero,offInt);
    39:            myFile.seekp(0);
    40:            myFile.write(Header,2*offInt);
    41:            myFile.flush();
    42:            return;
    43:         }
    44:
    45:         // we did open the file, it already existed
    46:         // get the numbers we need
    47:
    48:
    49:         myFile.seekg(0);
    50:         myFile.read(Header,2*offInt);
    51:         memcpy(&MagicNumber,Header,offInt);
    52:         memcpy(&myCount,Header+offInt,offInt);
    53:
    54:         // check the magic number. If it is wrong the file is
    55:         // corrupt or this isn't the index file
    56:         if (MagicNumber != 1234)
    57:         {
    58:               // change to an exception!!
    59:               cout << "WNJ Magic number failed!";
    60:               cout << "Magic number: " << MagicNumber;
    61:               cout << "\nmyCount: " << myCount << endl;
    62:         }
    63:         return;
    64:      }
    65:
    66:      // write out the numbers we'll need next time
    67:      void WNJFile::Close()
    68:      {
    69:
    70:         myFile.seekg(offInt);
    71:         myFile.write((char*)&myCount,offInt);
    72:         myFile.close();
    73:      }
    74:
    75:      int WNJFile::Append(int DataOffset)
    76:      {
    77:
    78:         int newPos = 2*offInt + myCount++ * (5*offInt);
    79:         int offsets[5];
    80:         offsets[0] = DataOffset;
    81:         for (int i = 1; i<5; i++)
    82:            offsets[i]=0;
    83:         myFile.seekg(newPos);
    84:         myFile.write((char*)offsets,5*offInt);
    85:
    86:         return newPos;
    87:      }
    88:
    89:
    90:      int WNJFile::Insert(int DataOffset,int WNJOffset)
    91:      {
    92:
    93:         int ints[5];
    94:         myFile.seekg(WNJOffset);
    95:         myFile.read((char*)ints,5*offInt);
    96:
    97:         int offset=WNJOffset;
    98:
    99:         while (ints[4])
    100:        {
    101:           offset = ints[4];
    102:           myFile.clear();
    103:           myFile.flush();
    104:           myFile.seekg(ints[4]);
    105:           myFile.read((char*)ints,5*offInt);
    106:        }
    107:        if (ints[3]) // full!
    108:        {
    109:           ints[4] = Append(DataOffset);
    110:           myFile.clear();
    111:           myFile.flush();
    112:           myFile.seekg(offset);
    113:           myFile.write((char*)ints,5*offInt);
    114:        }
    115:        else
    116:        {
    117:           for (int i = 0; i<4; i++)
    118:           {
    119:              if (ints[i] == 0)
    120:              {
    121:                 ints[i] = DataOffset;
    122:                 myFile.clear();
    123:                 myFile.flush();
    124:                 myFile.seekg(offset);
    125:                 myFile.write((char*)ints,5*offInt);
    126:                 break;
    127:              }
    128:           }
    129:        }
    130:        return 0;
    131:     }
    132:
    133:
    134:     int* WNJFile::Find(int NextWNJ)
    135:     {
    136:        int ints[5];
    137:
    138:        int * results = new int[256];
    139:
    140:        int i = 0, j=0;
    141:
    142:        while (j<256)
    143:           results[j++] = 0;
    144:
    145:        j = 0;
    146:
    147:        myFile.seekg(NextWNJ);
    148:        myFile.read((char*)ints,5*offInt);
    149:
    150:        while (j < 256)
    151:        {
    152:           if (ints[i])
    153:           {
    154:              if (i == 4)
    155:              {
    156:                 myFile.seekg(ints[4]);
    157:                 myFile.read((char*)ints,5*offInt);
    158:                 i = 0;
    159:                 continue;
    160:              }
    161:           results[j++] = ints[i++];
    162:           }
    163:           else
    164:              break;
    165:        }
    166:        return results;
    167:     }

Listing 16.9 R2.cpp

    1:      // **************************************************
    2:      // PROGRAM:  R2
    3:      // FILE:     r2.cpp
    4:      // PURPOSE:  Add synonyms to ROBIN.
    5:      // NOTES:
    6:      // AUTHOR:   Jesse Liberty (jl)
    7:      // REVISIONS: 1/1/95 1.0 jl
    8:      // **************************************************
    9:
    10:
    11:     #include "stdef.hpp"
    12:     #include "btree.hpp"
    13:     #include <stdlib.h>
    14:
    15:     // static definitions
    16:     IDXManager BTree::theIDXManager;
    17:     DataFile BTree::theDataFile;
    18:     WNJFile BTree::theWNJFile;
    19:     Iterator BTree::theIter;
    20:
    21:     int WNJFile::myCount=0L;
    22:     int Page::gPage=1;
    23:     int BTree::NodeIndexCtr=0;
    24:     int BTree::LeafIndexCtr=0;
    25:     int BTree::NodePageCtr=0;
    26:     int BTree::LeafPageCtr=0;
    27:     int BTree::NodeIndexPerPage[Order+1];
    28:     int BTree::LeafIndexPerPage[Order+1];
    29:
    30:
    31:     // prototypes
    32:     void parseCommandLines(char *buffer,int argc,char **argv);
    33:     void ShowMenu(long*);
    34:     void DoFind(int, char**, BTree&);
    35:     void ParseFile(int, char**, BTree&);
    36:     void DoSyn(char *orig, char* syn, BTree& myTree);
    37:
    38:     // driver program
    39:     int main(int argc, char ** argv)
    40:     {
    41:        BTree myTree;
    42:
    43:        for (int i = 0; i < Order +2; i++)
    44:        {
    45:           BTree::NodeIndexPerPage[i] =0;
    46:           BTree::LeafIndexPerPage[i] = 0;
    47:        }
    48:
    49:        char buffer[PageSize+1];
    50:
    51:        if (argc == 1)
    52:        {
    53:           cout << "Please provide command line arguments";
    54:           return 1;
    55:        }
    56:
    57:        // check for flags, if none add text to data file
    58:        if (argv[1][0] == '-')
    59:        {
    60:
    61:           switch (argv[1][1])
    62:           {
    63:              case '?':
    64:                 DoFind(argc,argv,myTree);
    65:                 break;
    66:
    67:              case '!':
    68:                    myTree.PrintTree();
    69:                    break;
    70:
    71:              case 'F':
    72:              case 'f':
    73:                    ParseFile(argc,argv,myTree);
    74:                    break;
    75:
    76:              case 'S':
    77:              case 's':
    78:                   DoSyn(argv[2],argv[3],myTree);
    79:                   break;
    80:           }
    81:        }
    82:        else
    83:        {
    84:           parseCommandLines(buffer,argc,argv);
    85:           myTree.Insert(buffer,argc,argv);
    86:           cout << "Inserted.\n";
    87:        }
    88:        return 0;
    89:     }
    90:
    91:     // concatenate remaining command line arguments
    92:     void parseCommandLines(char *buffer,int argc,char **argv)
    93:     {
    94:        size_t len = 0;
    95:        size_t argLen=0;
    96:        for (int i = 1; i< argc; i++)
    97:        {
    98:           argLen = strlen(argv[i]);
    99:           if (len + argLen +2 < PageSize)
    100:          {
    101:             strncpy(buffer+len,argv[i],argLen);
    102:             strncpy(buffer+len+argLen," ",1);
    103:             len += argLen+1;
    104:          }
    105:       }
    106:       buffer[len] = '\0';
    107:    }
    108:
    109:    // having found matches, show the menu of choices
    110:    // each entry is numbered and dated
    111:    void ShowMenu(int *list)
    112:    {
    113:       int j=0;
    114:       char buffer[PageSize+1];
    115:       time_t theTime;
    116:       int len;
    117:       char listBuff[256];
    118:       struct tm * ts;
    119:       int dispSize;
    120:
    121:       while (list[j] && j < 20)
    122:       {
    123:          BTree::theDataFile.GetRecord(list[j],buffer,len, theTime);
    124:       #if defined(_MSVC_16BIT) || defined(_MSVC_32BIT)
       :       {
       :          dispSize = __min(len,32); // THIS is a DOUBLE UNDERSCORE
       :       }
       :       #else
       :       {
       :          dispSize = min(len,32);
       :       }
       :       #endif
    125:          strncpy(listBuff,buffer,dispSize);
    126:          if (dispSize == 32)
    127:             {
    128:             listBuff[29] = '.';
    129:             listBuff[30] = '.';
    130:             listBuff[31] = '.';
    131:             }
    132:          listBuff[dispSize]='\0';
    133:          ts = localtime(&theTime);
    134:          cout << "[" << (j+1) << "] ";
    135:          cout << ts->tm_mon << "/";
    136:          cout << ts->tm_mday << "/";
    137:          cout << ts->tm_year << " ";
    138:          cout <<  listBuff << endl;
    139:          j++;
    140:       }
    141:    }
    142:
    143:    // handle -? command
    144:    // find matches, show the menu, request choice
    145:    // display record and redisplay menu
    146:    void DoFind(int argc, char ** argv, BTree& myTree)
    147:    {
    148:
    149:       // create an array of total set of WNJ
    150:       // offsets. This will be used to display
    151:       // choices and to find actual text
    152:       int  list[PageSize];
    153:
    154:       // initialize the array to all zeros
    155:       for (int i = 0; i<PageSize; i++)
    156:             list[i] = 0;
    157:
    158:       // for each word in the command line
    159:       // search for the matching set of records
    160:       for (i = 2; i< argc; i++)
    161:       {
    162:          int offset = myTree.Find(argv[i]);
    163:          while (offset)
    164:          {
    165:            int* found  =  BTree::theWNJFile.Find(offset);
    166:            int j=0;
    167:            int len;
    168:            time_t theTime;
    169:            char buff[512];
    170:            while (found[j])
    171:            {
    172:               BTree::theDataFile.GetRecord(found[j],buff,len, theTime);
    173:               struct tm * ts = localtime(&theTime);
    174:               cout << "Found: ";
    175:               cout << ts->tm_mon << "/";
    176:               cout << ts->tm_mday << "/";
    177:               cout << ts->tm_year << " ";
    178:               cout <<  buff << endl;
    179:               j++;
    180:            }
    181:            delete [] found;
    182:            offset = myTree.GetNext(argv[i]);
    183:         }
    184:       }
    185:
    186:          cout << "\n";
    187:
    188:          if (!list[0])
    189:             return;
    190:
    191:
    192:          ShowMenu(list);
    193:
    194:          int choice;
    195:          char buffer[PageSize];
    196:          int len;
    197:          time_t theTime;
    198:
    199:          for (;;)
    200:          {
    201:             cout << "Choice (0 to stop): " ;
    202:             cin >> choice;
    203:             if (!choice)
    204:                break;
    205:             BTree::theDataFile.GetRecord(list[choice-1],buffer,len, theTime);
    206:             cout << "\n>> ";
    207:             cout << buffer;
    208:             cout << "\n\n";
    209:             ShowMenu(list);
    210:          }
    211:    }
    212:
    213:    // open a file and create a new note for each line
    214:    // index every word in the line
    215:    void ParseFile(int argc,char **argv, BTree& myTree)
    216:    {
    217:
    218:       char buffer[PageSize];
    219:       char theString[PageSize];
    220:
    221:       ifstream theFile(argv[2],ios::in );
    222:       if (!theFile)
    223:       {
    224:          cout << "Error opening input file!\n";
    225:          return;
    226:       }
    227:       int offset = 0;
    228:       for (;;)
    229:       {
    230:             theFile.read(theString,PageSize);
    231:             int len = theFile.gcount();
    232:             if (!len)
    233:                break;
    234:             theString[len]='\0';
    235:             char *p1, *p2, *p0;
    236:             p0 = p1 = p2 = theString;
    237:
    238:             while (p1[0] && (p1[0] == '\n' || p1[0] == '\r'))
    239:                p1++;
    240:
    241:             p2 = p1;
    242:
    243:             while (p2[0] && p2[0] != '\n' && p2[0] != '\r')
    244:                p2++;
    245:
    246:             int bufferLen = int(p2 - p1);
    247:             int totalLen = int (p2 - p0);
    248:
    249:             if (!bufferLen)
    250:                continue;
    251:
    252:             // lstrcpyn(buffer,p1,bufferLen);
    253:             strncpy(buffer,p1,bufferLen);
    254:             buffer[bufferLen]='\0';
    255:
    256:             // for (int i = 0; i< PageSize; i++)
    257:                cout << "\r";
    258:             cout << "Parsing " << buffer;
    259:             myTree.Insert(buffer);
    260:             offset += totalLen;
    261:             theFile.clear();
    262:             theFile.seekg(offset,ios::beg);
    263:          }
    264:       }
    265:
    266:
    267:   // add synonyms to the tree
    268:    void DoSyn(char *orig, char* syn, BTree& myTree)
    269:    {
    270:       int offset = myTree.FindExact(syn);
    271:       if (!offset)  // syn can't exist
    272:       {
    273:          int offset = myTree.FindExact(orig);
    274:          if (offset) // orig must exist
    275:             myTree.AddKey(syn,offset,TRUE);
    276:       }
    277:    }

Output:

Note: Numbering has been added to the output to make analysis easier. Your output will not include this numbering.

    1:     d:\>r2 now is the time for all good men to
    2:     Inserted.
    3:
    4:     d:\>r2 come to the aid of their party
    5:     Inserted.
    6:
    7:     d:\>r2 what is the meaning of this
    8:     Inserted.
    9:
    10:    d:\>r2 eternal vigilance is the price of liberty
    11:    Inserted.
    12:
    13:    d:\>r2 when in Rome, do as the Romans
    14:    Inserted.
    15:
    16:    d:\>r2 -!
    17:
    18:    AID:
    19:    AID:   AID  ALL
    20:    COME:   COME  ETERNAL  FOR
    21:    GOOD:   GOOD  LIBERTY  MEANING  MEN
    22:    NOW:
    23:    NOW:   NOW  PARTY
    24:    PRICE:   PRICE  ROMANS  ROME
    25:    THE:   THE  THEIR
    26:    THIS:
    27:    THIS:   THIS  TIME
    28:    VIGILANCE:   VIGILANCE  WHAT  WHEN
    29:
    30:    Stats:
    31:    Node pages:   4
    32:    Leaf pages:   8
    33:    Node indexes: 11
    34:    Leaf indexes: 21
    35:    Pages with 2 nodes:  1
    36:    Pages with 2 leaves: 4
    37:    Pages with 3 nodes:  3
    38:    Pages with 3 leaves: 3
    39:    Pages with 4 leaves: 1
    40:
    41:    d:\>r2 -? liberty
    42:    Found: 0/16/95 eternal vigilance is the price of liberty
    43:
    44:
    45:    d:\>r2 -? jesse
    46:
    47:
    48:    d:\>r2 -S liberty jesse
    49:
    50:    d:\>r2 -? liberty
    51:    Found: 0/16/95 eternal vigilance is the price of liberty
    52:
    53:
    54:    d:\>r2 -? jesse
    55:    Found: 0/16/95 eternal vigilance is the price of liberty
Analysis:

Let's start by examining the output. The data files and indexes were deleted before beginning this test of the program.

In lines 1, 4, 7, 10, and 13, data was added to the files. In line 16, a report was requested, which is shown in lines 18 through 39. You will note that this database is order 4 (4 indexes on a page), which is not efficient, but is useful during debugging.

In line 41, the user searches for the term liberty; it is found, as shown in line 42. In line 45, the user searches for the term jesse, but this is not found. In line 48, the user establishes jesse as a synonym for liberty. There is no feedback, although it would be nice if the program came back and verified the addition of the synonym.

In line 50, the user again searches for liberty and receives the same results as the first attempt. In line 54, the user searches for jesse and this time receives a response: The same as if the term liberty had been searched for, which is consistent with the establishment of a synonym.

The easiest way to see how the synonym is established is to examine lines 76 through 79 of listing 16.9. When the user enters -S, a synonym is to be created. The next term and the word following are passed to the new DoSyn() method, as shown in lines 268 through 277 of the same listing.

In line 270, the tree is searched for the synonym. If it exists already as a term in the database, it will not be added as a synonym. Otherwise, the tree is searched for the original term.

Note that a new method has been added to BTree: FindExact. The implementation is shown in lines 46 through 54 of listing 16.2. The significant difference from Find is that the call to the page's Insert() method passes in TRUE to the last two parameters. These are declared in line 233 of listing 16.1, in the declaration of the Page object. The implementation is shown in lines 80 through 97 of listing 16.4.

Note that the Boolean value synonym is passed along to both FindLeaf() and InsertNode(). These values are passed from method to method until they are needed. The first use of the value is in line 114 of listing 16.4. It is imperative to match only on an exact match with the original term, so the value for synonym is examined and the correct method (Begins() or operator==) is called accordingly. This is repeated in lines 186 through 200.

If the original term is not found, the synonym will not be added. If the term is found, BTree::AddKey() is called, passing in the synonym, the offset, and the Boolean value TRUE.

Examination of the interface for the AddKey() method, as shown in line 139 of listing 16.1, shows that AddKey() has a third parameter, synonym, which defaults to FALSE.

Program execution jumps to line 100 of listing 16.2. An index object is created, using the new term as its str value, and the offset returned by the call to FindExact() as its offset value. This is exactly what you want: The index now points to the correct entry in the WNJFile. The trick will be to avoid adding this record to the data file and giving it its own WNJFile entry.

Once again, the synonym flag is used, this time in line 219 of listing 16.4. When this flag is set, the index object is not passed to the WNJFile for appending, and its pointer is not set to point to the new WNJFile record. Because the pointer already points to the correct WNJFile record, the index is left as it is before being added to the page.

Summary

Today you learned how to add synonyms to the utility, and in general how to create extensible programs. You examined the cost of developing a program that is maintainable and that can grow over time, and you learned a specific technique for adding synonyms to a database program.

Q&A

Q: What does it take to make a program maintainable?

A: The program must be designed with a clarity of vision, the modules must be discrete, the interfaces must be well-defined, and the code must be well-documented.

Q: Are smaller programs more maintainable than larger programs?

A: All things being equal, it is, of course, easier to understand a smaller program. A well-written large program, however, can be far easier to maintain than a small utility written without regard to documentation and maintainability.

Q: If I'm the programmer and no one else is working on the program, why bother with comments?

A: When you return to your program in a few months (to fix a bug, to extend its functionality, or simply to learn how you solved a problem that has returned in a new program), you will need the comments to help you remember how the program works in detail.

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered, and exercises to provide you with experience in using what you have learned. Try to answer the quiz and exercise questions before checking the answers in Appendix A, and make sure that you understand the answers before continuing to the next chapter.

Quiz

  1. What makes a program maintainable?

  2. Explain the difference between good comments and poor comments.

  3. What is legacy code and why is it a challenge?

  4. What is the purpose of a synonym in a database program?

[Click here for Answers]

Exercises

  1. Rewrite the methods that currently take the synonym and findExact flags not to take these flags. Create static flags in Btree and use these instead.

  2. Add functionality to the program to signal whether a synonym was added. If a synonym was not added, tell the user what went wrong.

  3. Add a help message that is printed out if the user runs ROBIN with an unsupported switch.

  4. Add a feature to accept multiple lines for input from cin.

  5. Modify the feature added in exercise 4 so that after the user indicates that he is done adding lines, it shows how many new lines were added.

Go to: Table of Contents | Next Page