ROBIN is now almost a full-fledged application. It is time to turn your attention to making the code more robust and ensuring that it is maintainable. In the next few chapters, I'll discuss profiling programs to enhance performance, using a debugger to track down serious problems, and building tough maintainable code. Today, however, I'll try to prove that ROBIN is extensible, that the architecture allows for growth and enhancements.
There are a number of features that you might want to add to ROBIN, even at this early date. The most pressing, in my opinion, is the capability to designate synonyms, so that when searching for computer, for example, ROBIN will match records with PC and MAC in the title. Today you will learn
One of the hallmarks of maintainable code is that you can explain how your program works to an intelligent programmer in a matter of hours. If your program is too complicated to explain, no matter how clever it is, it will not be maintainable over the long haul.
The second hallmark of maintainable code, however, is that it can be extended to incorporate new features. This extensibility is crucial. Many programmers spend much of their career building on legacy code programs they didn't write, but that they must maintain. A good program allows for the addition of new features without breaking.
ROBIN was not originally designed to include synonyms, but the architecture is sufficiently nimble and the complexity is sufficiently isolated that adding this feature should not be an overwhelming task. A good programmer should be able to get the fundamental functionality in place in less than one day.
ROBIN's current design works like this: When the user asks to find a term, the B-tree is passed the string for which to search. The B-tree starts at its first page and asks each page for the first index on that page that "begins with" the text passed in.
When the index is found, the page is examined to see whether it is a node or a leaf. If it is a node, the index points to the next page in the chain. After a leaf is found, the index points to a WNJ (Word-Node-Join) record. The WNJ record includes a list of offsets into the data file where the actual record resides.
When you insert a synonym (PC, for example) for a term (Computer, for example) you want the tree to find the same WNJFile record for the synonym (PC) as it does for the term (COMPUTER).
The algorithm for inserting the synonym is straightforward: Find the WNJ record for the original term and then create a new index for the new term pointing to that WNJ record.
Remember that when a new index is added, it originally points to the data in the file but then is adjusted to point to a new WNJ record. In this case, you want to fill the index with the WNJ record index of the original term, and not allow it to be adjusted.
Two significant changes must be made to the code, in addition to adding a function to handle the Add Synonym request. First, there must be logic to distinguish this special case so that the index is handled properly. In addition, there must be logic to ensure that only an exact match is found for the original term; if you are adding a synonym for phone (telephone, for example) you do not want to add that synonym to phonetics!
In both cases, a flag is required: Are you adding a synonym? There are two ways to handle this: I've chosen to pass these flags to every function that needs them, defaulting their value to FALSE (the more common case). An alternative would be to set a static Boolean variable in the tree, and to get this value each time the test needs to be tried.
In retrospect, I suspect the code would be cleaner if this and the related flags (FindOnly, for example) were moved into static variables in BTREE. This is left, as they say, as an exercise for the reader.
Listing 16.1 shows the modified BTree.hpp, including the new flags on the node insertion and find methods, as discussed in the analysis that follows. Listing 16.2 is Btree.cpp, with a modified AddKey() method. Listing 16.3 is Index.cpp, which changed to accommodate the new flags. Listing 16.4 is Page.cpp, where the bulk of the changes are located. Listing 16.5 (Datafile.cpp), listing 16.6 (IdxManager.cpp), listing 16.7 (Iter.cpp), and listing 16.8 (WNJFile.cpp) are included for completeness. Finally, listing 16.9 is the driver program, which enables you to enter synonyms using the syntax R2 -Synonym OldTerm NewTerm.
Note: The listings in this chapter are for illustrative purposes only. Your compiler might issue a warning if you use the listings exactly as shown in this chapter.
Listing 16.1 BTree.hpp
1: // ************************************************** 2: // PROGRAM: Btree, Page and Index declarations 3: // FILE: btree.hpp 4: // PURPOSE: provide fundamental btree functionality 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 11/1/94 1.0 jl initial release 8: // ************************************************** 9: #ifndef BTREE_HPP // inclusion guards 10: #define BTREE_HPP 11: 12: #include <time.h> 13: #include <string.h> 14: #include <fstream.h> 15: 16: #include "stdef.hpp" 17: 18: const int Order = 4; //31; // 31 indexes and 1 header 19: const int dataLen = 11; // length of a key 20: const int MaxPages = 500; // more than we need 21: const int SizeItem = 16; // key + offset 22: const int SizePointer = 2; // size of offset 23: const int PageSize = (Order+1) * SizeItem; 24: const int offInt = 2; 25: 26: // forward declarations 27: class Page; 28: class Index; 29: 30: // all members are public 31: // used only by Iterator 32: struct History 33: { 34: History(int, int,BOOL); 35: int PageNo; 36: int OffSet; 37: BOOL IsNode; 38: }; 39: 40: 41: // Iterator class keeps track 42: // of current (and next) index 43: class Iterator 44: { 45: public: 46: Iterator(); 47: ~Iterator(); 48: void RecordNode(int, int); 49: void RecordLeaf(int, int); 50: History * GetLast(); 51: History * GetFirst(); 52: int GetNext(const Index&); 53: void Reset(); 54: 55: private: 56: History* myHistory[MaxPages]; 57: int myNext; 58: }; 59: 60: 61: class DataFile 62: { 63: public: 64: // constructors & destructor 65: DataFile(); 66: ~DataFile(){} 67: 68: // management of files 69: void Close(); 70: void Create(); 71: void GetRecord(int, char*, int&, time_t&); 72: 73: int Insert(char *); 74: 75: private: 76: fstream myFile; 77: }; 78: 79: class WNJFile 80: { 81: public: 82: // constructors & destructor 83: WNJFile(); 84: ~WNJFile(){} 85: 86: // management of files 87: void Close(); 88: void Create(); 89: int* Find(int NextWNJ); 90: int Insert(int, int); 91: int Append(int); 92: 93: private: 94: static int myCount; 95: fstream myFile; 96: union 97: { 98: int myints[5]; 99: char myArray[5*4]; 100: }; 101: }; 102: 103: // IDXManager - in memory keeps track of what pages are 104: // already in memory and swaps to disk as required 105: class IDXManager 106: { 107: public: 108: // constructors & destructor 109: IDXManager(); 110: ~IDXManager(){} 111: 112: // management of files 113: void Close(int); 114: int Create(); 115: 116: // Page manipulation 117: Page* GetPage(int); 118: void SetPage(Page*); 119: void Insert(Page * newPage); 120: void Save(Page* pg); 121: int NewPage(Index&, BOOL); 122: int NewPage(Index *array, int offset, BOOL leaf, int count); 123: 124: private: 125: Page* myPages[MaxPages]; 126: fstream myFile; 127: int myCount; 128: }; 129: 130: // the btree itself - has a pointer to first page 131: class BTree 132: { 133: public: 134: // constructors and destructor 135: BTree(); 136: ~BTree(); 137: 138: // utility functions 139: void AddKey(char* data, int offset,BOOL synonym = FALSE); 140: void Insert(char* buffer, int, char**); 141: void Insert(char*); 142: void PrintTree(); 143: int Find(char* data); 144: int FindExact(char* data); 145: int GetNext(char* data); 146: 147: // page methods 148: Page* GetPage(int page) 149: { return theIDXManager.GetPage(page); } 150: int NewPage(Index& pIndex, BOOL IsLeaf) 151: { return theIDXManager.NewPage(pIndex,FALSE); } 152: 153: static int myAlNum(char ch); 154: // public static member! 155: static IDXManager theIDXManager; 156: static WNJFile theWNJFile; 157: static DataFile theDataFile; 158: static Iterator theIter; 159: 160: static BOOL GetWord(char*, char*, int&); 161: static void GetStats(); 162: static int NodeIndexCtr; 163: static int LeafIndexCtr; 164: static int NodePageCtr; 165: static int LeafPageCtr; 166: static int NodeIndexPerPage[Order+1]; 167: static int LeafIndexPerPage[Order+1]; 168: 169: private: 170: int myRoot; 171: }; 172: 173: // index objects point to pages or real data 174: class Index 175: { 176: public: 177: // constructors & destructor 178: Index(); 179: Index(char *); 180: Index (char*, int); 181: Index(Index&); 182: ~Index(){} 183: 184: // accessors 185: const char * GetData() const { return myData; } 186: void SetData(const Index& rhs) 187: { strcpy(myData,rhs.GetData()); } 188: void SetData(const char * rhs) 189: { strcpy(myData,rhs); } 190: int GetPointer()const { return myPointer; } 191: void SetPointer (int pg) { myPointer = pg; } 192: 193: // utility functions 194: void PrintKey(); 195: void PrintPage(); 196: int Insert(Index& ref,BOOL findOnly = FALSE, BOOL findExac = FALSE); 197: 198: int Begins(const Index& rhs) const; 199: 200: // overloaded operators 201: int operator==(const Index& rhs); 202: 203: int operator < (const Index& rhs) 204: {return strcmp(myData,rhs.GetData())<0;} 205: 206: int operator <= (const Index& rhs) 207: {return strcmp(myData,rhs.GetData())<=0;} 208: 209: int operator > (const Index& rhs) 210: {return strcmp(myData,rhs.GetData())>0;} 211: 212: int operator >= (const Index& rhs) 213: {return strcmp(myData,rhs.GetData())>=0;} 214: 215: public: 216: int myPointer; 217: char myData[SizeItem - SizePointer]; 218: }; 219: 220: 221: // pages - consist of header and array of indexes 222: class Page 223: { 224: public: 225: // constructors and destructor 226: Page(); 227: Page(char*); 228: Page(Index&,BOOL); 229: Page(Index*, int, BOOL, int); 230: ~Page(){} 231: 232: // insertion and searchoperations 233: int Insert(Index&, BOOL findOnly = FALSE, BOOL synonym = FALSE); 234: int Find(Index& idx) { return Insert(idx,TRUE); } 235: int InsertLeaf(Index&); 236: int FindLeaf(Index&,BOOL findOnly, BOOL synonym = FALSE); 237: int InsertNode(Index&,BOOL findOnly = FALSE, BOOL synonym = FALSE); 238: void Push(Index&,int offset=0, BOOL=TRUE); 239: 240: // accessors 241: Index GetFirstIndex() { return myKeys[0]; } 242: Index GetIndex(int offset) { return myKeys[offset]; } 243: BOOL GetIsLeaf() const { return myVars.IsLeaf; } 244: int GetCount() const { return myVars.myCount; } 245: void SetCount(int cnt) { myVars.myCount=cnt; } 246: time_t GetTime() { return myTime; } 247: BOOL GetLocked() { return myVars.IsLocked; } 248: void SetLocked (BOOL state) { myVars.IsLocked = state; } 249: 250: // page manipulation 251: int GetPageNumber() const { return myVars.myPageNumber; } 252: Page* GetPage(int page) 253: { return BTree::theIDXManager.GetPage(page); } 254: int NewPage(Index& rIndex, BOOL IsLeaf) 255: { return BTree::theIDXManager.NewPage(rIndex,FALSE); } 256: 257: int NewPage(Index* arr, int off,BOOL isL, int cnt) 258: { return BTree::theIDXManager.NewPage(arr,off,isL, cnt); } 259: 260: // utility functions 261: void Nullify(int offset); 262: void Print(); 263: fstream& Write(fstream&); 264: void ReCount(); 265: 266: static int GetgPageNumber(){ return gPage; } 267: static void SetgPageNumber(int pg) { gPage = pg; } 268: 269: private: 270: Index * const myKeys; // will point to myVars.mk 271: union 272: { 273: char myBlock[PageSize]; // a page from disk 274: struct 275: { 276: int myCount; 277: BOOL IsLeaf; 278: int myPageNumber; 279: BOOL IsLocked; 280: char filler[8]; // we want 16 bytes of overhead 281: char mk[Order*sizeof(Index)]; // array of indexes 282: }myVars; 283: }; 284: 285: // memory only 286: static int gPage; 287: time_t myTime; // for lifo queue 288: }; 289: 290: #endif
Listing 16.2 BTree.cpp
1: // ************************************************** 2: // PROGRAM: Btree 3: // FILE: btree.cpp 4: // PURPOSE: provide fundamental btree functionality 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 11/1/94 1.0 jl initial release 8: // ************************************************** 9: 10: #include "btree.hpp" 11: #include "stdef.hpp" 12: #include <ctype.h> 13: #include <stdlib.h> 14: 15: // construct the tree 16: // initialize myRoot pointer to nothing 17: // either create or read the index file 18: BTree::BTree():myRoot(0) 19: { 20: myRoot = theIDXManager.Create(); 21: theDataFile.Create(); 22: theWNJFile.Create(); 23: 24: } 25: 26: // write out the index file 27: BTree::~BTree() 28: { 29: theIDXManager.Close(myRoot); 30: theDataFile.Close(); 31: theWNJFile.Close(); 32: } 33: 34: // find an existing record 35: int BTree::Find(char * str) 36: { 37: Index index(str); 38: BTree::theIter.Reset(); 39: if (!myRoot) 40: return 0L; 41: else 42: return GetPage(myRoot)->Find(index); 43: } 44: 45: // find an existing record 46: int BTree::FindExact(char * str) 47: { 48: Index index(str); 49: BTree::theIter.Reset(); 50: if (!myRoot) 51: return 0L; 52: else 53: return GetPage(myRoot)->Insert(index,TRUE, TRUE); 54: } 55: 56: 57: int BTree::GetNext(char * str) 58: { 59: Index index(str); 60: return BTree::theIter.GetNext(index); 61: } 62: 63: 64: void BTree::Insert(char * buffer, int argc, char** argv) 65: { 66: // get each word, 67: int offset = theDataFile.Insert(buffer); 68: for (int i = 1; i<argc; i++) 69: { 70: AddKey(argv[i],offset); 71: } 72: 73: } 74: 75: void BTree::Insert(char* buffer) 76: { 77: 78: if (strlen(buffer) < 3) 79: return; 80: 81: char *buff = buffer; 82: char word[PageSize]; 83: int wordOffset = 0; 84: int offset; 85: 86: if (GetWord(buff,word,wordOffset)) 87: { 88: offset = theDataFile.Insert(buffer); 89: AddKey(word,offset); 90: } 91: 92: 93: while (GetWord(buff,word,wordOffset)) 94: { 95: AddKey(word,offset); 96: } 97: 98: } 99: 100: void BTree::AddKey(char * str, int offset, BOOL synonym ) 101: { 102: 103: if (strlen(str) < 3) 104: return; 105: 106: int retVal =0; 107: Index index(str,offset); 108: if (!myRoot) 109: { 110: myRoot = theIDXManager.NewPage (index,TRUE); 111: } 112: else 113: { 114: retVal = GetPage(myRoot)->Insert(index,FALSE, synonym); 115: if (retVal) // our root split 116: { 117: // create a pointer to the old top 118: Index index(GetPage(myRoot)->GetFirstIndex()); 119: index.SetPointer(myRoot); 120: // make the new page & insert the index 121: int PageNumber = NewPage(index,FALSE); 122: 123: Page* pg = GetPage(PageNumber); 124: 125: //get a pointer to the new (sibling) page 126: Index Sib(GetPage(retVal)->GetFirstIndex()); 127: Sib.SetPointer(retVal); 128: // put it into the page 129: pg->InsertLeaf(Sib); 130: 131: // reset myRoot to point to the new top 132: myRoot = PageNumber; 133: } 134: } 135: } 136: 137: void BTree::PrintTree() 138: { 139: GetPage(myRoot)->Print(); 140: 141: cout << "\n\nStats:" << endl; 142: cout << "Node pages: " << NodePageCtr << endl; 143: cout << "Leaf pages: " << LeafPageCtr << endl; 144: cout << "Node indexes: " << NodeIndexCtr << endl; 145: cout << "Leaf indexes: " << LeafIndexCtr << endl; 146: for (int i = 0; i < Order +2; i++) 147: { 148: if (NodeIndexPerPage[i]) 149: { 150: cout << "Pages with " << i << " nodes: "; 151: cout << NodeIndexPerPage[i] << endl; 152: } 153: if (LeafIndexPerPage[i]) 154: { 155: cout << "Pages with " << i << " leaves: "; 156: cout << LeafIndexPerPage[i] << endl; 157: } 158: } 159: 160: } 161: 162: BOOL BTree::GetWord(char* string, char* word, int& wordOffset) 163: { 164: 165: if (!string[wordOffset]) 166: return FALSE; 167: 168: char *p1, *p2; 169: p1 = p2 = string+wordOffset; 170: 171: // eat leading spaces 172: for (int i = 0; i<(int)strlen(p1) && !BTree::myAlNum(p1[0]); i++) 173: p1++; 174: 175: // see if you have a word 176: if (!BTree::myAlNum(p1[0])) 177: return FALSE; 178: 179: p2 = p1; // point to start of word 180: 181: // march p2 to end of word 182: while (BTree::myAlNum(p2[0])) 183: p2++; 184: 185: int len = int(p2-p1); 186: #if defined(_MSVC_16BIT) || defined(_MSVC_32BIT) : { : len = __min(len,(int)PageSize); : } : #else : { : len = min(len,(int)PageSize); : } : #endif 187: 188: strncpy (word,p1,len); 189: word[len]='\0'; 190: 191: for (i = int (p2-string); i<(int)strlen(string) && !BTree::myAlNum(p2[0]); i++) 192: p2++; 193: 194: wordOffset = int (p2-string); 195: 196: return TRUE; 197: } 198: 199: int BTree::myAlNum(char ch) 200: { 201: return isalnum(ch) || 202: ch == '-' || 203: ch == '\'' || 204: ch == '(' || 205: ch == ')'; 206: }
Listing 16.3 Index.cpp
1: // ************************************************** 2: // PROGRAM: index 3: // FILE: index.cpp 4: // PURPOSE: provide fundamental btree functionality 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 11/1/94 1.0 jl initial release 8: // ************************************************** 9: 10: #include "btree.hpp" 11: #include <ctype.h> 12: 13: Index::Index(char* str):myPointer(1) 14: { 15: strncpy(myData,str,dataLen); 16: myData[dataLen]='\0'; 17: for (int i = 0; i< (int)strlen(myData); i++) 18: myData[i] = toupper(myData[i]); 19: } 20: 21: Index::Index(char* str, int ptr):myPointer(ptr) 22: { 23: 24: strncpy(myData,str,dataLen); 25: myData[dataLen]='\0'; 26: for (int i = 0; i< (int)strlen(myData); i++) 27: myData[i] = toupper(myData[i]); 28: 29: } 30: 31: Index::Index(Index& rhs): 32: myPointer(rhs.GetPointer()) 33: { 34: strcpy(myData, rhs.GetData()); 35: for (int i = 0; i< (int)strlen(myData); i++) 36: myData[i] = toupper(myData[i]); 37: 38: } 39: 40: Index::Index():myPointer(0) 41: { 42: myData[0]='\0'; 43: } 44: 45: void Index::PrintKey() 46: { 47: cout << " " << myData; 48: } 49: 50: void Index::PrintPage() 51: { 52: cout << "\n" << myData << ": " ; 53: BTree::theIDXManager.GetPage(myPointer)->Print(); 54: } 55: 56: int Index::Insert(Index& ref, BOOL findOnly, BOOL findExact) 57: { 58: return BTree::theIDXManager.GetPage(myPointer)->Insert(ref,findOnly,findExact); 59: } 60: 61: int Index::operator==(const Index& rhs) 62: { 63: return (strcmp(myData,rhs.GetData()) == 0); // case insensitive 64: } 65: 66: int Index::Begins(const Index& rhs) const 67: { 68: int len = strlen(myData); 69: if (!len) 70: return TRUE; 71: return (strncmp(myData,rhs.GetData(),len) == 0); 72: }
Listing 16.4 Page.cpp
1: // ************************************************** 2: // PROGRAM: Page 3: // FILE: Page.cpp 4: // PURPOSE: provide fundamental btree functionality 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 11/1/94 1.0 jl initial release 8: // ************************************************** 9: 10: #include "btree.hpp" 11: #include <assert.h> 12: 13: // constructors 14: 15: // default constructor 16: Page::Page() 17: { 18: } 19: 20: // create a page from a buffer read in from disk 21: Page::Page(char *buffer): 22: myKeys((Index*)myVars.mk) 23: { 24: assert(sizeof(myBlock) == PageSize); 25: assert(sizeof(myVars) == PageSize); 26: memcpy(&myBlock,buffer,PageSize); 27: SetLocked(FALSE); 28: myTime = time(NULL); 29: } 30: 31: // create a Page from the first index 32: Page::Page(Index& index, BOOL bLeaf): 33: myKeys((Index*)myVars.mk) 34: { 35: 36: myVars.myCount=1; 37: myVars.IsLeaf = bLeaf; 38: SetLocked(FALSE); 39: // if this is a leaf, this is the first 40: // index on the first page, set its pointer 41: // based on creating a new wnj. otherwise 42: // you are here creating a new node, do not 43: // set the pointer, it is already set. 44: if (bLeaf) 45: index.SetPointer(BTree::theWNJFile.Append(index.GetPointer())); 46: myKeys[0]=index; 47: myVars.myPageNumber = gPage++; 48: myTime = time(NULL); 49: } 50: 51: // create a page by splitting another page 52: Page::Page(Index *array, int offset, BOOL bLeaf,int count): 53: myKeys((Index*)myVars.mk) 54: { 55: myVars.IsLeaf = bLeaf; 56: myVars.myCount = 0; 57: for (int i=0, j = offset; j<Order && i < count; i++, j++) 58: { 59: myKeys[i]= array[j]; 60: myVars.myCount++; 61: } 62: myVars.myPageNumber = gPage++; 63: SetLocked(FALSE); 64: myTime = time(NULL); 65: } 66: 67: void Page::Nullify(int offset) 68: { 69: for (int i = offset; i<Order; i++) 70: { 71: myKeys[i].SetPointer(0); 72: myVars.myCount--; 73: } 74: } 75: 76: // decide whether I'm a leaf or a node 77: // and pass this index to the right 78: // function. If findOnly is true, don't insert 79: // just return the page number (for now) 80: int Page::Insert(Index& rIndex, BOOL findOnly, BOOL synonym) 81: { 82: int result; 83: if (myVars.IsLeaf) 84: { 85: SetLocked(TRUE); 86: result = FindLeaf(rIndex,findOnly,synonym); 87: SetLocked(FALSE); 88: return result; 89: } 90: else 91: { 92: SetLocked(TRUE); 93: result = InsertNode(rIndex,findOnly,synonym); 94: SetLocked(FALSE); 95: return result; 96: } 97: } 98: 99: 100: // find the right page for this index 101: int Page::InsertNode(Index& rIndex, BOOL findOnly,BOOL synonym) 102: { 103: int retVal =0; 104: BOOL inserted = FALSE; 105: int i; 106: 107: assert(myVars.myCount>0); // nodes have at least 1 108: assert(myKeys[0].GetPointer()); // must be valid 109: 110: if (findOnly) 111: { 112: for (i = 0; i< myVars.myCount; i++) 113: { 114: if (!synonym) 115: { 116: if (rIndex.Begins(myKeys[i])) 117: { 118: BTree::theIter.RecordNode(myVars.myPageNumber,i); 119: return myKeys[i].Insert(rIndex,findOnly,synonym); 120: } 121: } 122: else 123: { 124: if (rIndex == (myKeys[i])) 125: { 126: BTree::theIter.RecordNode(myVars.myPageNumber,i); 127: return myKeys[i].Insert(rIndex,findOnly,synonym); 128: } 129: 130: 131: } 132: } 133: } 134: 135: 136: // does it go before my first entry? 137: if (!inserted && rIndex < myKeys[0]) 138: { 139: 140: if (findOnly) 141: return 0L; 142: 143: myKeys[0].SetData(rIndex); 144: BTree::theIter.RecordNode(myVars.myPageNumber,0); 145: retVal=myKeys[0].Insert(rIndex,findOnly,synonym); 146: inserted = TRUE; 147: } 148: 149: // find insertion point 150: if (!inserted) 151: { 152: for (i = myVars.myCount-1; i>=0; i--) 153: { 154: assert(myKeys[i].GetPointer()); 155: if ( (rIndex >= myKeys[i])) 156: { 157: BTree::theIter.RecordNode(myVars.myPageNumber,i); 158: retVal=myKeys[i].Insert(rIndex,findOnly,synonym); 159: inserted = TRUE; 160: break; 161: } 162: } 163: } 164: assert(inserted); // change to exception if not! 165: 166: // if you had to split 167: if (retVal && !findOnly) // got back a pointer to a new page 168: { 169: Index * pIndex = new Index(GetPage(retVal)->GetFirstIndex()); 170: pIndex->SetPointer(retVal); 171: retVal = InsertLeaf(*pIndex); 172: } 173: return retVal; 174: } 175: 176: // called if current page is a leaf 177: int Page::FindLeaf(Index& rIndex, BOOL findOnly, BOOL synonym) 178: { 179: int result = 0; 180: 181: // no duplicates! 182: for (int i=0; i < myVars.myCount; i++) 183: { 184: if (findOnly) 185: { 186: if (!synonym) 187: { 188: if (rIndex.Begins(myKeys[i])) 189: { 190: BTree::theIter.RecordLeaf(myVars.myPageNumber,i); 191: return myKeys[i].GetPointer(); 192: } 193: } 194: else 195: { 196: if (rIndex ==(myKeys[i])) 197: { 198: BTree::theIter.RecordLeaf(myVars.myPageNumber,i); 199: return myKeys[i].GetPointer(); 200: } 201: 202: 203: } 204: } 205: else 206: if (rIndex == myKeys[i]) 207: return BTree::theWNJFile.Insert( 208: rIndex.GetPointer(), 209: myKeys[i].GetPointer()); 210: } 211: 212: if (findOnly) // not found 213: return result; 214: 215: // this index item does not yet exist 216: // before you push it into the index 217: // push an entry into the wnj.idx 218: // and set the index to point to that entry 219: if (!synonym) 220: rIndex.SetPointer(BTree::theWNJFile.Append(rIndex.GetPointer())); // new! 221: return InsertLeaf(rIndex); 222: } 223: 224: int Page::InsertLeaf(Index& rIndex) 225: { 226: int result = 0; 227: if (myVars.myCount < Order) 228: Push(rIndex); 229: else // overflow the page 230: { 231: // make sibling 232: int NewPg = 233: NewPage(myKeys,Order/2,myVars.IsLeaf,myVars.myCount); 234: Page* Sibling = GetPage(NewPg); 235: Nullify(Order/2); // nullify my right half 236: 237: // does it fit in this side? 238: if (myVars.myCount>Order/2-1 && rIndex <= myKeys[Order/2-1]) 239: Push(rIndex); 240: else // push it into the new sibling 241: Sibling->Push(rIndex); 242: 243: result = NewPg; // we split, pass it up 244: } 245: return result; 246: } 247: 248: // put the new index into this page (in order) 249: void Page::Push(Index& rIndex,int offset,BOOL first) 250: { 251: BOOL inserted = FALSE; 252: assert(myVars.myCount < Order); 253: for (int i=offset; i<Order && i<myVars.myCount; i++) 254: { 255: assert(myKeys[i].GetPointer()); 256: if (rIndex <= myKeys[i]) 257: { 258: Push(myKeys[i],offset+1,FALSE); 259: myKeys[i]=rIndex; 260: inserted = TRUE; 261: break; 262: } 263: } 264: if (!inserted) 265: myKeys[myVars.myCount] = rIndex; 266: 267: if (first) 268: myVars.myCount++; 269: } 270: 271: 272: void Page::Print() 273: { 274: if (!myVars.IsLeaf) 275: { 276: BTree::NodePageCtr++; 277: BTree::NodeIndexPerPage[myVars.myCount]++; 278: BTree::NodeIndexCtr+=myVars.myCount; 279: } 280: else 281: { 282: BTree::LeafPageCtr++; 283: BTree::LeafIndexPerPage[myVars.myCount]++; 284: BTree::LeafIndexCtr+=myVars.myCount; 285: } 286: 287: for (int i = 0; i<Order && i < myVars.myCount; i++) 288: { 289: assert(myKeys[i].GetPointer()); 290: if (myVars.IsLeaf) 291: myKeys[i].PrintKey(); 292: else 293: myKeys[i].PrintPage(); 294: } 295: } 296: 297: // write out the entire page as a block 298: fstream& Page::Write(fstream& file) 299: { 300: char buffer[PageSize]; 301: memcpy(buffer,&myBlock,PageSize); 302: file.flush(); 303: file.clear(); 304: file.seekp(myVars.myPageNumber*PageSize); 305: file.write(buffer,PageSize); 306: return file; 307: }
Listing 16.5 Datafile.cpp
1: // ************************************************** 2: // PROGRAM: Data file 3: // FILE: datafile.cpp 4: // PURPOSE: 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 11/5/94 1.0 jl initial release 8: // ************************************************** 9: 10: #include "btree.hpp" 11: #include <assert.h> 12: 13: // on construction, try to open the file if it exists 14: DataFile::DataFile(): 15: myFile("ROBIN.DAT", 16: ios::binary | ios::in | ios::out | ios::nocreate | ios::app) 17: { 18: } 19: 20: void DataFile::Create() 21: { 22: if (!myFile) // nocreate failed, first creation 23: { 24: 25: // open the file, create it this time 26: myFile.clear(); 27: myFile.open 28: ("ROBIN.DAT",ios::binary | ios::in | ios::out | ios::app); 29: 30: char Header[offInt]; 31: int MagicNumber = 1234; // a number we can check for 32: memcpy(Header,&MagicNumber,offInt); 33: myFile.clear(); 34: myFile.flush(); 35: myFile.seekp(0); 36: myFile.write(Header,offInt); 37: return; 38: } 39: 40: // we did open the file, it already existed 41: // get the numbers we need 42: int MagicNumber; 43: myFile.seekg(0); 44: myFile.read((char *) &MagicNumber,offInt); 45: 46: // check the magic number. If it is wrong the file is 47: // corrupt or this isn't the index file 48: if (MagicNumber != 1234) 49: { 50: // change to an exception!! 51: cout << "DataFile Magic number failed!"; 52: } 53: return; 54: } 55: 56: // write out the numbers we'll need next time 57: void DataFile::Close() 58: { 59: myFile.close(); 60: } 61: 62: 63: int DataFile::Insert(char * newNote) 64: { 65: int len = strlen(newNote); 66: 67: 68: time_t theTime; 69: theTime = time(NULL); 70: 71: int offTime = sizeof(time_t); 72: int fullLen = len + offInt + offTime; 73: 74: char buffer[PageSize]; 75: memcpy(buffer,&len,offInt); 76: memcpy(buffer+sizeof(len),&theTime,offTime); 77: memcpy(buffer+offInt+offTime,newNote,len); 78: 79: myFile.clear(); 80: myFile.flush(); 81: myFile.seekp(0,ios::end); 82: int offset = (int) myFile.tellp(); 83: myFile.write(buffer,fullLen); 84: myFile.flush(); 85: return offset; 86: } 87: 88: void DataFile::GetRecord(int offset, char* buffer, int& len, time_t& theTime) 89: { 90: 91: int offLen = sizeof(int); 92: int offTime = sizeof(time_t); 93: char tmpBuff[PageSize]; 94: myFile.flush(); 95: myFile.clear(); 96: myFile.seekg(offset); 97: myFile.read(tmpBuff,PageSize); 98: memcpy(&len,tmpBuff,offLen); 99: memcpy(&theTime,tmpBuff+offLen,offTime); 100: strncpy(buffer,tmpBuff+offLen+offTime,len); 101: buffer[len] = '\0'; 102: }
Listing 16.6 IdxManager.cpp
1: // ************************************************** 2: // PROGRAM: Idxmr 3: // FILE: Idxmgr.cpp 4: // PURPOSE: provide i/o for btree 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 11/5/94 1.0 jl initial release 8: // ************************************************** 9: 10: #include "btree.hpp" 11: #include <assert.h> 12: 13: // on construction, try to open the file if it exists 14: IDXManager::IDXManager(): 15: myFile("ROBIN.IDX",ios::binary | ios::in | ios::out | ios::nocreate) 16: { 17: // initialize the pointers to null 18: for (int i = 0; i< MaxPages; i++) 19: myPages[i] = 0; 20: myCount = 0; 21: 22: } 23: 24: // called by btree constructor 25: // if we opened the file, read in the numbers we need 26: // otherwise create the file 27: int IDXManager::Create() 28: { 29: if (!myFile) // nocreate failed, first creation 30: { 31: // open the file, create it this time 32: myFile.open("ROBIN.IDX",ios::binary | ios::in | ios::out); 33: char Header[PageSize]; 34: int MagicNumber = 1234; // a number we can check for 35: memcpy(Header,&MagicNumber,offInt); 36: int NextPage = 1; 37: memcpy(Header+offInt,&NextPage,offInt); 38: memcpy(Header+2*offInt,&NextPage,offInt); 39: Page::SetgPageNumber(NextPage); 40: char title[]="ROBIN.IDX. Ver 1.00"; 41: memcpy(Header+3*offInt,title,strlen(title)); 42: myFile.flush(); 43: myFile.clear(); 44: myFile.seekp(0); 45: myFile.write(Header,PageSize); 46: return 0; 47: } 48: 49: // we did open the file, it already existed 50: // get the numbers we need 51: int MagicNumber; 52: myFile.seekg(0); 53: myFile.read((char *) &MagicNumber,offInt); 54: 55: // check the magic number. If it is wrong the file is 56: // corrupt or this isn't the index file 57: if (MagicNumber != 1234) 58: { 59: // change to an exception!! 60: cout << "Index Magic number failed!"; 61: return 0; 62: } 63: 64: int NextPage; 65: myFile.seekg(offInt); 66: myFile.read((char*) &NextPage,offInt); 67: Page::SetgPageNumber(NextPage); 68: int FirstPage; 69: myFile.seekg(2*offInt); 70: myFile.read((char*) &FirstPage,offInt); 71: const int room = PageSize - 3*offInt; 72: char buffer[room]; 73: myFile.read(buffer,room); 74: buffer[20]='\0'; 75: // cout << buffer << endl; 76: // read in all the pages 77: for (int i = 1; i < NextPage; i++) 78: { 79: myFile.seekg(i * PageSize); 80: char buffer[PageSize]; 81: myFile.read( buffer, PageSize); 82: Page * pg = new Page(buffer); 83: Insert(pg); 84: } 85: 86: return FirstPage; 87: } 88: 89: // write out the numbers we'll need next time 90: void IDXManager::Close(int theRoot) 91: { 92: 93: for (int i = 0; i< MaxPages; i++) 94: if (myPages[i]) 95: Save(myPages[i]); 96: int NextPage = Page::GetgPageNumber(); 97: if (!myFile) 98: cout << "Error opening myFile!" << endl; 99: myFile.flush(); 100: myFile.clear(); 101: myFile.seekp(offInt); 102: myFile.write ( (char *) &NextPage,offInt); 103: myFile.seekp(2*offInt); 104: myFile.write((char*) &theRoot,offInt); 105: myFile.close(); 106: } 107: 108: // wrapper function 109: int IDXManager::NewPage(Index& index, BOOL bLeaf) 110: { 111: Page * newPage = new Page(index, bLeaf); 112: Insert(newPage); 113: Save(newPage); 114: return newPage->GetPageNumber(); 115: } 116: 117: int IDXManager::NewPage( 118: Index *array, 119: int offset, 120: BOOL leaf, 121: int count) 122: { 123: Page * newPage = new Page(array, offset, leaf,count); 124: Insert(newPage); 125: Save(newPage); 126: return newPage->GetPageNumber(); 127: } 128: 129: void IDXManager::Insert(Page * newPage) 130: { 131: 132: // add new page into array of page managers 133: if (myCount < MaxPages) 134: { 135: assert(!myPages[myCount]); 136: myPages[myCount++] = newPage; 137: } 138: else // no room, time to page out to disk 139: { 140: int lowest = -1; 141: 142: for (int i = 0; i< MaxPages; i++) 143: { 144: if (myPages[i]->GetLocked() == FALSE ) 145: lowest = i; 146: } 147: if (lowest == -1) 148: assert(lowest != -1); // change to exception if -1 (no page to kill) 149: 150: for (i = 0; i< MaxPages; i++) 151: if (myPages[i]->GetTime() < myPages[lowest]->GetTime() && myPages[i]->GetLocked() == FALSE) 152: lowest = i; 153: 154: assert(myPages[lowest]); 155: Save(myPages[lowest]); 156: delete myPages[lowest]; 157: myPages[lowest] = newPage; 158: 159: } 160: } 161: 162: // tell the page to write itself out 163: void IDXManager::Save(Page* pg) 164: { 165: pg->Write(myFile); 166: } 167: 168: // see if the page is in memory, if so return it 169: // otherwise get it from disk 170: // note: this won't scale, with lots of page managers 171: // you'd need a more efficient search. 10 levels of page 172: // managers, with 31 indexes per page gives you room for 173: // 800 trillion words. Even if each page is only 1/2 full 174: // on average, 10 levels of depth would represent 64 million 175: // keys alone, not to mention the actual records. 176: 177: Page * IDXManager::GetPage(int target) 178: { 179: 180: for (int i = 0; i< MaxPages; i++) 181: { 182: if (myPages[i]->GetPageNumber() == target) 183: return myPages[i]; 184: } 185: myFile.seekg(target * PageSize); 186: char buffer[PageSize]; 187: myFile.read( buffer, PageSize); 188: Page * pg = new Page(buffer); 189: Insert(pg); 190: return pg; 191: }
Listing 16.7 Iter.cpp
1: // ************************************************** 2: // PROGRAM: Iterator 3: // FILE: iter.cpp 4: // PURPOSE: iterator for BTree 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 12/1/94 1.0 jl initial release 8: // ************************************************** 9: #include "btree.hpp" 10: 11: // history class implementations 12: History::History(int pno, int off, BOOL node): 13: PageNo(pno), 14: OffSet(off), 15: IsNode(node) 16: {} 17: 18: // Iterator implementations 19: 20: // start off with blank iterator 21: Iterator::Iterator() 22: { 23: Reset(); 24: } 25: 26: Iterator::~Iterator() 27: { 28: for (int i = 0; i<MaxPages; i++) 29: delete myHistory[i]; 30: } 31: 32: void Iterator::RecordNode(int PageNo, int OffSet) 33: { 34: myHistory[myNext++] = new History(PageNo,OffSet,TRUE); 35: } 36: 37: void Iterator::RecordLeaf(int PageNo, int OffSet) 38: { 39: myHistory[myNext++] = new History(PageNo,OffSet,FALSE); 40: } 41: 42: History* Iterator::GetLast() 43: { 44: return myHistory[myNext-1]; 45: } 46: 47: History* Iterator::GetFirst() 48: { 49: return myHistory[0]; 50: } 51: 52: void Iterator::Reset() 53: { 54: for (int i = 0; i<MaxPages; i++) 55: myHistory[i] = 0; 56: myNext = 0; 57: } 58: 59: 60: int Iterator::GetNext(const Index& theIndex) 61: { 62: 63: for (;;) 64: { 65: History * pHist = GetLast(); 66: int pgNo = pHist-> PageNo; 67: int newOffSet = pHist->OffSet+1; 68: Page * pg = BTree::theIDXManager.GetPage(pgNo); 69: 70: if (newOffSet < pg->GetCount()) 71: { 72: Index Idx = pg->GetIndex(newOffSet); 73: pHist->OffSet = newOffSet; 74: for (;;) 75: { 76: if (pg->GetIsLeaf()) 77: { 78: // cout << "Key: " << Idx.GetData() << endl; 79: if (theIndex.Begins(Idx)) 80: return Idx.GetPointer(); 81: else 82: return 0; 83: } 84: else 85: { 86: pg = BTree::theIDXManager.GetPage(Idx.GetPointer()); 87: Idx = pg->GetFirstIndex(); 88: pHist = new History(pg->GetPageNumber(),0, (BOOL)!pg->GetIsLeaf()); 89: myHistory[myNext++] = pHist; 90: } 91: } // end inner for loop 92: } 93: else 94: { 95: delete myHistory[myNext-1]; 96: myHistory[myNext-1] = 0; 97: myNext--; 98: if (!myNext) 99: return 0; 100: } 101: } // end outer for loop 102: }
Listing 16.8 WNJFile.cpp
1: // ************************************************** 2: // PROGRAM: Word-Node-Join 3: // FILE: WNJFile.cpp 4: // PURPOSE: 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 11/5/94 1.0 jl initial release 8: // ************************************************** 9: 10: #include "btree.hpp" 11: #include <assert.h> 12: 13: // on construction, try to open the file if it exists 14: WNJFile::WNJFile(): 15: myFile("ROBINWNJ.IDX", 16: ios::binary | ios::in | ios::out | ios::nocreate) 17: { 18: for (int i = 0; i<5; i++) 19: myints[i]=0L; 20: } 21: 22: void WNJFile::Create() 23: { 24: 25: char Header[2*offInt]; 26: int MagicNumber=0; // a number we can check for 27: int zero = 0; 28: 29: if (!myFile) // nocreate failed, first creation 30: { 31: // open the file, create it this time 32: myFile.clear(); 33: myFile.open("ROBINWNJ.IDX", 34: ios::binary | ios::in | ios::out); 35: 36: MagicNumber = 1234; 37: memcpy(Header,&MagicNumber,offInt); 38: memcpy(Header+offInt,&zero,offInt); 39: myFile.seekp(0); 40: myFile.write(Header,2*offInt); 41: myFile.flush(); 42: return; 43: } 44: 45: // we did open the file, it already existed 46: // get the numbers we need 47: 48: 49: myFile.seekg(0); 50: myFile.read(Header,2*offInt); 51: memcpy(&MagicNumber,Header,offInt); 52: memcpy(&myCount,Header+offInt,offInt); 53: 54: // check the magic number. If it is wrong the file is 55: // corrupt or this isn't the index file 56: if (MagicNumber != 1234) 57: { 58: // change to an exception!! 59: cout << "WNJ Magic number failed!"; 60: cout << "Magic number: " << MagicNumber; 61: cout << "\nmyCount: " << myCount << endl; 62: } 63: return; 64: } 65: 66: // write out the numbers we'll need next time 67: void WNJFile::Close() 68: { 69: 70: myFile.seekg(offInt); 71: myFile.write((char*)&myCount,offInt); 72: myFile.close(); 73: } 74: 75: int WNJFile::Append(int DataOffset) 76: { 77: 78: int newPos = 2*offInt + myCount++ * (5*offInt); 79: int offsets[5]; 80: offsets[0] = DataOffset; 81: for (int i = 1; i<5; i++) 82: offsets[i]=0; 83: myFile.seekg(newPos); 84: myFile.write((char*)offsets,5*offInt); 85: 86: return newPos; 87: } 88: 89: 90: int WNJFile::Insert(int DataOffset,int WNJOffset) 91: { 92: 93: int ints[5]; 94: myFile.seekg(WNJOffset); 95: myFile.read((char*)ints,5*offInt); 96: 97: int offset=WNJOffset; 98: 99: while (ints[4]) 100: { 101: offset = ints[4]; 102: myFile.clear(); 103: myFile.flush(); 104: myFile.seekg(ints[4]); 105: myFile.read((char*)ints,5*offInt); 106: } 107: if (ints[3]) // full! 108: { 109: ints[4] = Append(DataOffset); 110: myFile.clear(); 111: myFile.flush(); 112: myFile.seekg(offset); 113: myFile.write((char*)ints,5*offInt); 114: } 115: else 116: { 117: for (int i = 0; i<4; i++) 118: { 119: if (ints[i] == 0) 120: { 121: ints[i] = DataOffset; 122: myFile.clear(); 123: myFile.flush(); 124: myFile.seekg(offset); 125: myFile.write((char*)ints,5*offInt); 126: break; 127: } 128: } 129: } 130: return 0; 131: } 132: 133: 134: int* WNJFile::Find(int NextWNJ) 135: { 136: int ints[5]; 137: 138: int * results = new int[256]; 139: 140: int i = 0, j=0; 141: 142: while (j<256) 143: results[j++] = 0; 144: 145: j = 0; 146: 147: myFile.seekg(NextWNJ); 148: myFile.read((char*)ints,5*offInt); 149: 150: while (j < 256) 151: { 152: if (ints[i]) 153: { 154: if (i == 4) 155: { 156: myFile.seekg(ints[4]); 157: myFile.read((char*)ints,5*offInt); 158: i = 0; 159: continue; 160: } 161: results[j++] = ints[i++]; 162: } 163: else 164: break; 165: } 166: return results; 167: }
Listing 16.9 R2.cpp
1: // ************************************************** 2: // PROGRAM: R2 3: // FILE: r2.cpp 4: // PURPOSE: Add synonyms to ROBIN. 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 1/1/95 1.0 jl 8: // ************************************************** 9: 10: 11: #include "stdef.hpp" 12: #include "btree.hpp" 13: #include <stdlib.h> 14: 15: // static definitions 16: IDXManager BTree::theIDXManager; 17: DataFile BTree::theDataFile; 18: WNJFile BTree::theWNJFile; 19: Iterator BTree::theIter; 20: 21: int WNJFile::myCount=0L; 22: int Page::gPage=1; 23: int BTree::NodeIndexCtr=0; 24: int BTree::LeafIndexCtr=0; 25: int BTree::NodePageCtr=0; 26: int BTree::LeafPageCtr=0; 27: int BTree::NodeIndexPerPage[Order+1]; 28: int BTree::LeafIndexPerPage[Order+1]; 29: 30: 31: // prototypes 32: void parseCommandLines(char *buffer,int argc,char **argv); 33: void ShowMenu(long*); 34: void DoFind(int, char**, BTree&); 35: void ParseFile(int, char**, BTree&); 36: void DoSyn(char *orig, char* syn, BTree& myTree); 37: 38: // driver program 39: int main(int argc, char ** argv) 40: { 41: BTree myTree; 42: 43: for (int i = 0; i < Order +2; i++) 44: { 45: BTree::NodeIndexPerPage[i] =0; 46: BTree::LeafIndexPerPage[i] = 0; 47: } 48: 49: char buffer[PageSize+1]; 50: 51: if (argc == 1) 52: { 53: cout << "Please provide command line arguments"; 54: return 1; 55: } 56: 57: // check for flags, if none add text to data file 58: if (argv[1][0] == '-') 59: { 60: 61: switch (argv[1][1]) 62: { 63: case '?': 64: DoFind(argc,argv,myTree); 65: break; 66: 67: case '!': 68: myTree.PrintTree(); 69: break; 70: 71: case 'F': 72: case 'f': 73: ParseFile(argc,argv,myTree); 74: break; 75: 76: case 'S': 77: case 's': 78: DoSyn(argv[2],argv[3],myTree); 79: break; 80: } 81: } 82: else 83: { 84: parseCommandLines(buffer,argc,argv); 85: myTree.Insert(buffer,argc,argv); 86: cout << "Inserted.\n"; 87: } 88: return 0; 89: } 90: 91: // concatenate remaining command line arguments 92: void parseCommandLines(char *buffer,int argc,char **argv) 93: { 94: size_t len = 0; 95: size_t argLen=0; 96: for (int i = 1; i< argc; i++) 97: { 98: argLen = strlen(argv[i]); 99: if (len + argLen +2 < PageSize) 100: { 101: strncpy(buffer+len,argv[i],argLen); 102: strncpy(buffer+len+argLen," ",1); 103: len += argLen+1; 104: } 105: } 106: buffer[len] = '\0'; 107: } 108: 109: // having found matches, show the menu of choices 110: // each entry is numbered and dated 111: void ShowMenu(int *list) 112: { 113: int j=0; 114: char buffer[PageSize+1]; 115: time_t theTime; 116: int len; 117: char listBuff[256]; 118: struct tm * ts; 119: int dispSize; 120: 121: while (list[j] && j < 20) 122: { 123: BTree::theDataFile.GetRecord(list[j],buffer,len, theTime); 124: #if defined(_MSVC_16BIT) || defined(_MSVC_32BIT) : { : dispSize = __min(len,32); // THIS is a DOUBLE UNDERSCORE : } : #else : { : dispSize = min(len,32); : } : #endif 125: strncpy(listBuff,buffer,dispSize); 126: if (dispSize == 32) 127: { 128: listBuff[29] = '.'; 129: listBuff[30] = '.'; 130: listBuff[31] = '.'; 131: } 132: listBuff[dispSize]='\0'; 133: ts = localtime(&theTime); 134: cout << "[" << (j+1) << "] "; 135: cout << ts->tm_mon << "/"; 136: cout << ts->tm_mday << "/"; 137: cout << ts->tm_year << " "; 138: cout << listBuff << endl; 139: j++; 140: } 141: } 142: 143: // handle -? command 144: // find matches, show the menu, request choice 145: // display record and redisplay menu 146: void DoFind(int argc, char ** argv, BTree& myTree) 147: { 148: 149: // create an array of total set of WNJ 150: // offsets. This will be used to display 151: // choices and to find actual text 152: int list[PageSize]; 153: 154: // initialize the array to all zeros 155: for (int i = 0; i<PageSize; i++) 156: list[i] = 0; 157: 158: // for each word in the command line 159: // search for the matching set of records 160: for (i = 2; i< argc; i++) 161: { 162: int offset = myTree.Find(argv[i]); 163: while (offset) 164: { 165: int* found = BTree::theWNJFile.Find(offset); 166: int j=0; 167: int len; 168: time_t theTime; 169: char buff[512]; 170: while (found[j]) 171: { 172: BTree::theDataFile.GetRecord(found[j],buff,len, theTime); 173: struct tm * ts = localtime(&theTime); 174: cout << "Found: "; 175: cout << ts->tm_mon << "/"; 176: cout << ts->tm_mday << "/"; 177: cout << ts->tm_year << " "; 178: cout << buff << endl; 179: j++; 180: } 181: delete [] found; 182: offset = myTree.GetNext(argv[i]); 183: } 184: } 185: 186: cout << "\n"; 187: 188: if (!list[0]) 189: return; 190: 191: 192: ShowMenu(list); 193: 194: int choice; 195: char buffer[PageSize]; 196: int len; 197: time_t theTime; 198: 199: for (;;) 200: { 201: cout << "Choice (0 to stop): " ; 202: cin >> choice; 203: if (!choice) 204: break; 205: BTree::theDataFile.GetRecord(list[choice-1],buffer,len, theTime); 206: cout << "\n>> "; 207: cout << buffer; 208: cout << "\n\n"; 209: ShowMenu(list); 210: } 211: } 212: 213: // open a file and create a new note for each line 214: // index every word in the line 215: void ParseFile(int argc,char **argv, BTree& myTree) 216: { 217: 218: char buffer[PageSize]; 219: char theString[PageSize]; 220: 221: ifstream theFile(argv[2],ios::in ); 222: if (!theFile) 223: { 224: cout << "Error opening input file!\n"; 225: return; 226: } 227: int offset = 0; 228: for (;;) 229: { 230: theFile.read(theString,PageSize); 231: int len = theFile.gcount(); 232: if (!len) 233: break; 234: theString[len]='\0'; 235: char *p1, *p2, *p0; 236: p0 = p1 = p2 = theString; 237: 238: while (p1[0] && (p1[0] == '\n' || p1[0] == '\r')) 239: p1++; 240: 241: p2 = p1; 242: 243: while (p2[0] && p2[0] != '\n' && p2[0] != '\r') 244: p2++; 245: 246: int bufferLen = int(p2 - p1); 247: int totalLen = int (p2 - p0); 248: 249: if (!bufferLen) 250: continue; 251: 252: // lstrcpyn(buffer,p1,bufferLen); 253: strncpy(buffer,p1,bufferLen); 254: buffer[bufferLen]='\0'; 255: 256: // for (int i = 0; i< PageSize; i++) 257: cout << "\r"; 258: cout << "Parsing " << buffer; 259: myTree.Insert(buffer); 260: offset += totalLen; 261: theFile.clear(); 262: theFile.seekg(offset,ios::beg); 263: } 264: } 265: 266: 267: // add synonyms to the tree 268: void DoSyn(char *orig, char* syn, BTree& myTree) 269: { 270: int offset = myTree.FindExact(syn); 271: if (!offset) // syn can't exist 272: { 273: int offset = myTree.FindExact(orig); 274: if (offset) // orig must exist 275: myTree.AddKey(syn,offset,TRUE); 276: } 277: }
Output:
Note: Numbering has been added to the output to make analysis easier. Your output will not include this numbering.
1: d:\>r2 now is the time for all good men to 2: Inserted. 3: 4: d:\>r2 come to the aid of their party 5: Inserted. 6: 7: d:\>r2 what is the meaning of this 8: Inserted. 9: 10: d:\>r2 eternal vigilance is the price of liberty 11: Inserted. 12: 13: d:\>r2 when in Rome, do as the Romans 14: Inserted. 15: 16: d:\>r2 -! 17: 18: AID: 19: AID: AID ALL 20: COME: COME ETERNAL FOR 21: GOOD: GOOD LIBERTY MEANING MEN 22: NOW: 23: NOW: NOW PARTY 24: PRICE: PRICE ROMANS ROME 25: THE: THE THEIR 26: THIS: 27: THIS: THIS TIME 28: VIGILANCE: VIGILANCE WHAT WHEN 29: 30: Stats: 31: Node pages: 4 32: Leaf pages: 8 33: Node indexes: 11 34: Leaf indexes: 21 35: Pages with 2 nodes: 1 36: Pages with 2 leaves: 4 37: Pages with 3 nodes: 3 38: Pages with 3 leaves: 3 39: Pages with 4 leaves: 1 40: 41: d:\>r2 -? liberty 42: Found: 0/16/95 eternal vigilance is the price of liberty 43: 44: 45: d:\>r2 -? jesse 46: 47: 48: d:\>r2 -S liberty jesse 49: 50: d:\>r2 -? liberty 51: Found: 0/16/95 eternal vigilance is the price of liberty 52: 53: 54: d:\>r2 -? jesse 55: Found: 0/16/95 eternal vigilance is the price of libertyAnalysis:
Let's start by examining the output. The data files and indexes were deleted before beginning this test of the program.
In lines 1, 4, 7, 10, and 13, data was added to the files. In line 16, a report was requested, which is shown in lines 18 through 39. You will note that this database is order 4 (4 indexes on a page), which is not efficient, but is useful during debugging.
In line 41, the user searches for the term liberty; it is found, as shown in line 42. In line 45, the user searches for the term jesse, but this is not found. In line 48, the user establishes jesse as a synonym for liberty. There is no feedback, although it would be nice if the program came back and verified the addition of the synonym.
In line 50, the user again searches for liberty and receives the same results as the first attempt. In line 54, the user searches for jesse and this time receives a response: The same as if the term liberty had been searched for, which is consistent with the establishment of a synonym.
The easiest way to see how the synonym is established is to examine lines 76 through 79 of listing 16.9. When the user enters -S, a synonym is to be created. The next term and the word following are passed to the new DoSyn() method, as shown in lines 268 through 277 of the same listing.
In line 270, the tree is searched for the synonym. If it exists already as a term in the database, it will not be added as a synonym. Otherwise, the tree is searched for the original term.
Note that a new method has been added to BTree: FindExact. The implementation is shown in lines 46 through 54 of listing 16.2. The significant difference from Find is that the call to the page's Insert() method passes in TRUE to the last two parameters. These are declared in line 233 of listing 16.1, in the declaration of the Page object. The implementation is shown in lines 80 through 97 of listing 16.4.
Note that the Boolean value synonym is passed along to both FindLeaf() and InsertNode(). These values are passed from method to method until they are needed. The first use of the value is in line 114 of listing 16.4. It is imperative to match only on an exact match with the original term, so the value for synonym is examined and the correct method (Begins() or operator==) is called accordingly. This is repeated in lines 186 through 200.
If the original term is not found, the synonym will not be added. If the term is found, BTree::AddKey() is called, passing in the synonym, the offset, and the Boolean value TRUE.
Examination of the interface for the AddKey() method, as shown in line 139 of listing 16.1, shows that AddKey() has a third parameter, synonym, which defaults to FALSE.
Program execution jumps to line 100 of listing 16.2. An index object is created, using the new term as its str value, and the offset returned by the call to FindExact() as its offset value. This is exactly what you want: The index now points to the correct entry in the WNJFile. The trick will be to avoid adding this record to the data file and giving it its own WNJFile entry.
Once again, the synonym flag is used, this time in line 219 of listing 16.4. When this flag is set, the index object is not passed to the WNJFile for appending, and its pointer is not set to point to the new WNJFile record. Because the pointer already points to the correct WNJFile record, the index is left as it is before being added to the page.
Today you learned how to add synonyms to the utility, and in general how to create extensible programs. You examined the cost of developing a program that is maintainable and that can grow over time, and you learned a specific technique for adding synonyms to a database program.
Q: What does it take to make a program maintainable?
A: The program must be designed with a clarity of vision, the modules must be discrete, the interfaces must be well-defined, and the code must be well-documented.
Q: Are smaller programs more maintainable than larger programs?
A: All things being equal, it is, of course, easier to understand a smaller program. A well-written large program, however, can be far easier to maintain than a small utility written without regard to documentation and maintainability.
Q: If I'm the programmer and no one else is working on the program, why bother with comments?
A: When you return to your program in a few months (to fix a bug, to extend its functionality, or simply to learn how you solved a problem that has returned in a new program), you will need the comments to help you remember how the program works in detail.
The Workshop provides quiz questions to help you solidify your understanding of the material covered, and exercises to provide you with experience in using what you have learned. Try to answer the quiz and exercise questions before checking the answers in Appendix A, and make sure that you understand the answers before continuing to the next chapter.
Go to: Table of Contents | Next Page