An iterator is an object that points into a list and can return the first, current, last, or next item on that list. Generic iterators are difficult to write and raise many interesting programming and database issues. Today you will learn
An iterator enables you to move through a list from a given starting point, examining each node in turn. General purpose iterators can be very flexible, enabling you to walk forward and backward in the list.
Most collections let the program ask for more than one iterator on the list, which means that the user can have iterators pointing to different parts of the same list. Additionally, some iterators are read/write, which enables the user to insert and delete items in the list.
The best way to study iterators is to use one to solve a real-world problem. ROBIN, as currently written, has a significant limitation: You can search only for an exact match on a given term. If there are notes with the words computer, computing, and computerize, there is no easy way to search for all these at the same time.
Presumably, the user would like to enter the string comput and see all the notes with words that begin with these six letters. Searching the list for each of these permutations is tricky, however, because the words computing and computerize might not be on the same page.
Getting from computing to computerize requires going up through two nodes and back down to a new leaf page. Putting the capability to do this into the list makes no sense; the list's primary mode of finding and adding indexes is the existing recursive routines.
Creating an iterator for the BTree class solves this problem. The program searches for the first word that matches the target string and returns that index. The search also initializes the iterator, so that calling GetNext() will return the next index that matches the string.
Because the list is sorted alphabetically, finding all the words that match the string is as simple as finding the first match, and then iterating through the list until you find the first word that does not match the target string, at which time you are done.
To accomplish this goal, the B-tree must be extended to include an iterator object. The job of the iterator is to answer the question What is the next entry in the list?
To simplify this task, I've tailored the iterator to meet the specific needs of the existing B-tree. The iterator keeps track of every page that matches the target string until the leaf page is found. The next index is defined as the next index in the tree after the last find().
Usually, the next index is on the same page as the current index, but not always. When the iterator must find the next page, it walks up its node list, hunting for the first node that has another index in its list. After the iterator finds that node, it works its way back down the tree to the first index on the leaf page and returns that value.
It is a design goal of the Iterator class to hide all the details of finding the next index from its clients. The client calls GetNext(), which returns the offset of the next matching record, or 0 if there is none.
To keep the code simple, and to allow a full and in-depth exploration of how the iterator works, I'll implement only the most fundamental functionality in the iterator, as shown in the following listings. Listing 15.1 provides the header file BTree.hpp. Listing 15.2 is BTree.cpp, listing 15.3 is Page.cpp, listing 15.4 is Iterator.cpp, and listing 15.5 is Index.cpp. The data file and wnjfile listings have not changed, but are provided here for completeness as listings 15.6 and 15.7, respectively. Listing 15.9 is the driver program.
Listing 15.1 BTree.hpp
1: //********************************************* 2: // PROGRAM: Btree, Page and Index declarations 3: // FILE: btree.hpp 4: // PURPOSE: provide fundamental btree functionality 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 11/1/94 1.0 jl initial release 8: // ************************************************** 9: #ifndef BTREE_HPP // inclusion guards 10: #define BTREE_HPP 11: #include <time.h> 12: #include <string.h> 13: #include <fstream.h> 14: #include "stdef.hpp" 15: const int Order = 4; //31; // 31 indices and 1 header 16: const int dataLen = 11; // length of a key 17: const int MaxPages = 500; // more than we need 18: const int SizeItem = 16; // key + offset 19: const int SizePointer = 2; // size of offset 20: const int PageSize = (Order+1) * SizeItem; 21: const int offInt = 2; 22: // forward declarations 23: class Page; 24: class Index; 25: // all members are public 26: // used only by Iterator 27: struct History 28: { 29: History(int, int,BOOL); 30: int PageNo; 31: int OffSet; 32: BOOL IsNode; 33: }; 34: // Iterator class keeps track 35: // of current (and next) index 36: class Iterator 37: { 38: public: 39: Iterator(); 40: ~Iterator(); 41: void RecordNode(int, int); 42: void RecordLeaf(int, int); 43: History * GetLast(); 44: History * GetFirst(); 45: int GetNext(const Index&); 46: void Reset(); 47: private: 48: History* myHistory[MaxPages]; 49: int myNext; 50: }; 51: class DataFile 52: { 53: public: 54: // constructors & destructor 55: DataFile(); 56: ~DataFile(){} 57: // management of files 58: void Close(); 59: void Create(); 60: void GetRecord(int, char*, int&, time_t&); 61: int Insert(char *); 62: private: 63: fstream myFile; 64: }; 65: class WNJFile 66: { 67: public: 68: // constructors & destructor 69: WNJFile(); 70: ~WNJFile(){} 71: // management of files 72: void Close(); 73: void Create(); 74: int* Find(int NextWNJ); 75: int Insert(int, int); 76: int Append(int); 77: private: 78: static int myCount; 79: fstream myFile; 80: union 81: { 82: int myints[5]; 83: char myArray[5*4]; 84: }; 85: }; 86: 87: // IDXManager - in memory keeps track of what pages are 88: // already in memory and swaps to disk as required 89: class IDXManager 90: { 91: public: 92: // constructors & destructor 93: IDXManager(); 94: ~IDXManager(){} 95: 96: // management of files 97: void Close(int); 98: int Create(); 99: 100: // Page manipulation 101: Page* GetPage(int); 102: void SetPage(Page*); 103: void Insert(Page * newPage); 104: void Save(Page* pg); 105: int NewPage(Index&, BOOL); 106: int NewPage(Index *array, int offset, BOOL leaf, int count); 107: 108: private: 109: Page* myPages[MaxPages]; 110: fstream myFile; 111: int myCount; 112: }; 113: 114: // the btree itself -has a pointer to first page 115: class BTree 116: { 117: public: 118: // constructors and destructor 119: BTree(); 120: ~BTree(); 121: 122: // utility functions 123: void AddKey(char* data, int offset);//,BOOL synonym = FALSE); 124: void Insert(char* buffer, int, char**); 125: void Insert(char*); 126: void PrintTree(); 127: int Find(char* data); 128: int FindExact(char* data); 129: int GetNext(char* data); 130: 131: // page methods 132: Page* GetPage(int page) 133: { return theIDXManager.GetPage(page); } 134: int NewPage(Index& pIndex, BOOL IsLeaf) 135: { return theIDXManager.NewPage(pIndex,FALSE); } 136: 137: static int myAlNum(char ch); 138: // public static member! 139: static IDXManager theIDXManager; 140: static WNJFile theWNJFile; 141: static DataFile theDataFile; 142: static Iterator theIter; 143: 144: static BOOL GetWord(char*, char*, int&); 145: static void GetStats(); 146: static int NodeIndexCtr; 147: static int LeafIndexCtr; 148: static int NodePageCtr; 149: static int LeafPageCtr; 150: static int NodeIndexPerPage[Order+1]; 151: static int LeafIndexPerPage[Order+1]; 152: 153: private: 154: int myRoot; 155: }; 156: 157: // index objects point to pages or real data 158: class Index 159: { 160: public: 161: // constructors & destructor 162: Index(); 163: Index(char *); 164: Index (char*, int); 165: Index(Index&); 166: ~Index(){} 167: 168: // accessors 169: const char * GetData() const { return myData; } 170: void SetData(const Index& rhs) 171: { strcpy(myData,rhs.GetData()); } 172: void SetData(const char * rhs) 173: { strcpy(myData,rhs); } 174: int GetPointer()const { return myPointer; } 175: void SetPointer (int pg) { myPointer = pg; } 176: 177: // utility functions 178: void PrintKey(); 179: void PrintPage(); 180: int Insert(Index& ref,BOOL findOnly = FALSE);//, BOOL findExac = FALSE); 181: 182: int Begins(const Index& rhs) const; 183: 184: // overloaded operators 185: int operator==(const Index& rhs); 186: 187: int operator < (const Index& rhs) 188: {return strcmp(myData,rhs.GetData())<0;} 189: 190: int operator <= (const Index& rhs) 191: {return strcmp(myData,rhs.GetData())<=0;} 192: 193: int operator > (const Index& rhs) 194: {return strcmp(myData,rhs.GetData())>0;} 195: 196: int operator >= (const Index& rhs) 197: {return strcmp(myData,rhs.GetData())>=0;} 198: 199: public: 200: int myPointer; 201: char myData[SizeItem - SizePointer]; 202: }; 203: 204: 205: // pages - consist of header and array of indices 206: class Page 207: { 208: public: 209: // constructors and destructor 210: Page(); 211: Page(char*); 212: Page(Index&,BOOL); 213: Page(Index*, int, BOOL, int); 214: ~Page(){} 215: 216: // insertion and searchoperations 217: int Insert(Index&, BOOL findOnly = FALSE);//, BOOL synonym = FALSE); 218: int Find(Index& idx) { return Insert(idx,TRUE); } 219: int InsertLeaf(Index&); 220: int FindLeaf(Index&,BOOL findOnly);//, BOOL synonym = FALSE); 221: int InsertNode(Index&,BOOL findOnly = FALSE);//, BOOL synonym = FALSE); 222: void Push(Index&,int offset=0, BOOL=TRUE); 223: 224: // accessors 225: Index GetFirstIndex() { return myKeys[0]; } 226: Index GetIndex(int offset) { return myKeys[offset]; } 227: BOOL GetIsLeaf() const { return myVars.IsLeaf; } 228: int GetCount() const { return myVars.myCount; } 229: void SetCount(int cnt) { myVars.myCount=cnt; } 230: time_t GetTime() { return myTime; } 231: BOOL GetLocked() { return myVars.IsLocked; } 232: void SetLocked (BOOL state) { myVars.IsLocked = state; } 233: 234: // page manipulation 235: int GetPageNumber() const { return myVars.myPageNumber; } 236: Page* GetPage(int page) 237: { return BTree::theIDXManager.GetPage(page); } 238: int NewPage(Index& rIndex, BOOL IsLeaf) 239: { return BTree::theIDXManager.NewPage(rIndex,FALSE); } 240: 241: int NewPage(Index* arr, int off,BOOL isL, int cnt) 242: { return BTree::theIDXManager.NewPage(arr,off,isL, cnt); } 243: 244: // utility functions 245: void Nullify(int offset); 246: void Print(); 247: fstream& Write(fstream&); 248: void ReCount(); 249: 250: static int GetgPageNumber(){ return gPage; } 251: static void SetgPageNumber(int pg) { gPage = pg; } 252: 253: private: 254: Index * const myKeys; // will point to myVars.mk 255: union 256: { 257: char myBlock[PageSize]; // a page from disk 258: struct 259: { 260: int myCount; 261: BOOL IsLeaf; 262: int myPageNumber; 263: BOOL IsLocked; 264: char filler[8]; // we want 16 bytes of overhead 265: char mk[Order*sizeof(Index)]; // array of indices 266: }myVars; 267: }; 268: 269: // memory only 270: static int gPage; 271: time_t myTime; // for lifo queue 272: }; 273: 274: #endif
Listing 15.2 BTree.cpp
1: // ************************************************** 2: // PROGRAM: Btree 3: // FILE: btree.cpp 4: // PURPOSE: provide fundamental btree functionality 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 11/1/94 1.0 jl initial release 8: // ************************************************** 9: 10: #include "btree.hpp" 11: #include "stdef.hpp" 12: #include <ctype.h> 13: #include <stdlib.h> 14: 15: // construct the tree 16: // initialize myRoot pointer to nothing 17: // either create or read the index file 18: BTree::BTree():myRoot(0) 19: { 20: myRoot = theIDXManager.Create(); 21: theDataFile.Create(); 22: theWNJFile.Create(); 23: 24: } 25: 26: // write out the index file 27: BTree::~BTree() 28: { 29: theIDXManager.Close(myRoot); 30: theDataFile.Close(); 31: theWNJFile.Close(); 32: } 33: 34: // find an existing record 35: int BTree::Find(char * str) 36: { 37: Index index(str); 38: BTree::theIter.Reset(); 39: if (!myRoot) 40: return 0L; 41: else 42: return GetPage(myRoot)->Find(index); 43: } 44: 45: 46: int BTree::GetNext(char * str) 47: { 48: Index index(str); 49: return BTree::theIter.GetNext(index); 50: } 51: 52: 53: void BTree::Insert(char * buffer, int argc, char** argv) 54: { 55: // get each word, 56: int offset = theDataFile.Insert(buffer); 57: for (int i = 1; i<argc; i++) 58: { 59: AddKey(argv[i],offset); 60: } 61: 62: } 63: 64: void BTree::Insert(char* buffer) 65: { 66: 67: if (strlen(buffer) < 3) 68: return; 69: 70: char *buff = buffer; 71: char word[PageSize]; 72: int wordOffset = 0; 73: int offset; 74: 75: if (GetWord(buff,word,wordOffset)) 76: { 77: offset = theDataFile.Insert(buffer); 78: AddKey(word,offset); 79: } 80: 81: 82: while (GetWord(buff,word,wordOffset)) 83: { 84: AddKey(word,offset); 85: } 86: 87: } 88: 89: void BTree::AddKey(char * str, int offset ) 90: { 91: 92: if (strlen(str) < 3) 93: return; 94: 95: int retVal =0; 96: Index index(str,offset); 97: if (!myRoot) 98: { 99: myRoot = theIDXManager.NewPage (index,TRUE); 100: } 101: else 102: { 103: retVal = GetPage(myRoot)->Insert(index); 104: if (retVal) // our root split 105: { 106: // create a pointer to the old top 107: Index index(GetPage(myRoot)->GetFirstIndex()); 108: index.SetPointer(myRoot); 109: // make the new page & insert the index 110: int PageNumber = NewPage(index,FALSE); 111: 112: Page* pg = GetPage(PageNumber); 113: 114: //get a pointer to the new (sibling) page 115: Index Sib(GetPage(retVal)->GetFirstIndex()); 116: Sib.SetPointer(retVal); 117: // put it into the page 118: pg->InsertLeaf(Sib); 119: 120: // reset myRoot to point to the new top 121: myRoot = PageNumber; 122: } 123: } 124: } 125: 126: void BTree::PrintTree() 127: { 128: GetPage(myRoot)->Print(); 129: 130: cout << "\n\nStats:" << endl; 131: cout << "Node pages: " << NodePageCtr << endl; 132: cout << "Leaf pages: " << LeafPageCtr << endl; 133: cout << "Node indexes: " << NodeIndexCtr << endl; 134: cout << "Leaf indexes: " << LeafIndexCtr << endl; 135: for (int i = 0; i < Order +1; I++) 136: { 137: if (NodeIndexPerPage[i]) 138: { 139: cout << "Pages with " << i << " nodes: "; 140: cout << NodeIndexPerPage[i] << endl; 141: } 142: if (LeafIndexPerPage[i]) 143: { 144: cout << "Pages with " << i << " leaves: "; 145: cout << LeafIndexPerPage[i] << endl; 146: } 147: } 148: 149: } 150: 151: BOOL BTree::GetWord(char* string, char* word, int& wordOffset) 152: { 153: 154: if (!string[wordOffset]) 155: return FALSE; 156: 157: char *p1, *p2; 158: p1 = p2 = string+wordOffset; 159: 160: // eat leading spaces 161: for (int i = 0; i<(int)strlen(p1) && !BTree::myAlNum(p1[0]); i++) 162: p1++; 163: 164: // see if you have a word 165: if (!BTree::myAlNum(p1[0])) 166: return FALSE; 167: 168: p2 = p1; // point to start of word 169: 170: // march p2 to end of word 171: while (BTree::myAlNum(p2[0])) 172: p2++; 173: 174: int len = int(p2-p1); 175: #if defined(_MSVC_16BIT) || defined(_MSVC_32BIT) : { : len = __min(len,(int)PageSize); : } : #else : { : len = min(len,(int)PageSize); : } : #endif 176: 177: strncpy (word,p1,len); 178: word[len]='\0'; 179: 180: for (i = int (p2-string); i<(int)strlen(string) && !BTree::myAlNum(p2[0]); i++) 181: p2++; 182: 183: wordOffset = int (p2-string); 184: 185: return TRUE; 186: } 187: 188: int BTree::myAlNum(char ch) 189: { 190: return isalnum(ch) || 191: ch == '-' || 192: ch == '\'' || 193: ch == '(' || 194: ch == ')'; 195: }
Listing 15.3 Page.cpp
1: // ************************************************** 2: // PROGRAM: Page 3: // FILE: Page.cpp 4: // PURPOSE: provide fundamental btree functionality 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 11/1/94 1.0 jl initial release 8: // ************************************************** 9: 10: #include "btree.hpp" 11: #include <assert.h> 12: 13: // constructors 14: 15: // default constructor 16: Page::Page() 17: { 18: } 19: 20: // create a page from a buffer read in from disk 21: Page::Page(char *buffer): 22: myKeys((Index*)myVars.mk) 23: { 24: assert(sizeof(myBlock) == PageSize); 25: assert(sizeof(myVars) == PageSize); 26: memcpy(&myBlock,buffer,PageSize); 27: SetLocked(FALSE); 28: myTime = time(NULL); 29: } 30: 31: // create a Page from the first index 32: Page::Page(Index& index, BOOL bLeaf): 33: myKeys((Index*)myVars.mk) 34: { 35: 36: myVars.myCount=1; 37: myVars.IsLeaf = bLeaf; 38: SetLocked(FALSE); 39: // if this is a leaf, this is the first 40: // index on the first page, set its pointer 41: // based on creating a new wnj. otherwise 42: // you are here creating a new node, do not 43: // set the pointer, it is already set. 44: if (bLeaf) 45: index.SetPointer(BTree::theWNJFile.Append(index.GetPointer())); 46: myKeys[0]=index; 47: myVars.myPageNumber = gPage++; 48: myTime = time(NULL); 49: } 50: 51: // create a page by splitting another page 52: Page::Page(Index *array, int offset, BOOL bLeaf,int count): 53: myKeys((Index*)myVars.mk) 54: { 55: myVars.IsLeaf = bLeaf; 56: myVars.myCount = 0; 57: for (int i=0, j = offset; j<Order && i < count; i++, j++) 58: { 59: myKeys[i]= array[j]; 60: myVars.myCount++; 61: } 62: myVars.myPageNumber = gPage++; 63: SetLocked(FALSE); 64: myTime = time(NULL); 65: } 66: 67: void Page::Nullify(int offset) 68: { 69: for (int i = offset; i<Order; i++) 70: { 71: myKeys[i].SetPointer(0); 72: myVars.myCount; 73: } 74: } 75: 76: // decide whether I'm a leaf or a node 77: // and pass this index to the right 78: // function. If findOnly is true, don't insert 79: // just return the page number (for now) 80: int Page::Insert(Index& rIndex, BOOL findOnly) 81: { 82: int result; 83: if (myVars.IsLeaf) 84: { 85: SetLocked(TRUE); 86: result = FindLeaf(rIndex,findOnly); 87: SetLocked(FALSE); 88: return result; 89: } 90: else 91: { 92: SetLocked(TRUE); 93: result = InsertNode(rIndex,findOnly); 94: SetLocked(FALSE); 95: return result; 96: } 97: } 98: 99: 100: // find the right page for this index 101: int Page::InsertNode(Index& rIndex, BOOL findOnly) 102: { 103: int retVal =0; 104: BOOL inserted = FALSE; 105: int i; 106: 107: assert(myVars.myCount>0); // nodes have at least 1 108: assert(myKeys[0].GetPointer()); // must be valid 109: 110: if (findOnly) 111: { 112: for (i = 0; i< myVars.myCount; i++) 113: { 114: if (rIndex.Begins(myKeys[i])) 115: { 116: BTree::theIter.RecordNode(myVars.myPageNumber,i); 117: return myKeys[i].Insert(rIndex,findOnly); 118: } 119: } 120: } 121: 122: 123: // does it go before my first entry? 124: if (!inserted && rIndex < myKeys[0]) 125: { 126: 127: if (findOnly) 128: return 0L; 129: 130: myKeys[0].SetData(rIndex); 131: BTree::theIter.RecordNode(myVars.myPageNumber,0); 132: retVal=myKeys[0].Insert(rIndex,findOnly); 133: inserted = TRUE; 134: } 135: 136: // find insertion point 137: if (!inserted) 138: { 139: for (i = myVars.myCount-1; i>=0; i) 140: { 141: assert(myKeys[i].GetPointer()); 142: if ( (rIndex >= myKeys[i])) 143: { 144: BTree::theIter.RecordNode(myVars.myPageNumber,i); 145: retVal=myKeys[i].Insert(rIndex,findOnly); 146: inserted = TRUE; 147: break; 148: } 149: } 150: } 151: assert(inserted); // change to exception if not! 152: 153: // if you had to split 154: if (retVal && !findOnly) // got back a pointer to a new page 155: { 156: Index * pIndex = new Index(GetPage(retVal)->GetFirstIndex()); 157: pIndex->SetPointer(retVal); 158: retVal = InsertLeaf(*pIndex); 159: } 160: return retVal; 161: } 162: 163: // called if current page is a leaf 164: int Page::FindLeaf(Index& rIndex, BOOL findOnly) 165: { 166: int result = 0; 167: 168: // no duplicates! 169: for (int i=0; i < myVars.myCount; i++) 170: { 171: if (findOnly) 172: { 173: if (rIndex.Begins(myKeys[i])) 174: { 175: BTree::theIter.RecordLeaf(myVars.myPageNumber,i); 176: return myKeys[i].GetPointer(); 177: } 178: } 179: else 180: if (rIndex == myKeys[i]) 181: return BTree::theWNJFile.Insert( 182: rIndex.GetPointer(), 183: myKeys[i].GetPointer()); 184: } 185: 186: if (findOnly) // not found 187: return result; 188: 189: // this index item does not yet exist 190: // before you push it into the index 191: // push an entry into the wnj.idx 192: // and set the index to point to that entry 193: rIndex.SetPointer(BTree::theWNJFile.Append(rIndex.GetPointer())); // new! 194: return InsertLeaf(rIndex); 195: } 196: 197: int Page::InsertLeaf(Index& rIndex) 198: { 199: int result = 0; 200: if (myVars.myCount < Order) 201: Push(rIndex); 202: else // overflow the page 203: { 204: // make sibling 205: int NewPg = 206: NewPage(myKeys,Order/2,myVars.IsLeaf,myVars.myCount); 207: Page* Sibling = GetPage(NewPg); 208: Nullify(Order/2); // nullify my right half 209: 210: // does it fit in this side? 211: if (myVars.myCount>Order/2-1 && rIndex <= myKeys[Order/2-1]) 212: Push(rIndex); 213: else // push it into the new sibling 214: Sibling->Push(rIndex); 215: 216: result = NewPg; // we split, pass it up 217: } 218: return result; 219: } 220: 221: // put the new index into this page (in order) 222: void Page::Push(Index& rIndex,int offset,BOOL first) 223: { 224: BOOL inserted = FALSE; 225: assert(myVars.myCount < Order); 226: for (int i=offset; i<Order && i<myVars.myCount; i++) 227: { 228: assert(myKeys[i].GetPointer()); 229: if (rIndex <= myKeys[i]) 230: { 231: Push(myKeys[i],offset+1,FALSE); 232: myKeys[i]=rIndex; 233: inserted = TRUE; 234: break; 235: } 236: } 237: if (!inserted) 238: myKeys[myVars.myCount] = rIndex; 239: 240: if (first) 241: myVars.myCount++; 242: } 243: 244: 245: void Page::Print() 246: { 247: if (!myVars.IsLeaf) 248: { 249: BTree::NodePageCtr++; 250: BTree::NodeIndexPerPage[myVars.myCount]++; 251: BTree::NodeIndexCtr+=myVars.myCount; 252: } 253: else 254: { 255: BTree::LeafPageCtr++; 256: BTree::LeafIndexPerPage[myVars.myCount]++; 257: BTree::LeafIndexCtr+=myVars.myCount; 258: } 259: 260: for (int i = 0; i<Order && i < myVars.myCount; i++) 261: { 262: assert(myKeys[i].GetPointer()); 263: if (myVars.IsLeaf) 264: myKeys[i].PrintKey(); 265: else 266: myKeys[i].PrintPage(); 267: } 268: } 269: 270: // write out the entire page as a block 271: fstream& Page::Write(fstream& file) 272: { 273: char buffer[PageSize]; 274: memcpy(buffer,&myBlock,PageSize); 275: file.flush(); 276: file.clear(); 277: file.seekp(myVars.myPageNumber*PageSize); 278: file.write(buffer,PageSize); 279: return file; 280: }
Listing 15.4 Iterator.cpp
1: // ************************************************** 2: // PROGRAM: Iterator 3: // FILE: iter.cpp 4: // PURPOSE: iterator for BTree 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 12/1/94 1.0 jl initial release 8: // ************************************************** 9: #include "btree.hpp" 10: 11: // history class implementations 12: History::History(int pno, int off, BOOL node): 13: PageNo(pno), 14: OffSet(off), 15: IsNode(node) 16: {} 17: 18: // Iterator implementations 19: 20: // start off with blank iterator 21: Iterator::Iterator() 22: { 23: Reset(); 24: } 25: 26: Iterator::~Iterator() 27: { 28: for (int i = 0; i<MaxPages; i++) 29: delete myHistory[i]; 30: } 31: 32: void Iterator::RecordNode(int PageNo, int OffSet) 33: { 34: myHistory[myNext++] = new History(PageNo,OffSet,TRUE); 35: } 36: 37: void Iterator::RecordLeaf(int PageNo, int OffSet) 38: { 39: myHistory[myNext++] = new History(PageNo,OffSet,FALSE); 40: } 41: 42: History* Iterator::GetLast() 43: { 44: return myHistory[myNext-1]; 45: } 46: 47: History* Iterator::GetFirst() 48: { 49: return myHistory[0]; 50: } 51: 52: void Iterator::Reset() 53: { 54: for (int i = 0; i<MaxPages; i++) 55: myHistory[i] = 0; 56: myNext = 0; 57: } 58: 59: 60: int Iterator::GetNext(const Index& theIndex) 61: { 62: 63: for (;;) 64: { 65: History * pHist = GetLast(); 66: int pgNo = pHist-> PageNo; 67: int newOffSet = pHist->OffSet+1; 68: Page * pg = BTree::theIDXManager.GetPage(pgNo); 69: 70: if (newOffSet < pg->GetCount()) 71: { 72: Index Idx = pg->GetIndex(newOffSet); 73: pHist->OffSet = newOffSet; 74: for (;;) 75: { 76: if (pg->GetIsLeaf()) 77: { 78: // cout << "Key: " << Idx.GetData() << endl; 79: if (theIndex.Begins(Idx)) 80: return Idx.GetPointer(); 81: else 82: return 0; 83: } 84: else 85: { 86: pg = BTree::theIDXManager.GetPage(Idx.GetPointer()); 87: Idx = pg->GetFirstIndex(); 88: pHist = new History(pg->GetPageNumber(),0, (BOOL) !pg->GetIsLeaf()); 89: myHistory[myNext++] = pHist; 90: } 91: } // end inner for loop 92: } 93: else 94: { 95: delete myHistory[myNext-1]; 96: myHistory[myNext-1] = 0; 97: myNext; 98: if (!myNext) 99: return 0; 100: } 101: } // end outer for loop 102: }
Listing 15.5 Index.cpp
1: // ************************************************** 2: // PROGRAM: index 3: // FILE: index.cpp 4: // PURPOSE: provide fundamental btree functionality 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 11/1/94 1.0 jl initial release 8: // ************************************************** 9: 10: #include "btree.hpp" 11: #include <ctype.h> 12: 13: Index::Index(char* str):myPointer(1) 14: { 15: strncpy(myData,str,dataLen); 16: myData[dataLen]='\0'; 17: for (int i = 0; i< strlen(myData); i++) 18: myData[i] = toupper(myData[i]); 19: } 20: 21: Index::Index(char* str, int ptr):myPointer(ptr) 22: { 23: 24: strncpy(myData,str,dataLen); 25: myData[dataLen]='\0'; 26: for (int i = 0; i< strlen(myData); i++) 27: myData[i] = toupper(myData[i]); 28: 29: } 30: 31: Index::Index(Index& rhs): 32: myPointer(rhs.GetPointer()) 33: { 34: strcpy(myData, rhs.GetData()); 35: for (int i = 0; i< strlen(myData); i++) 36: myData[i] = toupper(myData[i]); 37: 38: } 39: 40: Index::Index():myPointer(0) 41: { 42: myData[0]='\0'; 43: } 44: 45: void Index::PrintKey() 46: { 47: cout << " " << myData; 48: } 49: 50: void Index::PrintPage() 51: { 52: cout << "\n" << myData << ": " ; 53: BTree::theIDXManager.GetPage(myPointer)->Print(); 54: } 55: 56: int Index::Insert(Index& ref, BOOL findOnly) 57: { 58: return BTree::theIDXManager.GetPage(myPointer)->Insert(ref,findOnly); 59: } 60: 61: int Index::operator==(const Index& rhs) 62: { 63: return (strcmp(myData,rhs.GetData()) == 0); // case insensitive 64: } 65: 66: int Index::Begins(const Index& rhs) const 67: { 68: int len = strlen(myData); 69: if (!len) 70: return TRUE; 71: return (strncmp(myData,rhs.GetData(),len) == 0); 72: }
Listing 15.6 Datafile.cpp
1: // ************************************************** 2: // PROGRAM: Data file 3: // FILE: datafile.cpp 4: // PURPOSE: 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 11/5/94 1.0 jl initial release 8: // ************************************************** 9: 10: #include "btree.hpp" 11: #include <assert.h> 12: 13: // on construction, try to open the file if it exists 14: DataFile::DataFile(): 15: myFile("ROBIN.DAT", 16: ios::binary | ios::in | ios::out | ios::nocreate | ios::app) 17: { 18: } 19: 20: void DataFile::Create() 21: { 22: if (!myFile) // nocreate failed, first creation 23: { 24: 25: // open the file, create it this time 26: myFile.clear(); 27: 28: myFile.open 29: ("ROBIN.DAT",ios::binary | ios::in | ios::out | ios::app); 30: 31: char Header[2]; 32: int MagicNumber = 1234; // a number we can check for 33: memcpy(Header,&MagicNumber,2); 34: myFile.clear(); 35: myFile.flush(); 36: myFile.seekp(0); 37: myFile.write(Header,2); 38: return; 39: } 40: 41: // we did open the file, it already existed 42: // get the numbers we need 43: int MagicNumber; 44: myFile.seekg(0); 45: myFile.read((char *) &MagicNumber,2); 46: 47: // check the magic number. If it is wrong the file is 48: // corrupt or this isn't the index file 49: if (MagicNumber != 1234) 50: { 51: // change to an exception!! 52: cout << "DataFile Magic number failed!"; 53: } 54: return; 55: } 56: 57: // write out the numbers we'll need next time 58: void DataFile::Close() 59: { 60: myFile.close(); 61: } 62: 63: 64: int DataFile::Insert(char * newNote) 65: { 66: int len = strlen(newNote); 67: int fullLen = len + 6; 68: 69: time_t theTime; 70: theTime = time(NULL); 71: 72: char buffer[PageSize]; 73: memcpy(buffer,&len,2); 74: memcpy(buffer+2,&theTime,4); 75: memcpy(buffer+6,newNote,len); 76: 77: myFile.clear(); 78: myFile.flush(); 79: myFile.seekp(0,ios::end); 80: int offset = (int) myFile.tellp(); 81: myFile.write(buffer,fullLen); 82: myFile.flush(); 83: return offset; 84: } 85: 86: void DataFile::GetRecord(int offset, char* buffer, int& len, time_t&theTime) 87: { 88: char tmpBuff[PageSize]; 89: myFile.flush(); 90: myFile.clear(); 91: myFile.seekg(offset); 92: myFile.read(tmpBuff,PageSize); 93: memcpy(&len,tmpBuff,2); 94: memcpy(&theTime,tmpBuff+2,4); 95: strncpy(buffer,tmpBuff+6,len); 96: buffer[len] = '\0'; 97: }
Listing 15.7 IdxMgr.cpp
1: // ************************************************** 2: // PROGRAM: Idxmr 3: // FILE: Idxmgr.cpp 4: // PURPOSE: provide i/o for btree 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 11/5/94 1.0 jl initial release 8: // ************************************************** 9: 10: #include "btree.hpp" 11: #include <assert.h> 12: 13: // on construction, try to open the file if it exists 14: IDXManager::IDXManager(): 15: myFile("ROBIN.IDX",ios::binary | ios::in | ios::out | ios::nocreate) 16: { 17: // initialize the pointers to null 18: for (int i = 0; i< MaxPages; i++) 19: myPages[i] = 0; 20: myCount = 0; 21: 22: } 23: 24: // called by btree constructor 25: // if we opened the file, read in the numbers we need 26: // otherwise create the file 27: int IDXManager::Create() 28: { 29: if (!myFile) // nocreate failed, first creation 30: { 31: // open the file, create it this time 32: myFile.open("ROBIN.IDX",ios::binary | ios::in | ios::out); 33: 34: char Header[PageSize]; 35: int MagicNumber = 1234; // a number we can check for 36: memcpy(Header,&MagicNumber,2); 37: int NextPage = 1; 38: memcpy(Header+2,&NextPage,2); 39: memcpy(Header+4,&NextPage,2); 40: Page::SetgPageNumber(NextPage); 41: char title[]="ROBIN.IDX. Ver 1.00"; 42: memcpy(Header+6,title,strlen(title)); 43: myFile.flush(); 44: myFile.clear(); 45: myFile.seekp(0); 46: myFile.write(Header,PageSize); 47: return 0; 48: } 49: 50: // we did open the file, it already existed 51: // get the numbers we need 52: int MagicNumber; 53: myFile.seekg(0); 54: myFile.read((char *) &MagicNumber,2); 55: 56: // check the magic number. If it is wrong the file is 57: // corrupt or this isn't the index file 58: if (MagicNumber != 1234) 59: { 60: // change to an exception!! 61: cout << "Index Magic number failed!"; 62: return 0; 63: } 64: 65: int NextPage; 66: myFile.seekg(2); 67: myFile.read((char*) &NextPage,2); 68: Page::SetgPageNumber(NextPage); 69: int FirstPage; 70: myFile.seekg(4); 71: myFile.read((char*) &FirstPage,2); 72: const int room = PageSize - 6; 73: char buffer[room]; 74: myFile.read(buffer,room); 75: buffer[20]='\0'; 76: // cout << buffer << endl; 77: // read in all the pages 78: for (int i = 1; i < NextPage; i++) 79: { 80: myFile.seekg(i * PageSize); 81: char buffer[PageSize]; 82: myFile.read( buffer, PageSize); 83: Page * pg = new Page(buffer); 84: Insert(pg); 85: } 86: 87: return FirstPage; 88: } 89: 90: // write out the numbers we'll need next time 91: void IDXManager::Close(int theRoot) 92: { 93: 94: for (int i = 0; i< MaxPages; i++) 95: if (myPages[i]) 96: Save(myPages[i]); 97: int NextPage = Page::GetgPageNumber(); 98: if (!myFile) 99: cout << "Error opening myFile!" << endl; 100: myFile.flush(); 101: myFile.clear(); 102: myFile.seekp(2); 103: myFile.write ( (char *) &NextPage,2); 104: myFile.seekp(4); 105: myFile.write((char*) &theRoot,2); 106: myFile.close(); 107: } 108: 109: // wrapper function 110: int IDXManager::NewPage(Index& index, BOOL bLeaf) 111: { 112: Page * newPage = new Page(index, bLeaf); 113: Insert(newPage); 114: Save(newPage); 115: return newPage->GetPageNumber(); 116: } 117: 118: int IDXManager::NewPage( 119: Index *array, 120: int offset, 121: BOOL leaf, 122: int count) 123: { 124: Page * newPage = new Page(array, offset, leaf, count); 125: Insert(newPage); 126: Save(newPage); 127: return newPage->GetPageNumber(); 128: } 129: 130: void IDXManager::Insert(Page * newPage) 131: { 132: 133: // add new page into array of page managers 134: if (myCount < MaxPages) 135: { 136: assert(!myPages[myCount]); 137: myPages[myCount++] = newPage; 138: } 139: else // no room, time to page out to disk 140: { 141: int lowest = -1; 142: 143: for (int i = 0; i< MaxPages; i++) 144: { 145: if (myPages[i]->GetLocked() == FALSE ) 146: lowest = i; 147: } 148: if (lowest == -1) 149: assert(lowest != -1); // change to exception if -1 (no page to kill) 150: 151: for (i = 0; i< MaxPages; i++) 152: if (myPages[i]->GetTime() < myPages[lowest]->GetTime() && myPages[i]->GetLocked() == FALSE) 153: 154: 155: lowest = i; 156: 157: assert(myPages[lowest]); 158: Save(myPages[lowest]); 159: delete myPages[lowest]; 160: myPages[lowest] = newPage; 161: 162: } 163: } 164: 165: // tell the page to write itself out 166: void IDXManager::Save(Page* pg) 167: { 168: pg->Write(myFile); 169: } 170: 171: // see if the page is in memory, if so return it 172: // otherwise get it from disk 173: // note: this won't scale, with lots of page managers 174: // you'd need a more efficient search. 10 levels of page 175: // managers, with 31 indexes per page gives you room for 176: // 800 trillion words. Even if each page is only 1/2 full 177: // on average, 10 levels of depth would represent 64 million 178: // keys alone, not to mention the actual records. 179: 180: Page * IDXManager::GetPage(int target) 181: { 182: 183: for (int i = 0; i< MaxPages; i++) 184: { 185: if (myPages[i]->GetPageNumber() == target) 186: return myPages[i]; 187: } 188: myFile.seekg(target * PageSize); 189: char buffer[PageSize]; 190: myFile.read( buffer, PageSize); 191: Page * pg = new Page(buffer); 192: Insert(pg); 193: return pg; 194: }
Listing 15.8 WNJFile.cpp
1: // ************************************************** 2: // PROGRAM: WNJ file 3: // FILE: WNJfile.cpp 4: // PURPOSE: 5: // NOTES: 6: // AUTHOR: Jesse Liberty (jl) 7: // REVISIONS: 11/5/94 1.0 jl initial release 8: // ************************************************** 9: 10: #include "btree.hpp" 11: #include <assert.h> 12: 13: // on construction, try to open the file if it exists 14: WNJFile::WNJFile(): 15: myFile("ROBINWNJ.IDX", 16: ios::binary | ios::in | ios::out | ios::nocreate) 17: { 18: for (int i = 0; i<5; i++) 19: myints[i]=0L; 20: } 21: 22: void WNJFile::Create() 23: { 24: char Header[4]; 25: int MagicNumber=0; // a number we can check for 26: int zero = 0; 27: 28: if (!myFile) // nocreate failed, first creation 29: { 30: // open the file, create it this time 31: myFile.clear(); 32: myFile.open("ROBINWNJ.IDX", 33: ios::binary | ios::in | ios::out); 34: 35: MagicNumber = 1234; 36: memcpy(Header,&MagicNumber,4); 37: memcpy(Header+2,&zero,2); 38: myFile.seekp(0); 39: myFile.write(Header,4); 40: myFile.flush(); 41: return; 42: } 43: 44: // we did open the file, it already existed 45: // get the numbers we need 46: 47: 48: myFile.seekg(0); 49: myFile.read(Header,4); 50: memcpy(&MagicNumber,Header,2); 51: memcpy(&myCount,Header+2,2); 52: 53: // check the magic number. If it is wrong the file is 54: // corrupt or this isn't the index file 55: if (MagicNumber != 1234) 56: { 57: // change to an exception!! 58: cout << "WNJ Magic number failed!"; 59: cout << "Magic number: " << MagicNumber; 60: cout << "\nmyCount: " << myCount << endl; 61: } 62: return; 63: } 64: 65: // write out the numbers we'll need next time 66: void WNJFile::Close() 67: { 68: myFile.seekg(2); 69: myFile.write((char*)&myCount,2); 70: myFile.close(); 71: } 72: 73: int WNJFile::Append(int DataOffset) 74: { 75: 76: int newPos = 4 + myCount++ * (5*2); 77: int offsets[5]; 78: offsets[0] = DataOffset; 79: for (int i = 1; i<5; i++) 80: offsets[i]=0; 81: myFile.seekg(newPos); 82: myFile.write((char*)offsets,5*2); 83: 84: return newPos; 85: } 86: 87: 88: int WNJFile::Insert(int DataOffset,int WNJOffset) 89: { 90: int ints[5]; 91: myFile.seekg(WNJOffset); 92: myFile.read((char*)ints,5*2); 93: 94: int offset=WNJOffset; 95: 96: while (ints[4]) 97: { 98: offset = ints[4]; 99: myFile.clear(); 100: myFile.flush(); 101: myFile.seekg(ints[4]); 102: myFile.read((char*)ints,5*2); 103: } 104: if (ints[3]) // full! 105: { 106: ints[4] = Append(DataOffset); 107: myFile.clear(); 108: myFile.flush(); 109: myFile.seekg(offset); 110: myFile.write((char*)ints,5*2); 111: } 112: else 113: { 114: for (int i = 0; i<4; i++) 115: { 116: if (ints[i] == 0) 117: { 118: ints[i] = DataOffset; 119: myFile.clear(); 120: myFile.flush(); 121: myFile.seekg(offset); 122: myFile.write((char*)ints,5*2); 123: break; 124: } 125: } 126: } 127: return 0; 128: } 129: 130: 131: int* WNJFile::Find(int NextWNJ) 132: { 133: int ints[5]; 134: int * results = new int[256]; 135: 136: int i = 0, j=0; 137: 138: while (j<256) 139: results[j++] = 0; 140: 141: j = 0; 142: 143: myFile.seekg(NextWNJ); 144: myFile.read((char*)ints,5*2); 145: 146: while (j < 256) 147: { 148: if (ints[i]) 149: { 150: if (i == 4) 151: { 152: myFile.seekg(ints[4]); 153: myFile.read((char*)ints,5*2); 154: i = 0; 155: continue; 156: } 157: results[j++] = ints[i++]; 158: } 159: else 160: break; 161: } 162: return results; 163: }
Listing 15.9 Driver.cpp
1: // Listing 15.9 - Driver.cpp 2: 3: #include "String.hpp" 4: #include "stdef.hpp" 5: #include "btree.hpp" 6: 7: IDXManager BTree::theIDXManager; 8: DataFile BTree::theDataFile; 9: WNJFile BTree::theWNJFile; 10: Iterator BTree::theIter; 11: 12: int WNJFile::myCount=0L; 13: int Page::gPage=1; 14: int BTree::NodeIndexCtr=0; 15: int BTree::LeafIndexCtr=0; 16: int BTree::NodePageCtr=0; 17: int BTree::LeafPageCtr=0; 18: int BTree::NodeIndexPerPage[Order+1]; 19: int BTree::LeafIndexPerPage[Order+1]; 20: 21: int main() 22: { 23: BTree myTree; 24: 25: for (int i = 0; i < Order +2; i++) 26: { 27: BTree::NodeIndexPerPage[i] =0; 28: BTree::LeafIndexPerPage[i] = 0; 29: } 30: 31: char buffer[PageSize+1]; 32: 33: for (;;) 34: { 35: cout << "Enter a string (blank to stop): "; 36: cin.getline(buffer,PageSize); 37: // cin.ignore(PageSize,'\n'); 38: if (buffer[0]) 39: myTree.Insert(buffer); 40: else 41: break; 42: } 43: for (;;) 44: { 45: cout << "Find: "; 46: cin.getline(buffer,255); 47: if (buffer[0]) 48: { 49: int offset = myTree.Find(buffer); 50: while (offset) 51: { 52: int* found = BTree::theWNJFile.Find(offset); 53: int i=0; 54: int len; 55: time_t theTime; 56: char buff[512]; 57: while (found[i]) 58: { 59: BTree::theDataFile.GetRecord(found[i],buff,len, theTime); 60: struct tm * ts = localtime(&theTime); 61: cout << "Found: "; 62: cout << ts->tm_mon << "/"; 63: cout << ts->tm_mday << "/"; 64: cout << ts->tm_year << " "; 65: cout << buff << endl; 66: i++; 67: } 68: delete [] found; 69: offset = myTree.GetNext(buffer); 70: } 71: } 72: else 73: break; 74: } 75: 76: myTree.PrintTree(); 77: 78: return 0; 79: }Output:
d:\112\day16>r1 Enter a string (blank to stop): There is a place for us Enter a string (blank to stop): The quick brown fox jumps Enter a string (blank to stop): Is that a theater production? Enter a string (blank to stop): Theirs is not to question Enter a string (blank to stop): There's no place like home Enter a string (blank to stop): Find: home Found: 11/2/94 There's no place like home Find: the Found: 11/2/94 The quick brown fox jumps Found: 11/2/94 Is that a theater production? Found: 11/2/94 Theirs is not to question Found: 11/2/94 There is a place for us Found: 11/2/94 There's no place like home Find: BROWN: BROWN: BROWN FOR FOX: FOX HOME JUMPS LIKE NOT: NOT: NOT PLACE PRODUCTION QUESTION QUICK: QUICK THAT THE: THE THEATER THEIRS: THEIRS THERE THERE'S Stats: Node pages: 3 Leaf pages: 6 Node indexes: 8 Leaf indexes: 17 Pages with 2 nodes: 2 Pages with 2 leaves: 3 Pages with 3 leaves: 1 Pages with 4 nodes: 1 Pages with 4 leaves: 2Analysis:
In line 40 of BTree.hpp (refer to listing 15.1), the Iterator class is declared. It has an array of history structures, which are declared in lines 30 through 36 of the same file. Each history object knows which binary page it is on, and what its offset is. It also knows whether it is a node (or a page).
In line 155, a static iterator theIter is declared. In line 38 of listing 15.2, the iterator is reset. In line 49, you see that the BTree method, GetNext(), creates an index object and passes it in to the iterator's GetNext() method, which iterates over its list of objects.
In line 10 of the driver program (refer to listing 15.9), the iterator is defined (space is allocated). In line 45, the user is prompted to enter a string for which the program will search. The search is in line 52 and, once found, the record is displayed. In line 69, the iteratorwhich was created in the Find() methodis used to find the next matching record.
A full-fledged iterator should support the previous operation, as well as the next operation. It also should let you start at the first or last node in the tree, and should tell you whether you have gone past the start or end of the tree.
If you were building an iterator for general use, you would want to add a method to BTree that returns a copy of its iterator. You also would want to add methods to Iterator itself (or to BTree as appropriate) to set the iterator to the first or last index in the first or last page.
As mentioned earlier, the trickiest problem when working with iterators is when you allow additions and deletions to the list through the iterator, and you also allow more than one iterator to point to the same list.
If you need to implement this functionality, you have to provide your tree with a list of all the iterators currently pointing into the tree. If the tree is changed for any reason (a node is added or deleted, for example), you need to update all the iterators so that they still are valid.
You didn't have this problem with the iterator shown in the preceding section, because the iterator never was provided as a stand-alone object; the tree maintained its own iterator and kept it up to date each time it did a search.
Today you learned what iterators are and how to create an iterator for the BTree class. You used the iterator to search the list, and you examined many of the programming trade-offs that iterators present.
Q: Why bother with iterators?
A: An iterator enables you to walk a list that otherwise might not be presented in the desired order.
Q: Can you have more than one iterator on a list?
A: Yes, that is one of the great advantages of iterators. You can walk the list from different parts of the same set of data.
Q: What is a disadvantage of providing an iterator?
A: Writing a complete Iterator class is not easy. There are significant issues having to do with adding and deleting items in the list.
The Workshop provides quiz questions to help you solidify your understanding of the material covered, and exercises to provide you with experience in using what you have learned. Try to answer the quiz and exercise questions before checking the answers in Appendix A, and make sure that you understand the answers before continuing to the next chapter.
The IntListIterator should provide methods to get and set the current value (the value in the list that the iterator is pointing to). It also should provide the capability to walk forward and backward through the list, and to jump to the beginning or the end. (Indexed access to the list is unnecessary.) All the methods that move the current position should return a Boolean that specifies whether the current position is valid.
Go to: Table of Contents | Next Page