Text File | 1994-02-22 | 98KB | 3,855 lines

DiamondBase Documentation
Version 0.2

Darren Platt, darrenp@dibbler.cs.monash.edu.au
Andrew Davison, davison@bruce.cs.monash.edu.au
Kevin Lentin, kevinl@bruce.cs.monash.edu.au

January 18, 1994

1 What Is DiamondBase ?

1.1 Description

DiamondBase is an implementation of a programmer's database. It supports the basic relational model via a schema compiler and the C++ language. It has been designed to make usage of the resulting relations simple, via object methods.

It was written by three students at Monash University, Melbourne, Australia in their spare time, because they are crazy, and have always harboured secret ambitions to write a database. Kevin's previously frustrated attempts met with Darren's misguided notions that a database would make his PhD easier to write, and Andy wanted a database for a bibliographic retrieval system called Bibel. And so the three musketeers set off in search of adventure.

1.2 Platforms

DiamondBase has been compiled for Linux, Ultrix and SunOS 4.1.1 using GCC, for Irix using GCC and CFRONT, and for OS/2 with the Borland compiler. It has also been successfully compiled and used on an Amiga using SASC, on a NeXT using GCC and on an RS6000. It currently requires gnu-make for its makefile, but a generic Makefile is provided.

1.3 Distribution

Read the Copyright message in the distribution.

1.4 Limitations

1.4.1 Introduction

This section is depressingly large at the moment. It outlines the areas where we believe DiamondBase is currently deficient. This is by no means an exhaustive list, and we are open to suggestions for improvements.

Having listed the deficiencies, it should also be noted that there are no fixed plans to eliminate them. Database extension is currently driven by the whims and requirements of the authors. We are quite open to accepting extensions written by other people - so submit that SQL parser as soon as you have written it.

1.4.2 Limited Multi-User Support

This is a pretty major restriction at the moment. You should ensure that only one copy of your application is in use at any one time.

Multi-user support is now available under Linux, Ultrix and SunOS. In its current incarnation it is set up to run under one user-id. It can very easily be run by multiple people. It must run on one machine.

We believe the Irix port of this part of the product should be merely a compilation issue. The Amiga and OS/2 versions may require more work. The OS/2 version should be available very soon.

1.4.3 No multi-machine support

The multi-user support uses shared memory segments and IPC communication. It can therefore not be used over multiple machines. It can be moved from machine to machine, but at any one time, all clients must be on the same machine as the server.

We have used the UNIX ndbm package to implement a multi-user database by accepting connections on a socket and multi-threading the executable. There is only one server process accessing the database but many clients can use it concurrently. If you are interested in such code please let us know.

1.4.4 No SQL

I'm philosophical on this point. SQL isn't god's gift to query languages (it's more like satan's if the truth is to be known). I would however like to add a query language to DiamondBase at some stage, and SQL is certainly a standard choice.

The reliance on compiled code to do the comparison makes an interactive SQL setup more difficult, since it would have to second-guess the compiler's layout for data structures - or dynamically link the comparison code.

Embedded SQL would be easier to implement. It would require a compiler to preprocess the SQL (preferably not by extracting it from the C++, but by having queries stored in separate text files) and then generate C++ code to access the objects in the right order.

1.4.5 No duplicate indexes

This is a nuisance as much as anything. All records must have a unique value for each index. If you wish to effectively have duplicates, then you must supply an extra field which makes them unique.

So, for example, if you wish to index employees by age, then you would also include employee number in the index. Finding all the employees who are 12 years old then involves finding the first record with age=12 and id>=0, and then fetching the next record repeatedly until age is not equal to 12.

This approach is a bit messy and labour intensive, so we would like to permit duplicates - if we can work out what it will do to the poor Btrees.

The package does, however, support a unique type. This is a long that is assigned a value by DiamondBase and is guaranteed to be unique within the relation. Included in an index, it effectively allows duplicate indexes with much better semantics.

1.4.6 Recovery and Auditing

The Btree indexes can be rebuilt from the data storage file if they become corrupted - but no facility currently exists to do this. A record dump facility, which dumps the records in the record store to a text file, is currently available in the form of the db2txt program.

There is no auditing to keep track of all transactions that were performed. This is not a high priority for us currently. If this is a high priority for you then please let us know how you would like it done.

1.4.7 Static limits

Various components of the database have static limits. These are listed in the table below along with a description and the file in which they are defined.

  Name                 Value  Where      Description
  -------------------  -----  ---------  ----------------------------------------------------------------
  MAX_QUERY            10     defs.h     The maximum number of queries that a bTree or dbObj will allow.
  MAX_REG              20     dbase.h    The maximum number of registrations that the dbObj will allow.
  MAX_DB_INFO          20     dbase.h    The maximum number of relations that the dbObj will keep open.
  MAX_RELATIONS        20     nserver.h  The maximum number of relations that the nameserver can handle.
  MAX_FIELDS_IN_INDEX  10     idxinfo.h  The maximum number of fields that an index is permitted to have.
  MAX_NAME_LENGTH      300    idxinfo.h  The maximum length that an index name can be.
  MAX_TRANS_AREA       4K     dcdefs.h   The maximum relation size for the multi version

1.4.8 Fixed length records

Records are currently of a fixed length determined at the creation of the relation. Support for variable length fields is available through the dbData and dbString types. The first is a generic binary object. The second is a derivative which behaves like a null terminated string. Details of these features are in Section 5.

2 Interface Functions

2.1 Introduction

This section describes the basic application functions available to the programmer. They are supplied as methods to the relation class generated by dsc.

2.1.1 General functions

* bool ok(void)

  - Description
      Checks if last database operation was successful.
  - Return codes
      true  - successful operation
      false - unsuccessful operation
  - Example
      fred yourRel;
      yourRel.add();
      if (!yourRel.ok()) {
        cerr << "An error occurred." << endl;
      }

* void perror(char *description)

  - Description
      Prints a textual description of the last error code to cout, preceded by a textual prefix which you supply.
  - Example
      fred yourRel;
      yourRel.id = 1234;  // Which doesn't exist
      yourRel.get();
      if (!yourRel.ok()) {
        yourRel.perror("Getting a record:");
      }

* void stats(void)

  - Description
      Outputs the current database cache hit statistics to cout for this relation.
  - Example
      yourReltype yourRel;
      yourRel.stats();

    Output:
      Record Server cache: 1669 attempts, 146 hits = 8.74775%
        1523 writes and 1513 disposals
      Index 0 cache: 0 attempts, 0 hits = 0%
        0 writes and 0 disposals
      Index 1 cache: 593 attempts, 454 hits = 76.5599%
        139 writes and 129 disposals
      Index 2 cache: 3 attempts, 0 hits = 0%
        3 writes and 0 disposals

2.1.2 Database Manipulation functions

* dbError add(void)

  - Description
      Insert the current values for this relation into the database. Any fields which are of type unique will be assigned a unique value as the record is inserted. The corresponding fields of the structure will reflect these assigned values upon return. The data in the object is not otherwise modified by the call.
  - Return codes
      db_ok    - successful addition
      db_dup   - addition of this record would create a duplicate record on one or more indexes.
      db_nopen - database isn't currently open.
  - Example
      employee newEmp;
      strcpy(newEmp.name,"Joe Bloggs");
      newEmp.age = 12;
      switch(newEmp.add()) {
        case db_ok:
          break;
        case db_dup:
          cerr << "This employee already exists" << endl;
          break;
        case db_nopen:
          cerr << "Open the database first dummy" << endl;
          break;
      }

* dbError begin(int indexNumber = 0)

  - Description
      A call to begin initiates a query, signalling an intention to retrieve one or more records from the database using a particular index. Within a query, an implicit pointer is maintained to a position within an ordered index list. Whilst direct fetches with get may be executed without a begin/end pair, begin must be used with all relational retrieval operations.
      If a previous begin is outstanding when this is called, the previous query is terminated. You should therefore not use get inside a begin/end pair of the same instance of a relation, as the get will terminate the query. If you use 2 different instances of the relation, then it will work correctly as each has its own query.
      The following functions require a query and must therefore be executed within a begin/end pair.
        extract    extractNext  extractPrev  find       first
        last       next         peekNext     peekPrev   prev
        seek       seekFirst    seekLast     write
  - Return codes
      db_ok    - the query started correctly
      db_range - the supplied index is out of range
  - Example
      // Retrieve all employees' names.
      employee tempEmp;
      tempEmp.begin(dbIdx_name);  // Initiate query
      for(tempEmp.first();tempEmp.ok();tempEmp.next()) {
        cout << tempEmp.name << endl;
      }
      // The same thing, using a while instead
      tempEmp.seekFirst();
      while(tempEmp.next() != db_eof) {
        cout << tempEmp.name << endl;
      }
      tempEmp.end();

* dbError del(void)

  - Description
      Delete the record matching the current values for the index from the database. The data in the object is not modified by the call.
  - Return codes
      db_ok     - successful deletion
      db_nfound - couldn't find the record to delete it
  - Example
      employee exEmp;
      exEmp.id = 1234;  // Employee id to delete.
      exEmp.get();      // Fetch this employee.
      if (exEmp.ok()) {
        if (exEmp.del()==db_nfound) {
          cout << "Couldn't find employee" << endl;
        }
      }

* dbError end(void)

  - Description
      End the previously initiated query. This is mainly for efficiency reasons, and to terminate any outstanding locks, since the next begin will automatically terminate the previous begin anyway.
  - Return codes
      db_ok      - no problems.
      db_noquery - there was no matching begin.
      db_range   - query value was not in the valid range
  - Example
      See begin.

* dbError extract(void)

  - Description
      This call does a find on the supplied object and then locks the object if it is found. The record itself is locked, not the key being used, so accesses to the record through any key will fail.
      After an extract, any access of the record will return db_locked. If this access is an extract, an extract will not occur and the operation will be equivalent to a find. If the access is another read-type operation, then the data will be retrieved. If this access is a write, then it will fail.
      If the requested record does not exist, then the data in the object is not modified. In that situation, extract degenerates to a seek.
  - Example
  - Return codes
      db_nfound
      db_ok
      db_locked

* dbError find(void)

  - Description
      Search for the index entry in the database. If the entry exists, then the query pointer is positioned at that point and the record is retrieved. Otherwise, the pointer is moved to the following record, and that record is retrieved. Note that since the retrieval of that record involves a call to next, subsequent calls to next will not return this same record.
      If you wish to position the relation before a record without advancing the record pointer, use the seek family of functions instead.
  - Return codes
      db_ok      - a matching record was found and retrieved
      db_nfound  - no matching record was found and the data for the next record was placed in the class.
      db_locked  - the record was found and was locked. The data is still returned however.
      db_range   - query value was not in the valid range
      db_noquery - query value was not valid
  - Example
      employee exEmp;
      exEmp.id = 1234;  // Employee id to find.
      exEmp.find();     // Fetch this employee.

* dbError first(void)

  - Description
      The first function may only be executed from within a query (see begin). It sets the query pointer to the first record and returns the record at that position if one exists. Note that since first retrieves the first record, a subsequent call to next will not return that record. Instead, it will return the second record. See seekFirst.
  - Return codes
      db_nfound  - there are no records in the database
      db_range   - query value was not in the valid range
      db_noquery - query value was not valid
  - Example
      See begin.

* dbError get(int idxNumber=0)

  - Description
      Retrieve a record from the database using explicit values for an index. This is the only retrieval function which does not need to be executed within a begin/end pair. The function issues an implicit begin/end pair around this atomic operation.
  - Return codes
      db_ok     - Record matching index value was found
      db_nfound - Record was not found
      db_range  - the index specified was illegal
  - Example
      //
      // relation host_to_ip {
      //   char hostname[100];
      //   char ip[16];
      //   index byIp on ip;
      //   index byName on hostname;
      // }
      host_to_ip myLookup;
      strcpy(myLookup.hostname,"dibbler.cs.monash.edu.au");
      if (myLookup.get(dbIdx_byName)==db_ok) {
        cout << "IP=" << myLookup.ip << endl;
      } else {
        cout << "Not found" << endl;
      }

* dbError last(void)

  - Description
      The last function may only be executed from within a query (see begin). It sets the query pointer to the last record and returns the record at that position if one exists.
  - Return codes
      db_ok      - last record found and retrieved
      db_nfound  - there were no records
      db_noquery - last must be within a begin/end pair
  - Example
      See prev.

* dbError next(void)

  - Description
      This function may only be executed from within a query (see begin). It sets the query pointer to the next record for the current query index and retrieves it if it exists.
  - Return codes
      db_ok      - Next record is retrieved
      db_eof     - No more following records
      db_noquery - next was not executed within a begin/end pair
      db_locked  - data retrieved but record locked by someone else
  - Example
      See begin.

* dbError peekNext(void)

  - Description
      Return next record if it exists, but do not advance the btree query pointer. Subsequent next, prev, peekNext and peekPrev calls will return values as if this call never occurred.
  - Return codes
      db_ok      - Next record is read
      db_eof     - No more following records
      db_noquery - peekNext was not executed within a begin/end pair
      db_range   - query value was not in the valid range
      db_locked  - data is retrieved but record is locked
  - Example
      // Given the records,
      // JONES, PERCY, PLATT and SMITH
      employee tempEmp;
      strcpy(tempEmp.name,"PLATT");
      tempEmp.begin(dbIdx_name);     // Initiate query
      tempEmp.find();                // Fetches PLATT
      tempEmp.peekNext();
      cout << tempEmp.name << endl;  // Prints SMITH
      tempEmp.next();
      cout << tempEmp.name << endl;  // Prints SMITH again
      tempEmp.end();

* dbError peekPrev(void)

  - Description
      Return previous record if it exists, but do not move the btree query pointer. Subsequent next, prev, peekNext and peekPrev calls will return values as if this call never occurred.
  - Return codes
      db_ok      - Previous record is read
      db_eof     - No more preceding records
      db_noquery - peekPrev was not executed within a begin/end pair
      db_range   - query value was not in the valid range
      db_locked  - data is retrieved but record is locked
  - Example
      // Given the records,
      // JONES, PERCY, PLATT and SMITH
      employee tempEmp;
      strcpy(tempEmp.name,"PLATT");
      tempEmp.begin(dbIdx_name);     // Initiate query
      tempEmp.find();                // Fetches PLATT
      // NB query pointer is now after PLATT
      tempEmp.peekPrev();
      cout << tempEmp.name << endl;  // Prints PLATT !
      tempEmp.next();
      cout << tempEmp.name << endl;  // Prints SMITH
      tempEmp.end();

* dbError prev(void)

  - Description
      This function may only be executed from within a query (see begin). It sets the query pointer to the previous record for the current query index and retrieves it if it exists.
  - Return codes
      db_ok      - Previous record is retrieved
      db_eof     - No more previous records
      db_noquery - prev was not executed within a begin/end pair
      db_locked  - data retrieved but record locked by someone else
  - Example
      // Retrieve all employees' names - but in reverse
      // order.
      employee tempEmp;
      tempEmp.begin(dbIdx_name);  // Initiate query
      for(tempEmp.last();tempEmp.ok();tempEmp.prev()) {
        cout << tempEmp.name << endl;
      }
      tempEmp.end();

* dbError put(int idxNumber=0)

  - Description
      This function writes a record into the database regardless of its presence. If the record does not exist, this call does an add, otherwise it does a write. This call, like get, does not require a query and, as such, contains an implicit begin/end pair. It can be used to write a record into the relation with no regard to the current contents.
  - Return codes
      db_ok    - Record was successfully written
      db_dup   - The record's keys partially clash with existing records
      db_range - the index specified was illegal
  - Example
      // relation host_to_ip {
      //   char hostname[100];
      //   char ip[16];
      //   index byIp on ip;
      //   index byName on hostname;
      // }
      host_to_ip myLookup;
      strcpy(myLookup.hostname,"dibbler.cs.monash.edu.au");
      strcpy(myLookup.ip,"130.194.62.33");
      if (myLookup.put(dbIdx_byName)==db_ok) {
        cout << "Added successfully" << endl;
      } else {
        cout << "Something went wrong" << endl;
      }

* dbError seek(void)

  - Description
      Positions the query pointer using the values for the current query index. The next record returned by a call to next or peekNext will be the record requested if it exists in the relation, or the one that would follow it if it did exist.
  - Return codes
      db_ok      - The specific record was found.
      db_eof     - No more preceding records
      db_noquery - seek was not executed within a begin/end pair
      db_range   - query value was not in the valid range
      db_nfound  - the specific record was not found.
  - Example

* dbError seekFirst(void)

  - Description
      Position the query pointer just before the first record for the current query index, so that the next record retrieved will be the first one. Like first, this call does not depend on the contents of the relation - it only moves to the beginning of the index.
  - Return codes
      db_ok      - The record was found.
      db_eof     - No records
      db_noquery - seekFirst was not executed within a begin/end pair
      db_range   - query value was not in the valid range
  - Example

* dbError seekLast(void)

  - Description
      Position the query pointer just after the last record for the current query index, so that calling prev will retrieve the last record for that index.
  - Return codes
      db_ok      - The record was found.
      db_eof     - No records
      db_noquery - seekLast was not executed within a begin/end pair
      db_range   - query value was not in the valid range
  - Example

* dbError write(void)

  - Description
      If all the keys are already present in the database and they all refer to the same record, then that record will be overwritten with the values in this class. If not all of the keys are present then the call fails.
      The standard way to update a record is to perform an extract followed by a write. If you wish to change one of the fields on which the record is indexed, then you must create a new record, copy the unchanged details from the old record to the new, and then delete the old record. This decision was made to considerably simplify the update mechanism within the database.
      If the write is being performed after an extract, then the keys are checked to ensure that they have not changed since the extract. If they have, then the write will fail. This last check is currently unimplemented.
  - Return codes
      db_ok      - record was overwritten successfully
      db_dup     - neither 0 nor all the indices matched
      db_range   - query value was not in the valid range
      db_noquery - query value was not valid
      db_nfound  - record is not already in the index
  - Example

2.1.3 Data Members

status
    Holds the return code from the last database operation.

Error codes: db_ok ....

3 Internals

3.1 Introduction

As far as the big picture goes, you write a program which needs database functions. You have used DSC to produce both the physical storage files and some compiler generated code which you link into your application. You also link in the diamond library, which contains the functions needed to perform the database operations you execute as methods of DSC generated objects.

This section of the manual describes how the internals of DiamondBase are implemented. We would like you to understand what is happening inside the database, as this will aid you in debugging, reporting bugs within the database code and suggesting improvements and changes.

3.2 Typical application code

    #include "myrel.h"

    diamondBase theDiamondBase("config.db");

    main()
    {
      myRel theRel;      // Attach stage (a)

      theRel.begin();
      theRel.first();
      // Use theRel here.
      theRel.extract();  // Decide to update it.
      // Change some non-indexed member fields
      theRel.write();    // Put it back
      theRel.end();
    }

Note the appalling lack of error checking going on - utterly disgusting. The definition for the 'myRel' class will inherit both a transfer area and a diaRel class. The transfer area acts as a sub-struct with only data members, which can be used for memory transfers as records are stored and retrieved. The diaRel base class has the necessary methods to implement the calls (eg first, next, write).
Note that in the above example, the myRel instance was in scope for all of main - it need not have been. You can declare relations wherever you wish to use them. Global relation instances are even permitted, provided you can ensure that the theDiamondBase instance you declare is constructed first.

Which brings me to the final point for this section - you must have an instance of diamondBase called `theDiamondBase' at the global scope in your program. We could have defined one ourselves in the library, but this way you get to name the database configuration file.

3.2.1 diaRel

There is one instance of diaRel in each user-defined relation class. It implements the functions which are used to manipulate the database. diarel.h and diarel.cc are fairly self explanatory. The constructor for diaRel attaches to the global theDiamondBase object and obtains a reference id which is used in all further correspondence up until the point where the class which inherits diaRel has been destroyed.

diaRel also inherits an object class. This has a number of pure virtual methods which the database ultimately uses to perform functions like fetching the data, storing the data, extracting the key component of the data and comparing keys. These functions are performed by DSC generated code, and so a pointer to the base object class is passed to theDiamondBase during attach in order to permit this.

3.2.2 diaGRel

The diaGRel (or Generic Relation) class can be used in place of a DSC generated class to manipulate any relation you desire. This can be used to write generic relation browsers or suchlike. The use of the diaGRel class is described in detail in section 4.

diaGRel is used to implement the db2txt relation dumping utility, and the server in the multi-user version.

3.2.3 diamondBase

diamondBase has rather a large job. It oversees all the classes which actually manipulate the database.
On construction it is supplied with a string which is a path to the name server configuration file. This file is typically called "config.db" - although the default parameter is "ns.dbc". This parameter is passed to the base TNameServer class.

As mentioned in the diaRel section, diamondBase handles incoming diaRel connection requests, assigning a reference id to each new connection. A list of such class clients is maintained.

A list of active and recently deregistered relations is kept. Old relations are kept to avoid unnecessary overhead when a relation class is repeatedly created then destroyed. If multiple active diaRel classes are accessing the same relation, then only one relation is maintained for all of them.

Each diaRel uses only one relation, and provides its name upon registration. Failure to locate that database name using TNameServer is an error. Each relation has an associated pointer to a dbObj through which all non register/deregister operations are handled.

3.2.4 TNameServer

This class uses the parameter supplied at construction time to open a file containing entries in the form:

    relationName=directory-path

This is simply used to provide some flexibility in the mappings from a relation name onto the directory where it is actually stored.

As an alternative, it is possible to pass the empty string ("") to diamondBase on construction, and then the TNameServer will assume that all relations are in the current directory. This is useful when using diaGRel.

3.2.5 dbObj

One dbObj instance handles one relation. For each index on that relation, the dbObj maintains one bTree instance through which operations using that index are performed. It inherits a recServer base class which it uses to fetch the actual data associated with a particular record. Actual operations on indices are performed by calling the appropriate method in a bTree.
The dbObj maintains a set of active queries which each refer to a particular bTree query. Record locking occurs at the dbObj level to prevent two updates occurring on the one record concurrently.

The dbObj also collects information about the relation when it is first instantiated. This allows the diaGRel to find out all the information it needs about a relation.

3.2.6 bTree

The bTree is probably the most complicated object in the whole hierarchy (historically it was the first thing to be written). It manages the indexing of records. It manages its own queries, taking care to ensure that multiple queries reflect the changes made to each other, whilst optimizing as much as possible for speed.

It inherits a recServer to store the actual buckets. The actual implementation is essentially a B+ tree with pointers to allow sequential retrieval. At the moment, buckets which drop to less than half full aren't merged with sibling buckets.

Operations like fetching key data from the user class, setting key data, comparing keys etc, are performed via calls to member functions in the object class that is passed in.

4 Generic Relations

4.1 Introduction

The fact that DiamondBase is based upon the DSC relation compiler meant initially that only those relations that had code compiled into an executable could be accessed by that executable. This was seen as being too restrictive because it meant that no programs could be written that acted upon an arbitrary relation. Since it was intended to eventually write such programs as relation browsers and index rebuilders, it was vital that a mechanism for accessing unknown relations be written.

4.2 diaGRel

The diaGRel class accomplishes this. It is meant to be used in any situation where a DSC generated class would otherwise be used.
It implements all functions that the generated code does, except that it is less efficient.

This class is instantiated like a DSC generated class, except that its constructor takes a string as a parameter which specifies the relation to use. This should be the name of the files which make up the relation, without their file extension.

4.3 Use

Once instantiated, a diaGRel is used exactly as any DSC generated class is.

4.4 Implementation

Initially, the diaGRel was written to open the relation file and read in the record and key structure stored there. It would then do some pretty horrendous traversals of this data to build up arrays of information about the relation. These arrays contain data in a form that makes implementing the methods required to imitate a DSC generated class quick and efficient. This process is complicated by the fact that relations and keys are reordered by DSC to put them in an order that prevents alignment problems.

Key access in a diaGRel is the main inefficiency of the class. This is because much pointer arithmetic has to be done at runtime to extract relevant data. In a DSC class this is done at compile time by the compiler by way of structure references.

Access to the relation data can be via one of two mechanisms. Either the diaGRel allocates a block of memory, or a block of memory is given to it at construction time. The latter has the benefit that you can inherit a struct that is in the relation and pass a cast to this to diaGRel, thus allowing you to reference relation members by name. Without this, you only have access to the relation as a block of memory.

    struct BookRel
    {
      char name[100];
      char author[100];
      long barcode;
    };

    class Book : public BookRel, public diaGRel
    {
      public:
        Book(void);
    };

    // Hand the inherited BookRel part of this object to diaGRel as its data block.
    Book::Book(void) : diaGRel("book",(BookRel *)this)
    {
    }

    diamondBase theDiamondBase("config.db");

    main()
    {
      Book ARM;

      strncpy(ARM.name,"Annotated C++ reference",sizeof ARM.name);
      strncpy(ARM.author,"B. Stroustrup",sizeof ARM.author);
      ARM.barcode = 11345654;
      ARM.put();
    }

4.5 Performance

The above implementation worked correctly but inefficiently. The reason for this was the necessity to open, read and dissect the relation file on every instantiation. This meant that a process using a diaGRel instead of a DSC generated class took more than 5 times as much system time to perform the same job. This was not satisfactory.

The solution to the above was to allow the dbObj to collect the data. This was already being partially done there, and the change involved setting up a struct to hold all the data and moving a lot of code from the one class to the other. Now the dbObj can return a pointer to this struct to the diaGRel via the diamondBase, through the attach method to the diaRel underneath the diaGRel (phew!).

The above change meant that unless more than about 10 different relations are used in a program, the data discussed above is assembled only once. The result is that the diaGRel takes about twice as much user time as the DSC generated classes (about expected) and about 35-40% more system time (also not a disappointing number).

This raises interesting performance questions for database engine writers. Unless you either employ a system whereby you write self-modifying code, or you link in generated code (as we do), you are forced to accept this overhead due to interpreting your relation layout. It basically involves extra levels of indirection during database access.
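
The extra indirection can be pictured with a small sketch that does not use DiamondBase at all. It contrasts a compiled structure reference with a lookup through a field table built at run time. FieldInfo, describeBookRel and readField are hypothetical names invented for this illustration, and the table is only an assumption about the kind of information a generic relation needs; it is not the format DiamondBase actually stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Record layout mirroring the BookRel struct from section 4.4.
struct BookRel {
    char name[100];
    char author[100];
    long barcode;
};

// Compiled path: the compiler resolves the member offset at compile time.
inline long barcodeCompiled(const BookRel& rec) {
    return rec.barcode;
}

// Hypothetical run-time description of one field (not DiamondBase's metadata).
struct FieldInfo {
    std::string name;    // field name as it might appear in a schema
    std::size_t offset;  // byte offset inside the record block
    std::size_t size;    // field size in bytes
};

// A layout table such as a generic relation might assemble when it attaches.
inline std::vector<FieldInfo> describeBookRel() {
    return {
        {"name",    offsetof(BookRel, name),    sizeof(BookRel::name)},
        {"author",  offsetof(BookRel, author),  sizeof(BookRel::author)},
        {"barcode", offsetof(BookRel, barcode), sizeof(BookRel::barcode)},
    };
}

// Interpreted path: search the table, then copy bytes out of a raw block.
// Returns false if the field is unknown or the caller's buffer size differs.
inline bool readField(const void* record, const std::vector<FieldInfo>& layout,
                      const std::string& field, void* out, std::size_t outSize) {
    for (const FieldInfo& f : layout) {
        if (f.name == field && f.size == outSize) {
            std::memcpy(out, static_cast<const char*>(record) + f.offset, f.size);
            return true;
        }
    }
    return false;
}
```

Both paths read the same bytes. The interpreted one pays for a table search and pointer arithmetic on every access, which is the kind of cost the timings above reflect.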
One could envisage a system whereby the database engine generates custom machine code in designated parts of its executable for optimally handling each relation, and then loads it in when that relation is needed - but the mess associated with that system is not warranted at this stage.

5 Variable Length Fields

5.1 Introduction

A noted absence in early versions of DiamondBase was variable length records in some form or another.

Given the complexities of implementing a variable length record server, we opted instead to provide variable length fields. This solution satisfies almost all sensible known uses of length variation in a relation.

5.2 dbData

To this end, two variable length field types are now available: dbData and dbString. The dbString type is derived from dbData and so I will discuss dbData first in detail.

dbData is a type that can be used to store arbitrary (possibly binary) data of variable size. The following access methods are available.

  * constructors
      - dbData: Construct from another dbData.
      - char*: Copy the string and make the size of the dbData equal to 1 more than the string length.
      - void*,long: Make the dbData the specified length and copy that much data in.
      - void: Just creates an empty dbData.
  * unsigned int len(void): Returns the current length of the dbData object.
  * unsigned int setSize(unsigned int): Sets the size of the dbData object and returns the size set. The result of this method differs from its parameter only in dbString cases, see below.
  * unsigned int getSize(void): Returns the current size of the object. Note that for dbString, size and length are different.
  * char* cat(char *): Adds a string onto the end of a dbData object. Returns a char* pointer to the data in the dbData.
  * char* cat(dbData&): Similar but adds the contents of the dbData passed in.
  * char* cpy(char*):
    char* cpy(dbData&): These two replace the contents of the dbData with the relevant data.
  * void fill(char, unsigned int=0): Fills the dbData with the specified character. If the second argument is passed then that many characters are filled in, otherwise the entire dbData is filled.
  * void clr(void):
    void dispose(void): These both free any memory associated with the dbData and set its content and length to 0.
  * operator char*:
    operator void*: These return appropriate casts of the dbData object.
  * operator +(char*):
    operator +(dbData&): These add the contents of their second operand onto the end of their first.
  * operator <<:
    operator >>: Perform the appropriate stream operations.
  * operator []: Returns a reference to the appropriate char in the object, or a reference to a dummy char if the parameter is out of range.
  * operator =(char*):
    operator =(dbData&): These behave exactly as cpy.
  * operator +=(char*):
    operator +=(dbData&): These behave exactly as cat.

5.3 dbString

The dbString class was originally written (and rewritten a few times) as the only variable length type. When the more general dbData class was designed it became clear that dbString should be a derived class from dbData. This is now how it works.

dbString behaves almost exactly as dbData does except that it is a null terminated string. This means that some of the resizing functions behave slightly differently. In particular, the size of a dbString and the length of a dbString are different things. The size is the amount of memory allocated to the internal char array and the length is the number of characters before the first null character.

The difference between size and length can be used to ensure that there is enough space in a dbString to perform an operation.
For instance, you may use setSize to increase the size of a dbString before passing a char* cast of it to a function which manipulates it (e.g. the library sprintf function). In this way, you know there is enough space in the string to extend its length.

The following are the differences in dbString.

  * The following functions behave differently, either by using a different definition of length, or by storing a 0 to truncate the string during size changes, or by adding (or otherwise taking into account) the extra 0 needed on the end of a string.
      - unsigned int len(void)
      - unsigned int setSize(unsigned int)
      - char* cat()
      - char* cpy()
      - void fill(char c, unsigned int = 0)
      - constructors: by passing a flag to dbData to indicate it is a string

5.4 memServer

The memServer is the class that allows the variable length fields to be used with diamondBase. It does this by creating a new file with a .str extension to hold the dbString and dbData records.

The memServer file is laid out like a block of memory being manipulated by an allocator such as malloc. It manipulates used and empty blocks and joins empty blocks should they lie adjacent to each other. Blocks are allocated on a first-fit basis.

In order to be able to access these fields normally in a relation, they must be members of the relation. In order to make it practical to store the relation in a recServer, only longs are stored for each dbString or dbData. These longs are offsets into the .str file maintained by the memServer. It is therefore necessary to have those longs stored in the relation structure so it can be written out efficiently by the recServer.

The solution to the above dilemma was to declare another struct to hold the dbString and dbData members with the same names as given in the relation file. The long offsets are stored in the main relation struct with slightly mangled names. Both of these structs are inherited by the relation.
In this way, users may reference their dbString and dbData members as they are named, even though only longs are stored in the relation struct.

The dbObject makes sure that whenever there are dbString or dbData fields in a relation, appropriate calls are made to memServer to keep them up to date. Every time a record is read or written, the strings need to be similarly updated. This gives a performance penalty compared to using char arrays to store strings, but the code has been heavily optimised to minimise it.

The memServer and its integration into the dbObject provide seamless use of dbData and dbString fields in relations.

6 Multiuser DiamondBase

6.0.1 Disclaimer

Warning: The multi user component of DiamondBase has only been completed very recently, and is being released as an ALPHA version. There are many deficiencies documented below (and goodness knows how many undocumented ones). If you intend to use it, then read this carefully, and please give us feedback on what you think, and any improvements you think are necessary.

6.1 Introduction

The transition from single user to the multi-user version of DiamondBase has been quite an adventure. We started by defining what we meant by "multi-user". To an extent, the existing version already supports a level of concurrency in that you can have multiple objects in your program which are all accessing the one relation. There is proper locking to ensure that they don't interfere with one another. At a higher level however, you want multiple processes, which may not even be the same program, accessing the database concurrently without corrupting the underlying files. For this version, we have chosen to implement a system which allows multiple processes on the one machine.
The possibility exists for a distributed database across a TCP network but that shall wait.

6.2 What didn't work

The first attempt at a concurrent version of DiamondBase began at the bottom. If the record management at the recServer level was given proper locking, then all that sat on top of it would behave transparently. This turned out to be about as incorrect as is imaginable. For a start, cache coherence becomes a problem. There are also higher level structures like B-Trees, for which simply locking a small part is totally inappropriate. As the changes propagated up the code hierarchy and the changes to the bottom layer multiplied, this was abandoned. Andy wisely smiled, having predicted this from the start.

Having explored its "rear end" if you like, we next commenced with oral surgery. Putting a layer near the top, between the schema generated code and the database engine, would allow the two to exist in separate applications quite happily. This worked much more nicely, with the engine left unchanged, and some surgery at the level of the diamondBase and diaRel objects. There was a problem however. Having all the comparison code compiled into the application makes for a very fast single user version (and I would still recommend it unless you really need concurrent behavior). However, it meant that the server still needed to convey the object callbacks to the application with the appropriate data. After quite a bit of coding, and the odd kernel IPC fix or two for Linux, we had a running version which was only 11 times slower than the uni version.

Disgusted, I decided that for this, and other reasons, we would have to interpret the relation layout and do the comparisons within the server, only talking to the application when it issued a request, and to supply the reply.
The diaGRel class was born to permit generalized access so the server could get to any relation, and the communications protocol emasculated to handle the smaller command set.

6.3 The architecture

This version uses shared memory segments between the server and the application processes to transfer records, and a message queue shared between all processes to communicate requests and intentions. Incoming messages on a very low message number signify attach requests from applications. The connection protocol is as follows:

  * Client sends an attach request as a message with id 2, and its pid.
  * Server creates /tmp/diamond.pid and a shared memory segment to go with it. It then sends a message on BASE_RESP + pid.
  * Client attaches to the new shared memory area.

Further transactions are performed as follows:

  * Copy the relation's data into the shared memory segment.
  * Send a message with the transaction details on BASE_REQ. Initial versions had the pid added to this, but this caused clients with lower pids to be given preferential treatment. Now the pid is embodied in the message.
  * Server reads the message and gets the database engine to perform the transaction, after copying the data from the shared area into a diaGRel object which is carrying out the database requests on behalf of the client.
  * Server copies the resultant relation into the transfer area and sends a reply message on BASE_RESP + pid.

Transactions consist of database transaction ids along with associated index numbers, new object attach requests, object detach notification messages and application detach messages. The reference Id numbers handed out by the database engine are mapped onto "fake" diaGRel objects which do all the work in the engine on behalf of the client's diaRel objects. Two lists are maintained to track the attached client applications and their attached objects. Having the diaGRel functionality makes the whole process quite simple.
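The request and reply addressing described above can be sketched with a toy in-memory queue. This is only an illustration and not the code in gib.cc or dclient.cc: it uses no real IPC, the numeric values of BASE_REQ and BASE_RESP are assumptions (the manual names the constants but does not give their values), and ToyQueue, Message, replyType and serveOne are hypothetical names.

```cpp
#include <cassert>
#include <deque>
#include <map>

// Assumed values, chosen only so that reply types never collide
// with the single request type.
const long BASE_REQ  = 3;
const long BASE_RESP = 100000;

struct Message {
    long type;     // queue selector, like the type of a message queue entry
    long pid;      // requesting client; carried inside the message
    int  payload;  // stand-in for the transaction details
};

// Toy stand-in for the one queue shared by all processes:
// a receiver asks for the oldest message of a given type.
class ToyQueue {
    std::map<long, std::deque<Message> > byType;
public:
    void send(const Message& m) { byType[m.type].push_back(m); }

    bool receive(long type, Message& out) {
        std::map<long, std::deque<Message> >::iterator it = byType.find(type);
        if (it == byType.end() || it->second.empty()) return false;
        out = it->second.front();
        it->second.pop_front();
        return true;
    }
};

// Each client listens on its own reply type.
long replyType(long pid) { return BASE_RESP + pid; }

// Server side: take the oldest request, whatever the sender's pid,
// and address the reply to that client alone. The "transaction" here
// just adds one to the payload.
bool serveOne(ToyQueue& q) {
    Message req;
    if (!q.receive(BASE_REQ, req)) return false;
    Message resp = { replyType(req.pid), req.pid, req.payload + 1 };
    q.send(resp);
    return true;
}
```

Because every request travels on the one type BASE_REQ, the server in this sketch serves clients in arrival order regardless of pid, while each reply can only be picked up by the client it is addressed to.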
6.4 Performance

Performance was not terribly disappointing. It was never going to be faster than the uni version, but you have to keep in mind some potential gains from having a separate server:

  * Killing application programs won't jeopardize database integrity.
  * Retrieved data is in a single pooled cache, increasing efficiency.
  * The possibility exists to prevent direct access to the database files.

    ---------------------------------------------
    Test            User     System     Total
    ---------------------------------------------
    Uni             27.9       8.0       35.9
    ---------------------------------------------
    Multi (1)
      server        23.19     11.8       50.19
      client        10.0       5.2
    ---------------------------------------------
    Multi (2)
      server        49.0      22.3      104.23
      client A      11.0      22.3
      client B      10.83      5.5
    ---------------------------------------------

    Table 1: Performance figures for multi and uni versions

From the above figures, the multi version is approximately 40% slower than the uni version, which can largely be attributed to the fact that the server is interpreting the record comparisons. Note that for the two client test, the database is essentially doing twice the work. Running the two tests under the uni version takes twice the figure quoted for the uni version. If you really want high performance concurrent database management for specialized applications, it would be theoretically possible to compile the comparison code into a specialized server which can just do the record comparisons you need.

The other thing to note on the performance front is that we have not optimized the code at all (i.e. -O6) - there is reason to believe that this could improve things. We have not profiled the multi stuff at all, so there are many potential improvements in performance there. The next release will be a little more fleet footed on that front I expect.
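As a check on the "approximately 40%" figure, the small helper below computes the relative slowdown from the totals in Table 1. The function name slowdownPercent is made up for this note; the two-client comparison against twice the uni total follows the reasoning in the text above and is not a separately measured figure.

```cpp
#include <cassert>
#include <cmath>

// Percentage by which the multi total exceeds the uni total.
double slowdownPercent(double uniTotal, double multiTotal) {
    return (multiTotal / uniTotal - 1.0) * 100.0;
}

// One client:  slowdownPercent(35.9, 50.19)        is a little under 40.
// Two clients: slowdownPercent(2 * 35.9, 104.23)   is about 45.
```

On these totals the two-client run comes out somewhat worse than the single-client run once the uni baseline is doubled.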
6.5 Compiling the Concurrent version

The target multi exists in the Makefile. Typing gmake multi after you have already built the main system will perform the multi compilation. The target will build the dbmulti library, and the jeweler server process. It has been tested under SunOS 4.1.1, Ultrix, IRIX and Linux. The code includes some prototypes for the IPC system calls, which seem to be woefully under-prototyped on non-Linux systems. It should theoretically work on any UN*X system which supports IPC (interprocess communication) - although my experience so far suggests that IPC is about as standard as curses, so I am prepared for bug reports.

6.6 Testing the concurrent version

I will distribute a small application called bump. It creates 5000 records, interspersing requests for old records in the addition calls, and ensuring their integrity. It takes a parameter indicating a range. If you are running more than one, you should specify parameters about 10,000 apart. I have run up to 8 of them concurrently under SunOS and Linux. Note also that no test should be run twice without first creating an empty database, otherwise duplicate records will result and the test code will fail. Running more than 9 processes at once causes the tests to fail due to limits within the server (which can be changed if desired).

6.7 Using the Concurrent version

One of the issues we would still like to address more thoroughly is that of ergonomics from a programming point of view. Ideally, we would like to reduce the pain of the transition between a uni-database and a concurrent one to that of linking with a separate library. We still believe that this can be achieved, but at the moment it is necessary to recompile your application with the preprocessor symbol 'MULTI' defined.
You must also link with the libraries dbmulti and diamond in that order, rather than just diamond. Upon execution, your application will attempt to connect to a running server in order to perform its transactions rather than doing the accesses directly. So, to reiterate the steps:

  * Make sure that your code includes diarel.h rather than dbase.h to get at any database definitions.
  * Put -DMULTI into your application CFLAGS.
  * Link your application with -ldbmulti -ldiamond.
  * Ensure that the server process jeweler is running.

Note: This version of the code creates all rendezvous points with octal permission 600, meaning that the client and server must be run by the same user. The permissions can be edited to suit your taste. The whole area of security management needs careful examination.

The ergonomics of the files used to manage the connection are also less than satisfactory at the moment. Any relation which you wish to use via a server process must be in the server.db file in the directory in which the server process jeweler is run. This file is in the same format as the relation file whose name is normally passed to the diamondBase object in the uni-version.

The server and client processes initially make their rendezvous via message passing, and the filename used to make the connection is gibraltar. (It seemed like a solid place to build a database - sorry). Currently, both the server and client library code look in the current directory for this file. It must therefore exist, although its contents are irrelevant for the running of the server. The server itself creates the file when it is run. This also constrains the clients to run from the same directory as the server. This is messy and we will endeavor to rectify it. If you have strong opinions on how databases, servers and clients should be managed system-wide, we would love to hear from you.
So, in order to run an application/server pair:

  * create a file server.db listing your relations
  * run jeweler (optionally in the background)
  * run your application in the same directory

When you wish to shut the server down, either send it an interrupt (press control-C), or send it SIGTERM with kill. The server process will output some messages on either stdout or stderr, so you may wish to log these to a file - they contain some useful information, like diagnostics when relation names can't be looked up.

If there are clients still attached, then you will have to wait until those clients are gone before the server will shut itself down.

6.8 Caveats - PLEASE READ THIS CAREFULLY.

This is the section where I warn you about the nasty issues/problems which are still out there and not dealt with. As mentioned above, there are many ergonomic issues which we will address as we write some more concurrent applications. I will list some other problems which I am aware of. I will try to fix the more serious ones at least - but there are just so many little potential problems that this may take a while. There has been some pressure to get this out and testable as soon as possible. My little list of problems follows:

  * Variable length strings and memory aren't supported yet. [I expect to have these working very soon. Kev]
  * There is no configuration management, and there are no utilities to shut the server down, monitor it etc. Changes to the server.db file won't take effect until it is restarted. At the moment, sending a suggestive signal like SIGTERM or SIGHUP will cause it to exit when convenient.
  * There is a static limit on the size of relation which can be handled (4K at the moment). Getting around this efficiently is a little sticky.
  * The rendezvous point between client and server is clumsy and essentially means they need to be run in the same directory.
    Configuration at installation time is one possibility, or at runtime, which is probably more acceptable.
  * The schema compiler (DSC) is still a uni application, so it can blow away the database from under the server's feet if you are stupid enough. This shouldn't be permitted.
  * Non-vital commands in diaRel like verStr aren't implemented yet.
  * Death of the server will freeze the application using it.
  * Security is reliant on protecting the gibraltar file, and the code opens the IPC services as 600 currently, so only processes under the same uid as the server can access it. This needs to be greatly improved. Ideally, client instances should not be able to interfere with one another. We may have to resort to encryption to achieve this and there will be performance penalties. Denial of service attacks are also possible.

Just remember that if I have missed something, Use The Source Luke. The files gib.cc and dclient.cc contain the relevant source for the server and client library respectively. They aren't too complicated.

6.9 Still to do

  * Reduce the size of the caveat list.
  * Profile the code to remove bottlenecks.
  * Replace the linked lists with hash tables for the client and object lists.
  * Use semaphores instead of message queues for efficiency.
  * An OS/2 port.
  * More thorough test suite.
  * Add dbString support.
  * Port/test on other unix platforms.
  * Go to bed.

7 DSC Users Guide

The DiamondBase Schema Compiler (DSC) is a utility provided with the DiamondBase database package which takes a lot of the hard work out of writing database applications. It does this by generating most of the code required to maintain the database automatically. In addition DSC generates new, empty databases for each schema you specify.

Throughout this section and the next, it is expected that the reader has a good understanding of the concepts involved in relational database design[1].
7.1 A Simple Record Library

Let us begin with a small example. Consider the application of producing a database for storing record library information. One part of such a database would be a relation which stores information about a single artist. The information we would like to store is the artist's name and the artist's abbreviation. The specification of this relation is given below.

    relation Artist {
        // Field specification.
        char   artName[100];    // Artist's Name.
        char   abbr[10];        // Artist's Abbreviations.
        long   titles = 0;      // Number of titles.
        unique artId;           // A unique number to identify
                                // this artist by.

        // Index specification
        index name on artName;  // An index for artist names
        index abbrev on abbr;   // An index for artist
                                // abbreviations.

        // Constructors
        construct using titles,artName index name;
        construct titles;
    }

To make starting your DiamondBase application as easy as possible, the specification of a schema is based on the specification of data structures in the C or C++ languages. As such some of the schema will be obvious, however before continuing we will look at the important parts of it.

  * relation Artist

The first word here is a keyword, indicating that this is the start of a new relation specification. The second is the name for the relation. This will become the name of the class associated with this relation, thus becoming a type name. If you wanted to use "Artist" as a variable name in your program, consider calling the relation something else in the schema.

Note also the opening brace (`{'). The information belonging to this relation is contained within these braces. You must finish the relation with a closing brace (`}').

There is no limit to the number of relations which can appear in a single schema file. Each must begin with the keyword relation and must be enclosed within braces.
There can be an optional "called name" after the relation and before the brace. It allows the relation to have a different name to the underlying structure.

7.1.1 Field Specification

  * char artName[100];

Now we begin the business of specifying the fields for the relation. First the type of the field is given, in this case a character (or as we shall see, an array of characters).

Next a name must be given to the field. In this case the name is "artName". You will be able to access a class member called "artName" to manipulate this data in the programs you write.

Lastly, you have the option of specifying that more than one "type" (in this case character) should be allocated. This is the syntax for specifying an array of data items, and in the case of characters is also the method for declaring strings.

Default arguments can be given using an = sign. These are assigned to the relation during construction.

The valid types for use in a relation are given in the DSC Reference Manual. Note that each field specification must be terminated by a semi-colon (`;').

  * unique artId;

The unique type is special, in that it is never assigned a value by DiamondBase applications. Instead DiamondBase assigns a value to these fields when a new record is created, guaranteeing that the value it gives is unique for that relation at that time.

It is important to note that these unique numbers are only instantaneously unique. If a record is deleted from the relation, the unique value it once had is then free to be re-used.

7.1.2 Index Specification

Indices are used as mechanisms for locating records within the relation based on some sorting field or key. As such their creation does not involve allocating any additional storage within a relation. Instead DSC generates the appropriate code to generate any keys which are required dynamically.
The examples of index specification above are similar, so the first will be used as an example.

  * index name on artName;

The first word is the index keyword. This is followed by an identifier. This is the name of the index and can be used in application code when specifying which index should be used for record queries (see footnote 1).

The name of the index is followed by another keyword, on, and then a list of one or more identifiers, separated by commas. These identifiers are the names of the fields which should be used as sort keys for the index and should be specified in the order of importance. For example,

    index myindex on myfield1, myfield2;

would cause a b-tree to be created with myfield1 as the primary sort key, and myfield2 as the secondary sort key.

7.1.3 Constructor Specification

Constructors may be declared for your relation. You should make sure that you do not create ambiguous constructors. Your compiler will give an error if you do. You should also not use a constructor that takes a char* and a long in that order. Such a constructor exists already so that you can construct using a key and an index number.

  * construct using titles,artName index name;

This creates a constructor that accepts a long and a char* and then does a get using the name index. The "index name" portion is optional.

7.2 Compiling a Schema

After creating a schema you will want to compile it using DSC. This process may generate a number of files, which by default will be created in the current directory. For large databases this is inconvenient from a programming point of view, if not simply messy. So as an example, you might set up a directory structure in your project path which had subdirectories called schema (for storing schema descriptions) and dbfiles (for storing the generated database files).
Footnote 1: In fact the name which is made available to application programmers is prefixed by "dbIdx_", so that the index above would be referred to in code as "dbIdx_name".

Your application source files might go in the root directory of your project path, while schemas are located in the schema path, and the database files themselves (containing the data) are stored in the dbfiles path.

Let us assume that such a directory structure exists, and that the schema file used in section 7.1 is in the schema path and is called artist.ds (the .ds extension denotes a DiamondBase Schema file; see footnote 2). To compile this schema and place the database files in the dbfiles path, we use the following command:

    dsc -I schema -O dbfiles artist

    -I schema     Specify the schema path
    -O dbfiles    Specify the database file path
    artist        The schema filename

This could generate 3 source files in the current path called artist.h, artist.cc and artist_s.h. It then may create a new data file for each relation in your schema (in this case only one), named after the relation, with a .db extension. This file is placed in the dbfiles path.

Finally DSC may create one .id file for each index specified in the schema. This extension is suffixed by a number from 0 to 9, using a different number for each index specified in the schema. These index files will also be located in the dbfiles path.

The database files will only be generated if you use the -D flag, and the code described above will only be generated if you specify the -C flag. This ensures that you do not overwrite anything.

7.3 Derived relations

It is often the case that you would want to inherit a DSC relation into another class so that you may augment its functionality. To help with this, there is another command line argument for DSC.
  * dsc -G base derived file

This will create "file.h" and "file.cc" files containing a class named derived which has a base class base. The class will appear as a skeleton class based on your base class.

7.4 Getting on with the Application

You now have everything that is required to start building your own application. There are a few more command line options which may become useful down the track as your application develops. These are explained in detail in the next section.

Footnote 2: Schema files should have the extension ".ds" for clarity.

8 DSC Reference Guide

8.1 Command Line Arguments

The command syntax for DSC is as follows:

    dsc [options] <schema name>

The command line arguments to DSC are shown in figure 1. If no extension is given for the schema file, the extension .ds is used.

    Option                   Action
    ------------------------------------------------------------------
    -C                       Create generated source code
    -D                       Create database files
    -G base derived file     Generate a derived class
    -O <path>                Place generated database files in <path>
    -S <path>                Place generated source code in <path>
    -I <path>                Use schema files located in <path>

    Figure 1: Command-line options to DSC

8.2 DSC Data Types

The types available in DSC are similar to those which are available in C++, with three exceptions and several omissions. Table 2 shows all the legal types in a DSC schema file, their C++ equivalents, their type identifiers, and gives a short description of their properties.

8.3 DSC Syntax Definition

The syntax for schema description is modelled on the C and C++ languages.
There may be one or more relations described in a schema file, each
preceded by the relation phrase and each enclosed in a pair of braces
(`{ }'). Table 3 shows the complete syntax for DSC in BNF.

NOTE: This BNF is out of date. It is 3am and I have no plans to try to
persuade LaTeX what the current BNF looks like. I suggest reading dsc.y. It
makes for many hours of fun for the whole family. Note that the version
below does not include default arguments and constructors (at least).

    DSC Type  C++ Type            Id  Description
    --------  ------------------  --  --------------------------------------
    long      long int             0  A long integer. The size of this type
                                      will vary, but is usually 32 bits.
    ulong     unsigned long int    1  A long integer which may only take
                                      non-negative values.
    short     short int            2  A short integer. The size of this
                                      type will vary between compilers.
    ushort    unsigned short int   3  A short integer which may only
                                      contain non-negative values.
    double    double               4  A double precision floating point
                                      number.
    float     float                5  A single precision floating point
                                      number.
    money     moneyType            6  A type defined to store currency
                                      information.
    date      dateType             7  A type defined to store date
                                      information.
    char      char                 8  A character, or 8-bit number.
    unique    uniqueType           9  A type whose value is guaranteed to
                                      be instantaneously unique.
    dbString  dbString            10  A resizable string class.
    dbData    dbData              11  A generic resizable binary data
                                      object.
    ichar     char                12  A case insensitive char.

    Table 2: Valid types and their integer identifiers

    schemaFile  ->  schemaFile relation
                 |  null

    relation    ->  relation ident structName fieldList

    structName  ->  [is] called ident
                 |  null

    fieldList   ->  { fieldList1 indexList }

    fieldList1  ->  fieldList1 type ident size ;
                 |  null

    type        ->  short
                 |  ushort
                 |  long
                 |  ulong
                 |  float
                 |  double
                 |  char
                 |  money
                 |  date
                 |  unique
                 |  dbString
                 |  dbData

    size        ->  [ number ]
                 |  null

    indexList   ->  indexList indexSpec ;
                 |  null

    indexSpec   ->  indexSpec2 , ident
                 |  null

    Table 3: BNF Syntax for DSC

8.4 File Formats

DSC is responsible for creating only the header of the .db file associated
with each relation. The creation of b-trees and the storage of data in the
.db file are delegated to other parts of DiamondBase. Consequently this
section deals only with the header information for the data file.

8.4.1 List Headers

Each data file contains enough information to derive the structure of its
relation, including the names of fields and the list of indices. The
information is broken into two parts: the field information and the index
information. These are encapsulated in two classes, the fieldList and
indexList classes, which are given in figure 2 and figure 3.

    struct fieldList
    {
        int        numFields;  // The number of fields for this
                               // relation
        fieldInfo *fields;     // A linked list of individual field
                               // information

        // Initialise class members
        fieldList() {fields=0; numFields=0;}
        // Destroy the list
        ~fieldList() {delete fields;}

        friend ostream&  operator << (ostream&, fieldList&);
        friend ofstream& operator << (ofstream&, fieldList&);
        friend ifstream& operator >> (ifstream&, fieldList&);
    };

    Figure 2: Class description for the Field Lists

As you will see, these are only place keepers for the heads of two linked
lists. Since they are both essentially the same, only the first will be
discussed.

Firstly, the number of entries in the linked list is stored to make reading
the list from a file easier. The only other data member is a pointer to the
head of the list.

Function-wise, there is a constructor which simply sets its data members to
zero and a destructor which causes the memory allocated for the list to be
deleted.

Finally there are three operators for input/output. The first is an
insertion operator for normal output streams.
This just causes the information stored in the linked list to be placed on
the stream given. The second insertion operator writes the binary data to
an output file. Lastly there is an extraction operator for retrieving the
binary information from an input file.

    struct indexList
    {
        int        numIndicies;  // The number of indicies for this
                                 // relation
        indexInfo *indicies;     // A linked list of individual index
                                 // information

        // Initialise class members
        indexList() {indicies=0; numIndicies=0;}
        // Destroy the list
        ~indexList() {delete indicies;}

        friend ostream&  operator << (ostream&, indexList&);
        friend ofstream& operator << (ofstream&, indexList&);
        friend ifstream& operator >> (ifstream&, indexList&);
    };

    Figure 3: Class description for the Index List

8.4.2 List Information

The actual information for each field or index is stored in the structures
shown in figure 4 and figure 5. There is little similarity between the two,
so both must be discussed.

  - The fieldInfo Structure

Field information is located in the fieldInfo structure. These are chained
together in a linked list, the head of which is stored in the fieldList
structure.

The structure contains the name of the field (as a character array), the
type of the data stored in that field (as an integer; the list of valid
types and their identifiers can be found in section 8.2), the size of the
type in bytes and a pointer to the next fieldInfo structure in the list.

The constructor for the class creates a new fieldInfo object with the
values passed to it as parameters.

The destructor deletes the space allocated for the field name and then
deletes the next fieldInfo structure in the list, if it exists.

  - The indexInfo Structure

The indexInfo structure really only differs in the information it stores.
There is the index type (not currently used), an array of fields which are
used in the index (listed in decreasing sort precedence), and a pointer to
the next indexInfo structure.

    struct fieldInfo
    {
        int        fldType;
        int        fldSize;
        char      *fldName;
        fieldInfo *nextFld;

        fieldInfo(char *name, int ty, int sz)
        {
            fldName = new char[strlen(name)+1];
            strcpy(fldName,name);
            fldType = ty;
            fldSize = sz;
            nextFld = 0;
        }
        ~fieldInfo()
        {
            delete [] fldName;  // allocated with new[]
            if (nextFld) delete nextFld;
        }

        friend ostream&  operator << (ostream&, fieldInfo&);
        friend ofstream& operator << (ofstream&, fieldInfo&);
    };

    Figure 4: Structures for storing field information

    struct indexInfo
    {
        int        idxType;
        int        idxFields[MAX_FIELDS_IN_INDEX];
        indexInfo *nextIdx;

        indexInfo(int type, TIndicies indicies)
        {
            idxType = type;
            for(int i=0;i<MAX_FIELDS_IN_INDEX;i++)
            {
                idxFields[i] = indicies[i];
            }
            nextIdx = 0;
        }
        ~indexInfo()
        {
            if (nextIdx) delete nextIdx;
        }

        friend ostream&  operator << (ostream&, indexInfo&);
        friend ofstream& operator << (ofstream&, indexInfo&);
    };

    Figure 5: Structures for storing index information

8.4.3 Database File Headers

The structure of the file is shown in figure 6.

    +--------------------------------+
    | Size of header                 |
    |   (4 bytes)                    |
    +--------------------------------+
    | fieldList struct               |
    |   (8 bytes)                    |
    +--------------------------------+
    | 1 or more fieldInfo structures |
    |   (16 bytes each)              |
    +--------------------------------+
    | indexList structure            |
    |   (8 bytes)                    |
    +--------------------------------+
    | 0 or more indexInfo structures |
    |   (12 bytes each)              |
    +--------------------------------+
    | Data records                   |
    |   :                            |
    |   :                            |
    +--------------------------------+

    Figure 6: File structure for database files.
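
As a rough illustration of figure 6, the sketch below adds up the sizes
shown there to find where the data records would begin. The constant and
function names are hypothetical and do not come from the DiamondBase
source. The byte counts are taken at face value from the figure (they look
like 32-bit int and pointer sizes); whether the on-disk header matches them
exactly, for example how variable-length field names are written, is an
assumption.

```cpp
#include <cassert>
#include <cstddef>

// Byte counts as printed in figure 6 (assumed, not verified against code).
const std::size_t kHeaderSizeField = 4;   // "Size of header"
const std::size_t kFieldListSize   = 8;   // fieldList struct
const std::size_t kFieldInfoSize   = 16;  // each fieldInfo structure
const std::size_t kIndexListSize   = 8;   // indexList structure
const std::size_t kIndexInfoSize   = 12;  // each indexInfo structure

// Hypothetical helper: byte offset of the first data record for a
// relation with the given number of fields and indices.
std::size_t firstRecordOffset(std::size_t numFields, std::size_t numIndices)
{
    assert(numFields >= 1);  // figure 6: "1 or more fieldInfo structures"
    return kHeaderSizeField
         + kFieldListSize + numFields  * kFieldInfoSize
         + kIndexListSize + numIndices * kIndexInfoSize;
}
```

Under these assumptions a relation with three fields and two indices would
have 4 + 8 + 48 + 8 + 24 = 92 bytes of header before its first record.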

9 File formats

9.1 Database Config file

This file just specifies the directory where the files related to a
particular relation reside. It contains lines in the form:

    relationName=directory-path

9.2 recServer

The record server is a generic block storage system. Upon creation, a
record size is nominated, and space for 4096 records is allocated. The
object inheriting a recServer can then request a new record id, store the
record at that location or delete the record at that location.

The actual layout of the file is documented in the next section. The file
basically consists of a header followed by alternating bit pools and record
storage areas.

The file will expand in size to accommodate demand for records. It is
currently unable to shrink as demand for records decreases. There are ways
of implementing this but they are currently not used.

When the database becomes 80% full, the number of free slots is increased
by 25% or 1, whichever is bigger. To keep the file size to a minimum, the
record space after the last bitPool isn't grabbed from the file system
until records are actually stored there.

Since the recServer is designed to be used as a storage facility for other
program modules, it has an offset which can be supplied at construction to
reserve a certain number of bytes at the beginning of the file for other
purposes.

9.2.1 Header

The header keeps track of the current status of the database, namely the
size of each record, the number of active records and the total storage
capacity of the database in records.

9.2.2 Bit Pools

Bitpools are essentially bitmaps which record which records are used in the
following record storage area. Each bitpool also has a header to record how
many bits are used. This allows a quick scan of bits to determine which
record slots are empty in the following record storage area.
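
A minimal sketch of the kind of scan a bitpool permits is given here. It
assumes a pool of 4096 slots packed eight to a byte, that bit 0 of byte 0
represents slot 0, and that a set bit means "in use". The bit order, the
polarity and the function names are assumptions made for illustration and
are not taken from the recServer source.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kSlotsPerPool = 4096;
const std::size_t kPoolBytes    = kSlotsPerPool / 8;  // 512 bytes

// Return the index of the first free slot in the pool, or -1 if every
// slot is in use. Whole bytes of 0xFF are skipped without testing bits.
long firstFreeSlot(const unsigned char *pool)
{
    for (std::size_t byte = 0; byte < kPoolBytes; ++byte) {
        if (pool[byte] == 0xFF)
            continue;                       // all eight slots used
        for (int bit = 0; bit < 8; ++bit) {
            if (!(pool[byte] & (1u << bit)))
                return static_cast<long>(byte * 8 + bit);
        }
    }
    return -1;
}

// Mark a slot as used by setting its bit.
void markUsed(unsigned char *pool, std::size_t slot)
{
    assert(slot < kSlotsPerPool);
    pool[slot / 8] |= static_cast<unsigned char>(1u << (slot % 8));
}
```

Skipping full bytes is what makes the scan quick: a nearly full pool costs
at most 512 byte comparisons before the inner loop runs once.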
Bitpools are currently a fixed size of 4096 slots, which are bit packed to
occupy 512 bytes.

9.2.3 Record Pools

The record pools are a section of the file which is 4096 * lengthOfRecord
bytes long. There is no flag on the actual record to indicate whether it is
used or not; this is simply a storage area. The preceding bitPool must be
interrogated to determine the status of a particular record.

9.2.4 bTree bucket layout

The bTree is where the real magic in the database happens. For a detailed
view we suggest reading the comments in bucket.cc. Each bTree bucket
consists of a header and a data area. The header has pointers to the
previous, next and parent buckets as well as a flag to indicate an interior
or leaf node. Note that the parent pointer may be incorrect if you were in
the right hand side of the parent bucket when it split. For efficiency
reasons these pointers are repaired during tree traversal rather than at
the time of the split. The header also stores the number of active keys in
the data area.

The data area consists of alternating pointer/key pairs with one more
pointer than key. The pointers are long integers which point either to
child buckets (in an interior node) or to the record in the recServer (in a
leaf node). The key data is long word aligned to avoid nasty problems
during key comparison. This makes the index storage larger than necessary
but is faster. This could be optimized on Intel architectures, where the
processor tolerates unaligned access.

The bTree inherits a recServer to store its actual buckets. There is a
global header before the recServer header which records the unaligned
length of the key and where the root bucket is.

I refuse to describe the full bTree manipulation details. I can recommend
Smith, "Data Structures and Algorithms" for a good discussion. If you want
to know how ours works, read Kevin's code. If there are sufficient
enquiries we will describe it, once we have looked at the code again.
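
The effect of long word alignment on bucket capacity can be sketched as
follows. This is an illustration only: it assumes 4-byte pointers and
4-byte alignment (a 32-bit long), and the function names and the idea of
passing in the data area size are hypothetical rather than taken from
bucket.cc.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kPointerBytes = 4;  // assumption: 32-bit long pointers

// Round a key length up to the next multiple of four bytes.
std::size_t alignedKeyLength(std::size_t keyLen)
{
    return (keyLen + 3) & ~static_cast<std::size_t>(3);
}

// A data area holding n keys needs n aligned keys plus n + 1 pointers.
// Return the largest n that fits in dataAreaBytes.
std::size_t keysPerBucket(std::size_t dataAreaBytes, std::size_t keyLen)
{
    assert(keyLen > 0);
    if (dataAreaBytes < kPointerBytes)
        return 0;                       // no room even for one pointer
    return (dataAreaBytes - kPointerBytes)
         / (kPointerBytes + alignedKeyLength(keyLen));
}
```

With these assumptions a 5-byte key occupies 8 bytes, so a 1024-byte data
area holds 85 keys and 86 pointers. The same area would hold more keys if
they were left unaligned, which is the storage cost mentioned above.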

9.3 Variable Length Fields

The format of the .str file used by the memServer for variable length
fields (types dbData and dbString) is described in section 5.

10 Miscellaneous

10.1 ToDo

  - Implementing multi-user version
    This is very high priority and will probably involve some use of
    shared memory and/or a central locking process. This has been
    implemented for a single machine, multi client model.

  - Implementing duplicate records
    Not likely! We've come to the conclusion that the inclusion of a
    unique field in an index gives exactly what duplicate indexes give
    you, except the semantics are far better.

  - A query language

  - Btree reconstruction facility

  - Record dump facility
    This now exists in the db2txt program. The reconstruction facility
    will be the reverse of this once written. It is next on the agenda.

  - Monitoring facility to count database operations

  - Database browser
    This has been made easier by the recent addition of diaGRel.

  - Interactive relation creation

  - Eliminate bTree indexing of completely unique fields

  - Better fan-out specification in database
    The bTree buckets are currently fixed in size and will take as many
    keys as will fit. The number of keys should be set and the size of the
    bTree bucket varied.

  - Efficient bTree deletion
    bTree nodes aren't merged when they drop below half full. They should
    be, for efficiency reasons.

  - Database update tool
    When adding fields to a database, it would be nice to be able to
    upgrade the existing database automatically. This has to be done by
    hand currently.

10.2 Profiling data

We include some sample profiling data for those speed freaks out there who
want to improve performance. We found some very interesting things after
our initial profiling experiments, like the fact that our memory checking
library was consuming 90% of the CPU time.
The bcopy function was also overused quite a bit and this led to far too
many function calls where much better alternatives were available. Use of
bcopy has been reduced to a minimum. We have attempted to speed up all the
very frequently used functions and eliminate unnecessary calls to them.

Kevin is to provide this bit. [Well, I will one day when I can collect all
the data and get it into some format that might be vaguely useful for
something or someone and possibly formatted in a way that could fit here,
in addition to making this sentence shorter in some way or another].

References

[1] C.J. Date, An Introduction to Database Systems, Volume 1, Fifth
    Edition, Addison-Wesley, 1992, pp. 245-398.