home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
OS/2 Shareware BBS: 22 gnu
/
22-gnu.zip
/
db02_bin.zip
/
manual.txt
< prev
next >
Wrap
Text File
|
1994-02-22
|
98KB
|
3,855 lines
DiamondBase Documentation
Version 0.2
Darren Platt
Andrew Davison
Kevin Lentin
darrenp@dibbler.cs.monash.edu.au
davison@bruce.cs.monash.edu.au
kevinl@bruce.cs.monash.edu.au
January 18, 1994
2
DiamondBase_______________________________________________________________1_
1 What Is DiamondBase ?
1.1 Description
DiamondBase is an implementation of a programmer's database. It supports the
basic relational model via a schema compiler and the C++ language. It has been
designed to make usage of the resulting relations simply, via object methods. It
was written by three students at Monash University Melbourne, Australia in their
spare time, because they are crazy, and have always harboured secret ambitions
to write a database. Kevin's previously frustrated attempts met with Darren's
misguided notions that a database would make his PhD easier to write, and Andy
wanted a database for a bibliographic retrieval system called Bibel. And so the
three muskateers set off in search of adventure.
1.2 Platforms
DiamondBase has been compiled for Linux, Ultrix, SunOS 4.1.1 using GCC, for
Irix using GCC and CFRONT and OS/2 with the Borland Compiler. It has also
been successfully compiled and used on an Amiga using SASC, NeXT using GCC
and on an RS6000. It currently requires gnu-make for its makefile, but a generic
Makefile is provided.
1.3 Distribution
Read the Copyright message in the distribution.
1.4 Limitations
1.4.1 Introduction
This section is depressingly large at the moment. It outlines the areas where we
believe Diamondbase is currently deficient. This is by no means an exhaustive l*
*ist,
and we are open to suggestions for improvements. Having listed the deficiencies,
it should also be noted that there are no fixed plans to eliminate these defici*
*encies.
Database extension is currently driven by the whims and requirements of the
authors. We are quite open to accepting extensions written by other people - so
submit that SQL parser as soon as you have written it.
1.4.2 Limited Multi-User Support
This is a pretty major restriction at the moment. You should ensure that only
one copy of your application is in use at any one time.
2___________________________________________________What_Is_DiamondBase_?__
Multi-user support is now available under Linux, Ultrix and Sunos. In its
current incarnation it is setup to run under one user-id. It can very easily be*
* run
by multiple people. It must run on one machine.
We believe the Irix port of this part of the product should be merely a com*
*pi-
lation issues. The Amiga and OS/2 versions may require more work. The OS/2
version should be available very soon.
1.4.3 No multi-machine support
The multi-user support uses shared memory segments and IPC communication. It
can therefore not be used over multiple machines. It can be moved from machine
to machine, but at any one time, all clients must be on the same machine as the
server.
We have used the UNIX ndbm package to implement a multi-user database
by accepting connections on a socket and multi-threading the executable. There
is only one server process accessing the database but many clients can use it
concurrently. If you are interested in such code please let us know.
1.4.4 No SQL
I'm philosophical on this point. SQL isn't god's gift to query languages (It's *
*more
like satan's if the truth is to be known). I would however like to add a query
language to DiamondBase at some stage, and SQL is certainly a standard choice.
The reliance on compiled code to do the comparison makes an interactive SQL
setup more difficult since it would have to second-guess the compiler's layout *
*for
data structures - or dynamically link the comparison code.
Embedded SQL would be easier to implement. It would require a compiler to
preprocess the SQL (preferably not by extracting it from the C++ - but having
queries stored in separate text files) - and then generate C++ code to access t*
*he
objects in the right order.
1.4.5 No duplicate indexes
This is a nuisance as much as anything. All records must have a unique value for
each index. If you wish to effectively have duplicates, then you must supply an
extra field which makes them unique.
So, for example, if you wish to index employees by age, then you would also
include employee number in the index. Finding all the employees who are 12
years old then involves finding the first record with age=12 and id>=0. Continue
fetching the next record until age is not equal to 12.
This approach is a bit messy and labour intensive, so we would like to perm*
*it
duplicates - if we can work out what it will do to the poor Btrees.
The package does, however, support a unique type. This is a long that is
assigned a value by DiamondBase and is guaranteed to be unique within the
DiamondBase_______________________________________________________________3_
relation. Included in an index, it allows effectively for duplicate indexes wi*
*th
much better semantics.
1.4.6 Recovery and Auditing
The Btree indexes can be built from the data storage file if they become corrup*
*ted
- but no facility currently exists to do this. A record dump facility to dump t*
*he
records in the record store to a text file is currently available in the form o*
*f the
db2txt program. There is no auditing to keep track of all transactions that were
performed. This is not a high priority for us currently. If this is a high prio*
*rity
for you then please let us know how you would like it done.
1.4.7 Static limits
Various components of the database have static limits. These are listed in the
table below along with a description and the file in which they are defined bel*
*ow.
__________________________________________________________________
_ Name/Description _Value _Where _
___________________________________________________________________
_ MAX_QUERY _10 _defs.h _
_ ________________________ _
_ The maximum number of queries that a bTree or dbObj will allow. _
__________________________________________________________________ _
_ MAX_REG _20 _dbase.h _
_ ________________________ _
_ The maximum number of registrations that the dbObj will allow. _
__________________________________________________________________ _
_ MAX_DB_INFO _20 _dbase.h _
_ ________________________ _
_ The maximum number of relations that the dbObj will keep open. _
___________________________________________________________________
_ MAX_RELATIONS _20 _nserver.h _
_ ________________________ _
_ The maximum number of relations that the nameserver can handle. _
___________________________________________________________________
_ MAX_FIELDS_IN_INDEX _10 _idxinfo.h _
_ ________________________ _
_ The maximum number of fields that an index is permitted to have. _
__________________________________________________________________ _
_ MAX_NAME_LENGTH _300 _idxinfo.h _
_ ________________________ _
_ The maximum length that an index name can be. _
___________________________________________________________________
_ MAX_TRANS_AREA _4K _dcdefs.h _
_ ________________________ _
_ The maximum relation size for the multi version _
__________________________________________________________________ _
1.4.8 Fixed length records
Records are currently of a fixed length determined at the creation of the relat*
*ion.
Support for variable length fields is available though the dbData and dbString
types. The first is a generic binary object. The second is a derivative which
behaves like a null terminated string. Details of these features are in Section*
* 5.
4________________________________________________________Interface_Functions_
2 Interface Functions
2.1 Introduction
This section describes the basic application functions available to the program-
mer. They are supplied as methods to the relation class generated by dsc.
2.1.1 General functions
fflbool ok(void)
- Description
Checks if last database operation was successful.
- Return codes
true - successful operation
false - unsuccessful operation
- Example
fred yourRel;
yourRel.add();
if (!yourRel.ok()) -
cerr << "An error occurred." << endl;
"
fflvoid perror(char *description)
- Description
Prints a textual description of the last error code to cout -
with some textual prefix which you supply.
- Example
fred yourRel;
yourRel.id = 1234; // Which doesn't exist
yourRel.get();
if (!yourRel.ok()) -
yourRel.perror("Getting a record:");
"
fflvoid stats(void)
DiamondBase_______________________________________________________________5_
- Description
Outputs the current database cache hit statistics to cout for
this relation.
- Example
yourReltype yourRel;
yourRel.stats();
Output..
Record Server cache: 1669 attempts, 146 hits = 8.74775%
1523 writes and 1513 disposals
Index 0 cache: 0 attempts, 0 hits = 0%
0 writes and 0 disposals
Index 1 cache: 593 attempts, 454 hits = 76.5599%
139 writes and 129 disposals
Index 2 cache: 3 attempts, 0 hits = 0%
3 writes and 0 disposals
2.1.2 Database Manipulation functions
ffldbError add(void)
- Description
Insert the current values for this relation into the database.
Any fields which are of type unique will be assigned a unique
field as the record is inserted. The corresponding fields of
the structure will reflect these assigned values upon return.
The data in the object is not modified by the call (except the
unique field).
- Return codes
db_ok - successful addition
db_dup - addition of this record would create a duplicate record
on one or more indexes.
db_nopen - database isn't currently open.
- Example
employee newEmp;
strcpy(newEmp.name,"Joe Bloggs");
newExp.age = 12;
6________________________________________________________Interface_Functions_
switch(newExp.add()) -
case db_ok: break;
case db_dup:
cerr << "This employee already exists" << endl;
break;
case db_nopen:
cerr << "Open the database first dummy" << endl;
"
ffldbError begin(int indexNumber = 0)
- Description
A call to begin initiates a query - signalling an intention to
retrieve one or more records from the database using a partic-
ular index. Within a query, an implicit pointer is maintained
to a position within an ordered index list.
Whilst direct fetches with get may be executed without a
begin/end pair - begin must be used with all relational retrieval
operations. If a previous begin is outstanding when this is
called, the previous query is terminated. You should therefore
not use get inside a begin/end pair of the same instance of a
relation as the get will terminate the query. If you use 2
different instances of the relation, then it will work correctly
as each has its own query.
The following functions require a query and must therefore
be executed within a begin/end pair.
extract
extractNext
extractPrev
find
first
last
next
peekNext
peekPrev
prev
seek
seekFirst
seekLast
write
DiamondBase_______________________________________________________________7_
- Return codes
db_ok - the query started correctly
db_range - the supplied index is out of range
- Example
// Retrieve all employees' names.
employee tempEmp;
tempEmp.begin(dbIdx_name); // Initiate query
for(tempEmp.first();tempEmp.ok();tempEmp.next()) -
cout << tempEmp.name << endl;
"
// The same thing, using a while instead
tempEmp.seekFirst();
while(tempEmp.next() != db_eof) -
cout << tempEmp.name << endl;
"
tempEmp.end();
ffldbError del(void)
- Description
Delete the record matching the current values for the index
from the database. The data in the object is not modified by
the call.
- Return codes
db_ok - successful deletion
db_nfound - Couldn't find record to delete it
- Example
employee exEmp;
exEmp.id = 1234; // Employee id to delete.
exEmp.get(); // Fetch this employee.
if (exEmp.ok()) -
if (exEmp.del()==db_nfound) -
8________________________________________________________Interface_Functions_
cout << "Couldn't find employee" << endl;
"
"
ffldbError end(void)
- Description
End the previously initiated query. This is mainly for effi-
ciency reasons, and to terminate any outstanding locks, since
the next begin will automatically terminate the previous begin
anyway.
- Return codes
db_ok - no problems.
db_noquery - there was no matching begin.
db_range - query value was not in the valid range
- Example - see begin
ffldbError extract(void)
- Description
This call does a find on the supplied object and then locks the
object if it is found. The record itself is locked, not the key
being used, so accesses to the record through any key will fail.
After an extract, any access of the record will return db_locked
. If this access is an extract, an extract will not occur and the
operation will be equivalent to a find. If the access is another
read-type operation - then the data will be retrieved. If this
access is a write, then it will fail.
If the requested record does not exist, then the data in the
object is not modified. In that situation, extract degenerates
to a seek.
- Example
- Return codes
db_nfound
db_ok
db_locked
ffldbError find(void)
- Description
DiamondBase_______________________________________________________________9_
Search for the index entry in the database. If the entry exists,
then the query pointer is positioned at that point and the
record is retrieved. Otherwise, the pointer is moved to the
following record, and that record is retrieved.
Note that since the retrieval of that record involves a call to
next, subsequent calls to next will not return this same record.
If you wish to position the relation before a record without
advancing the record pointer, use the seek family of functions
instead.
- Return codes
db_ok - a matching record was found retrieved
db_nfound - no matching record was found and the data for the
next record was placed in the class.
db_locked - the record was found and was locked. The data is still
returned however.
db_range - query value was not in the valid range
db_noquery - query value was not valid
- Example
employee exEmp;
exEmp.id = 1234; // Employee id to find.
exEmp.find(); // Fetch this employee.
ffldbError first(void)
- Description
The first function may only be executed from within a query
(see begin). It sets the query pointer to the first record and
returns the record at that position if one exists.
Note, that since first retrieves the first record, a subsequent
call to next will not return that record. Instead, it will return
the second record. See seekFirst.
- Return codes
db_nfound - there are no records in the database
db_range - query value was not in the valid range
db_noquery - query value was not valid
- Example - see begin
ffldbError get(int idxNumber=0)
10_______________________________________________________Interface_Functions_
- Description
Retrieve a record from the database using explicit values for
an index. This is the only retrieval function which does not
need to be executed within a begin/end pair. The function
issues an implicit begin/end pair around this atomic opera-
tion.
- Return codes
db_ok - Record matching index value was found
db_nfound - Record was not found
db_range - the index specified was illegal
- Example
//
// relation host_to_ip -
// char hostname[100];
// char ip[16];
// index byIp on ip;
// index byName on hostname;
// "
host_to_ip myLookup;
strcpy(myLookup.hostname,"dibbler.cs.monash.edu.au");
if (myLoopkup.get(dbIdx_byName)==db_ok) -
cout << "IP=" << myLoopkup.ip << endl;
" else -
cout << "Not found" << endl;
"
ffldbError last(void)
- Description
The last function may only be executed from within a query
(see begin). It sets the query pointer to the last record and
returns the record at that position if one exists.
- Return codes
db_ok - last record found and retrieved
db_nfound - there were no records
db_noquery - Last must be within a begin/end pair
- Example - see prev
DiamondBase_____________________________________________________________11_
ffldbError next (void)
- Description
This function may only be executed from within a query (see
begin). It sets the query pointer to the next relation for the
current query index and retrieves it if it exists.
- Return codes
db_ok - Next record is retrieved
db_eof - No more following records
db_noquery - Next was not executed within a begin/end pair
db_locked - data retrieved but record locked by someone else
- Example -see begin
ffldbError peekNext(void)
- Description
Return next record if it exists, but do not advance the btree
query pointer. Subsequent next, prev, peekNext and peekPrev
calls will return values as if this call never occurred.
- Return codes
db_ok - Next record is read
db_eof - No more following records
db_noquery - Next was not executed within a begin/end pair
db_range - query value was not in the valid range
db_locked - data is retrieved but record is locked
- Example
// Given the records,
// JONES,PERCY,PLATT and SMITH
employee tempEmp;
strcpy(tempEmp.name,"PLATT");
tempEmp.begin(dbIdx_name); // Initiate query
tempEmp.find(); // Fetches PLATT
tempEmp.peekNext();
cout << tempEmp.name << endl; // Prints SMITH
12_______________________________________________________Interface_Functions_
tempEmp.next();
cout << tempEmp.name << endl; // Prints SMITH again
tempEmp.end();
ffldbError peekPrev(void)
- Description
Return previous record if it exists, but do not move the btree
query pointer. Subsequent next, prev, peekNext and peekPrev
calls will return values as if this call never occurred.
- Return codes
db_ok - Previous record is read
db_eof - No more preceding records
db_noquery - prev was not executed within a begin/end pair
db_range - query value was not in the valid range
db_locked - data is retrieved but record is locked
- Example
// Given the records,
// JONES,PERCY,PLATT and SMITH
employee tempEmp;
strcpy(tempEmp.name,"PLATT");
tempEmp.begin(dbIdx_name); // Initiate query
tempEmp.find(); // Fetches PLATT
// NB query pointer is now after PLATT
tempEmp.peekPrev();
cout << tempEmp.name << endl; // Prints PLATT !
tempEmp.next();
cout << tempEmp.name << endl; // Prints SMITH
tempEmp.end();
ffldbError prev(void)
- Description
DiamondBase_____________________________________________________________13_
This function may only be executed from within a query (see
begin). It sets the query pointer to the previous relation for
the current query index and retrieves it if it exists.
- Return codes
db_ok - Previous record is retrieved
db_eof - No more previous records
db_noquery - Prev was not executed within a begin/end pair
db_locked - data retrieved but record locked by someone else
- Example
// Retrieve all employees' names - but in reverse
// order.
employee tempEmp;
tempEmp.begin(dbIdx_name); // Initiate query
for(tempEmp.last();tempEmp.ok();tempEmp.prev()) -
cout << tempEmp.name << endl;
"
tempEmp.end();
ffldbError put(int idxNumber=0)
- Description
This function writes a record into the database regardless of
its presence. If the record does not exist, this call does an add,
otherwise it does a write. This call, like get does not require
a query and, as such, contains an implicit begin/end pair. It
can be used to write a record into the relation with no regard
to the current contents.
- Return codes
db_ok - Record was successfully written
db_dup - The record's keys partially clash with existing records
db_range - the index specified was illegal
- Example
// relation host_to_ip -
// char hostname[100];
// char ip[16];
14_______________________________________________________Interface_Functions_
// index byIp on ip;
// index byName on hostname;
// "
host_to_ip myLookup;
strcpy(myLookup.hostname,"dibbler.cs.monash.edu.au");
strcpy(myLookup.ip,"130.194.62.33");
if (myLoopkup.put(dbIdx_byName)==db_ok) -
cout << "Added successfullly" << endl;
" else -
cout << "Something went wrong" << endl;
"
ffldbError seek(void)
- Description
Positions the query pointer using the values for the current
query index. The next record returned by a call to next or
peekNext will be the record requested if it exists in the rela-
tion, or the one that would follow it if it did exist.
- Return codes
db_ok - The specific record was found.
db_eof - No more preceding records
db_noquery - prev was not executed within a begin/end pair
db_range - query value was not in the valid range
db_nfound - the specific record was not found.
- Example
fflseekFirst
- Description
Position the query pointer just before the first record for the
current query index, so that the next record retrieved will
be the first one. Like first, this call does not depend on the
contents of the relation - it only moves to the beginning of
the index.
- Return codes
- db_ok - The record was found.
- db_eof - No records
- db_noquery - seekFirst was not executed within a begin/end pair
DiamondBase_____________________________________________________________15_
- db_range - query value was not in the valid range
- Example
ffldbError seekLast(void)
- Description
Position the query pointer just after the last record for the
current query index, so that calling prev will retrieve the last
record for that index.
- Return codes
- db_ok - The record was found.
- db_eof - No records
- db_noquery - seekLast was not executed within a begin/end pair
- db_range - query value was not in the valid range
- Example
ffldbError write(void)
- Description
If all the keys are already present in the database and they all
refer to the same record, then that record will be overwritten
with the values in this class. If all the keys are not present
then the call fails. The standard way to update a record is
to perform an extract followed by a write. If you wish to
change one of the fields on which the record is indexed, then
you must create a new record and copy the unchanged details
from the old record to the new - followed by a deletion of the
old record. This decision was made to considerably simplify
the update mechanism within the database.
If the write is being performed after an extract, then the keys
are checked to ensure that they have not changed since the
extract. If they have, then the write will fail. This last check
is currently unimplemented.
- Return codes
db_ok - record was overwritten successfully
db_dup - neither 0 nor all the indices matched
db_range - query value was not in the valid range
db_noquery - query value was not valid
db_nfound - record is not already in the index
- Example
16_______________________________________________________Interface_Functions_
2.1.3 Data Members
status
Holds the return code from the last database operation.
Error codes: db_ok ....
DiamondBase_____________________________________________________________17_
3 Internals
3.1 Introduction
As far as the big picture goes, you write a program which needs database func-
tions. You have used DSC to produce both the physical storage files, and some
compiler generated code which you link into your application. You also link in
the diamond library which contains functions needed to perform the database
operations you execute as methods to DSC generated objects.
This section of the manual describes how the internals of DiamondBase are
implemented. We would like you to understand what is happening inside the
database - as this will aid you in debugging, reporting bugs within the database
code and suggesting improvements and changes.
3.2 Typical application code
#include "myrel.h"
diamondBase theDiamondBase("config.db");
main()
-
myRel theRel; // Attach stage (a)
theRel.begin();
theRel.first();
// Use theRel here.
theRel.extract(); // Decide to update it.
// Change some non-indexed member fields
theRel.write(); // Put it back
theRel.end();
"
Note the appalling lack of error checking going on - utterly disgusting. The
definition for the 'myRel' class will inherit both a transfer area and a diaRel*
* class.
The transfer area acts as a sub-struct with only data members which can be used
for memory transfers as records are stored and retrieved. The diaRel base class
has the necessary methods to implement the calls (eg first, next, write).
Note that in the above example, the myRel instance was in scope for all of
main - it need not have been. You can declare relations wherever you wish to
18_________________________________________________________________Internals_
use them. Global relation instances are even permitted, provided you can ensure
that the theDiamondBase instance you declare is constructed first.
Which brings me to the final point for this section - you must have an inst*
*ance
of diamondBase called `theDiamondBase' at the global scope in your program.
We could have defined one ourselves in the library, but this way you get to name
the database configuration file.
3.2.1 diaRel
There is one instance of diaRel in each user-defined relation class. It imple-
ments the functions which are used to manipulate the database. diarel.h and
diarel.cc are fairly self explanatory. The constructor for diaRel attaches to t*
*he
global theDiamondBase object and obtains a reference Id which is used in all
further correspondance up until the point where the class which inherits diaRel
has been destroyed.
diaRel also inherits an object class. This has a number of pure virtual met*
*hods
which the database ultimately uses to perform functions like fetching the data,
storing the data, extracting the key component of the data - comparing keys.
These functions are performed by DSC generated code and so a pointer to the
base object class is passed to theDiamondBase during attach in order to permit
this.
3.2.2 diaGRel
The diaGRel (or Generic Relation) class can be used in place of a DSC generated
class to manipulate any relation you desire. This can be used to write generic
relation browsers or such like. The use of the diaGRel class is described in de*
*tail
in section 4. diaGRel is used to implement the db2txt relation dumping utility,
and the server in the multi-user version.
3.2.3 diamondBase
diamondBase has rather a large job. It overseas all the classes which actually
manipulate the database. On construction it is supplied with a string which is a
path to the name server configuration file. This file is typically called "conf*
*ig.db"
- although the default parameter is "ns.dbc". This parameter is passed to the
base TNameServer class.
As mentioned in the diaRel section, diamondBase handles incoming diaRel
connection requests - assigning a reference id to each new connection. A list of
such class clients is maintained.
A list of active and recently deregistered relations is kept. Old relations*
* are
kept to avoid unnecessary overhead when a relation class is repeatedly created
then destroyed. If multiple active diaRel classes are accessing the same relati*
*on,
then only one relation is maintained for all of them.
DiamondBase_____________________________________________________________19_
Each diaRel uses only one relation - and provides its name upon registration.
Failure to locate that database name using TNameServer is an error.
Each relation has an associated pointer to a dbObj through which all non
register/deregister operations are handled.
3.2.4 TNameServer
This class uses the parameter supplied at construction time to open a file con-
taining entries in the form:
relationName=directory-path
This is simply used to provide some flexibility in the mappings from a relation
name onto the directory where it is actually stored.
As an alternative, it is possible to pass the empty string ("") to diamondBa*
*se
on construction and then the TNameServer will assume that all relations are in
the current directory. This is useful when using diaGRel.
3.2.5 dbObj
One dbObj instance handles one relation. For each index on that relation, the
dbObj maintains one bTree instance through which operations using that index
are performed. It inherits an recServer base class which it uses to fetch the
actual data associated with a particular record. Actual operations on indices a*
*re
perfromed by calling the appropriate method in a bTree.
The dbObj maintains a set of active queries which each refer to a particular
bTree query. Record locking occurs at the dbObj level to prevent two updates
occurring on the one record concurrently.
The dbObj also collects information about the relation when it is first inst*
*an-
tiated. This allows the diaGRel to find out all the information it needs about a
relation.
3.2.6 bTree
The bTree is probably the most complicated object in the whole hierarchy (histo*
*r-
ically it is was the first thing to be written). It manages the indexing of rec*
*ords.
It manages its own queries, taking care to ensure that multiple queries reflect*
* the
changes made to each other, whilst optimizing as much as possible for speed.
It inherits a recServer to store the actual buckets. The actual implementati*
*on
is essentially a B+ tree with pointers to allow sequential retrieval. At the mo*
*ment,
buckets which drop to less than half full aren't merged with sibling buckets.
20_________________________________________________________________Internals_
Operations like fetching key data from the user class, setting key data, co*
*m-
paring keys etc, are performed via calls to member functions in the object class
that is passed in.
DiamondBase_____________________________________________________________21_
4 Generic Relations
4.1 Introduction
The fact that DiamondBase is based upon the DSC relation compiler meant
initially that only those relations that had code compiled into an executable
could be accessed by that executable.
This was seen as being too restrictive because it meant that no programs
could be written that acted upon an arbitrary relation. Since it was intended to
eventually write such programs as relation browsers and index rebuilders, it was
vital that a mechanism for accessing unknown relations be written.
4.2 diaGRel
The diaGRel class accomplishes this. It is meant to be used in any situation wh*
*ere
a DSC generated class would otherwise be used. It implements all functions that
the generated code does except that it is less efficient.
This class is instantiated as a DSC generated class, except that its constru*
*ctor
takes a string as a parameter which specifies the relation to use. This should *
*be
the name of the files which make up the relation without their file extension.
4.3 Use
Once instantiated, a diaGRel is used exactly as any DSC generated class is.
4.4 Implementation
Initially, the diaGRel was written to open the relation file and read in the re*
*cord
and key structure stored there. It would then do some pretty horrendous traver-
sals of this data to build up arrays of information about the relation. These
arrays contain data in a form that makes implementing the methods required to
imitate a DSC generated class quick and efficient. This process is complicated *
*by
the fact that relations and keys are reordered by DSC to put them in an order
that prevents alignment problems.
Key access in a diaGRel is the main ineffciency of the class. This is because
much pointer arithmetic has to be done at runtime to extract relevant data. In
a DSC class this is done at compile time by the compiler by way of structure
references.
Access to the relation data can be via one of two mechanisms. Either the
diaGRel allocates a block of memory or a block of memory is given to it as
construction time. The latter has the benefit that you can inherit a stuct that*
* is
in the relation and pass a cast to this to diaGRel thus allowing you to referen*
*ce
22_________________________________________________________Generic_Relations_
relation members by name. Without this, you only have access to the relation as
a block of memory.
struct BookRel -
char name[100];
char author[100];
long barcode;
"
class Book : public BookRel, public diaGRel -
public:
Book(void);
"
Book::Book(void) : diaGRel("book",(BookRel *)this);
-
"
diamondBase theDiamondBase("config.db");
main()
-
Book ARM;
strncpy(ARM.name,"Annotated C++ reference",sizeof ARM.name);
strncpy(ARM.author,"B. Stroustroup",sizeof ARM.author);
ARM.barcode = 11345654;
ARM.put();
"
4.5 Performance
The above implementation worked correctly but inefficiently. The reason for this
was the necessity to open, read and disect the relation file on every instantia*
*tion.
This meant that a process using a diaGRel instead of a DSC generated class took
more than 5 times as much system time to perform the same job. This was not
satisfactory.
The solution to the above was to allow the dbObj to collect the data. This *
*was
already being partially done there and the change involved setting up a struct *
*to
hold all the data and moving a lot of code from the one class to the other. Now
DiamondBase_____________________________________________________________23_
the dbObj can return a pointer to this struct to the diaGRel via the diamondBas*
*e,
through the attach method to the diaRel underneath the diaGRel (phew!).
The above change meant that unless more than about 10 different relations
are used in a program, the data discussed above is assembled only once. The
result is that the diaGRel takes about twice as much user time as the DSC
generated classes (about expected) and about 35-40% more system time (also
not a disappointing number).
This raises interesting performance questions for database engine writers. U*
*n-
less you either employ a system whereby you write self-modifying code, or you
link in generated code (as we do) - you are are forced to accept this overhead *
*due
to interpreting your relation layout. It basically involves extra levels of ind*
*irec-
tion during database access. One could envisage a system whereby the database
engine generates custom machine code in designated parts of its executable for
optimally handling each relation, and then loads it in when that relation is ne*
*eded
- but the mess associated with that system is not warranted at this stage.
24____________________________________________________Variable_Length_Fields_
5 Variable Length Fields
5.1 Introduction
A noted absence in early versions of DiamondBase was variable length records in
some form or another.
Given the complexities of implementing a variable length record server, we
opted to instead provide variable length fields. This solution satisfies almost*
* all
sensible known uses of length variation in a relation.
5.2 dbData
To this end, two variable length field types are now available: dbData and
dbString. The dbString type is derived from dbData and so I will discuss dbData
first in detail. dbData is a type that can be used to start arbitrary (possibly
binary) data of variable size.
The following access methods are available.
fflconstructors
- dbData: Construct from another dbData.
- char*: Copy the string and make the size of the dbData equal to 1
more than the string length.
- void*,long: Make the dbData the specified length and copy that much
data in.
- void: Just creates an empty dbData.
fflunsigned int len(void): Returns the current length of the dbData object.
fflunsigned int setSize(unsigned int): Sets the size of the dbData object a*
*nd
returns the size set. The result of this method differs from its paramet*
*er
only in dbString cases, see below.
fflunsigned int getSize(void): Returns the current size of the object. No*
*te
that for dbString, size and length are different.
fflchar* cat(char *): Adds a string onto the end of a dbData object. Returns
a char* pointer to the data in hte dbData.
fflchar* cat(dbData&): Similar but adds the contents of the dbData passed
in.
fflchar* cpy(char*):
DiamondBase_____________________________________________________________25_
fflchar* cpy(dbData&): These two replace the contents of the dbData with
the relavent data.
fflvoid fill(char, unsigned int=0): Fills the dbData with the specified char*
*ac-
ter. If the second argument is passed then that many characters are filled
in, otherwise the entire dbData is filled.
fflvoid clr(void):
fflvoid dispose(void): These both free any memory associated with the dbData
and set it's content and length to 0.
ffloperator char*:
ffloperator void*: These return appropriate casts of the dbData object.
ffloperator +(char*):
ffloperator +(dbData&): These add the contents of their second operand onto
the end of their first.
ffloperator <<:
ffloperator >>: Perform the appropriate stream operations.
ffloperator []: Returns a reference to the appropriate char in the object or*
* a
reference to a dummy char if the paramter is out of range.
ffloperator =(char*):
ffloperator =(dbData&): These behave exactly as cpy.
ffloperator +=(char*):
ffloperator +=(dbData&): These behave exactly as cat.
5.3 dbString
The dbString class was orginally written (and rewritten a few times) as the only
variable length type. When the more general dbData class was designed it became
clear that dbString should be a derived class from dbData. This is now how it
works.
dbString behaves almost exactly as dbData does except that it is a null ter-
minated string. This means that some of the resizing functions behave slightly
differently. In particular, the size of a dbString and the length of a dbString*
* are
different things. The size is the amount of memory allocated to the internal ch*
*ar
array and the length is the number of characters before the first null characte*
*r.
26____________________________________________________Variable_Length_Fields_
The difference between size and length can be used to ensure that there is
enough space in a dbString to perform an operation. For instance, you may use
setSize to increase the size of a dbString before passing a char* cast of it to*
* a
function which manipulates it (eg the library sprintf function). In this way, y*
*ou
know there is enough space in the string to extend its length.
The following are the differences in dbString.
fflThe following functions behave differently either by using a differnet d*
*efi-
nition of length or by storing a 0 to truncate the string during size cha*
*nges
or adding (or otherwise taking into account) the extra 0 needed on the end
of a string.
- unsigned int len(void)
- void setSize(unsigned int)
- char* cat()
- char* cpy()
- void fill(char c, unsigned int = 0)
- constructors: by passing a flag to dbData to indicate it is a string
5.4 memServer
The memServer is the class that allows the variable length fields to be used wi*
*th
diamondBase. It does this by creating a new file with a .str extension to hold
the dbString and dbData records.
The memServer file is laid out like a block of memory being manipulated by
an allocator such as malloc. It manipulates used and empty blocks and joins
empty blocks should they lay adjacent to each other. Blocks are alocated on a
first-fit basis.
In order to able to access these fields normally in a relation, they must be
members of the relation. In order to make it practical to store the relation in
a recServer, only longs are stored for each dbString or dbData. These longs are
offsets into the .str file maintained by the memServer. It is therefor necessar*
*y to
have those longs stored in the relation structure so it can be written out effi*
*ciently
by the recServer.
The solution to the above dilema was to declare another struct to hold the
dbString and dbData members with the same names as given in the relation
file. The long offsets are stored in the main relation struct with slightly man*
*gled
names. Both of these structs are inherited by the relation.
In this way, users may reference their dbString and dbData references as th*
*ey
are named even though only longs are stored in the relation struct.
The dbObject makes sure that whenever there are dbString or dbData fields
in a relation, appropriate calls are made to memServer to keep them up to date.
DiamondBase_____________________________________________________________27_
Every time a record is read or written, the strings need to be similarly update*
*d.
This gives a performance penalty compared to using char arrays to store strings
but this penalty has been optimised heavily to minimise this penalty.
The memServer and its integration into the dbObject provide seamless use of
dbData and dbString fields in relations.
28_________________________________________________Multiuser_DiamondBase___
6 Multiuser DiamondBase
6.0.1 Disclaimer
Warning: The multi user component of DiamondBase has only been completed
very recently, and is being released as an ALPHA version. There are many defi-
ciencies documented below (and goodness knows how many undocumented ones).
If you intend to use it, then read this carefully, and please give us feedback *
*on
what you think, and any improvements you think are necessary.
6.1 Introduction
The transition from single user to the multi-user version of DiamondBase has
been quite an adventure. We started by defining what we meant by "multi-user".
To an extent, the existing version already supports a level of concurrency in t*
*hat
you can have multiple objects in your program which are all accessing the one
relation. There is proper locking to ensure that they don't interfere with one
another.
At a higher level however, you want multiple processes which may not even
be the same program, accessing the database concurrently without corrupting
the underlying files. For this version, we have chosen to implement a system
which allows multiple processes on the one machine. The possibility exists for a
distributed database across a TCP network but that shall wait.
6.2 What didn't work
The first attempt at a concurrent version of DiamondBase began at the bottom.
If the record management at the recServer level was given proper locking, then
all that sat on top of would behave transparently. This turned out to be about *
*as
incorrect as is imaginable. For a start, cache coherence becomes a problem. The*
*re
are also higher level structures like B-Trees, for which, simply locking a smal*
*l part
is totally inappropriate. As the changes propagated up the code hierarchy and
the changes to the bottom layer multiplied, this was abandoned. Andy wisely
smiled, having predicted this from the start.
Having explored its "rear end" if you like, we next commenced with oral
surgery. Putting a layer near the top, between the schema generated code and
the database engine would allow the two to exist in separate applications quite
happily. This worked much more nicely, with the engine left unchanged, and
some surgery at the level of the diamondBase and diaRel objects. There was a
problem however. Having all the comparison code compiled into the application
makes for a very fast single user version (and I would still recommend it unless
you really need concurrent behavior). However, it meant that the server still
DiamondBase_____________________________________________________________29_
needed to convey the object callbacks to the application with the appropriate
data.
After quite a bit of coding, and the odd kernel IPC fix or two for linux,
we had a running version which was only 11 times slower than the uni version.
Disgusted, I decided that for this, and other reasons, we would have to interpr*
*et
the relation layout and do the comparisons within the server, only talking to t*
*he
application when it issued a request, and to supply the reply. The diaGRel class
was born to permit generalized access so the server could get to any relation, *
*and
the communications protocol emasculated to handle the smaller command set.
6.3 The architecture
This version uses shared memory segments between the server and the application
processes to transfer records, and a message queue shared between all processes*
* to
communicate requests and intentions. Incoming messages on a very low message
number signify attach requests from applications.
The connection protocol is as follows:
fflClient sends an attach request as a message with id 2, and its pid.
fflServer creates /tmp/diamond.pid and a shared memory segment to go with
it. It then sends a message on BASE_RESP + pid
fflClient attaches to the new shared memory area.
Further transactions are performed as follows:
fflCopy the relations data into the shared memory segment
fflSend message with transaction details on BASE_REQ. Initial versions had
pid added to this but this caused clients with lower pid's to be given pre*
*f-
erential treatment. Now the pid is embodied in the message.
fflServer reads message and gets the database engine to perform the trans-
action after copying the data from the shared area into a diaGRel object
which is carrying out the database requests on behalf of the client.
fflServer copies resultant relation into transfer area and sends reply messa*
*ge
on BASE_RESP + pid
Transactions consist of database transaction ids along with associated index
numbers, new object attach requests, object detach notification messages and ap-
plication detach messages. The reference Id numbers handed out by the database
engine are mapped onto "fake" diaGRel objects which do all the work in the en-
gine on behalf of the client's diaRel objects. Two lists are maintained to tra*
*ck
the attached client applications and their attached objects. Having the diaGRel
functionality makes the whole process quite simple.
30_________________________________________________Multiuser_DiamondBase___
6.4 Performance
Was not terribly disappointing. It was never going to be faster than the uni
version, but you have to keep in mind some potential gains from having a separa*
*te
server:
fflKilling application programs won't jeopardize database integrity
fflRetrieved data is in a single pooled cache increasing efficiency
fflThe possibility exists to prevent direct access to the database files.
___________________________________
_ Test _User _System _ Total _
___________________________________ _
_ Uni _27.9 _ 8.0 _ 35.9 _
____________________________________
_ Multi (1) _ _ _ _
_ _ _ _ _
_ server _23.19 _ 11.8 _50.19 _
_ _ _ _ _
_ client _10.0 _5.2 _ _
____________________________________
_ Multi (2) _ _ _ _
_ _ _ _ _
_ Server _49.0 _ 22.3 _104.23 _
_ _ _ _ _
_ client A _11.0 _22.3 _ _
_ _ _ _ _
_ client B1_0.83 _ 5.5 _ _
____________________________________
Table 1: Performance figures for multi and uni versions
From the above figures, the multi version is approximately 40% slower than
the uni version - which can largely be attributed to the fact that the server i*
*s in-
terpreting the record comparisons. Note that for the two client test, the datab*
*ase
is essentially doing twice the work. Running the two tests under the uni version
takes twice the figure quoted for the uni version. If you really want high per-
formance concurrent database management for specialized applications, it would
be theoretically possible to compile the comparison code into a specialized ser*
*ver
which can just do the record comparisons you need.
The other thing to note on the performance front is that we have not opti-
mized the code at all (ie -O6) - there is reason to believe that this could imp*
*rove
things. We have not profiled the multi stuff at all, so there are many potential
improvements in performance there. The next release will be a little more fleet
footed on that front I expect.
6.5 Compiling the Concurrent version
The target multi exists in the Makefile. Typing gmake multi after you make
already built the main system will perform the multi compilation. The target
DiamondBase_____________________________________________________________31_
will build the dbmulti library, and the jeweler server process. It has been tes*
*ted
under SunOS 4.1.1, Ultrix IRIX and Linux. There are some prototypes for the
IPC system calls which seem to be woefully under prototyped on non Linux
systems. It should theoretically work on any UN*X system which supports IPC
(interprocess communication) - although my experience so far suggests that IPC
is about as standard as curses so I am prepared for bug reports.
6.6 Testing the concurrent version
I will distribute a small application called bump. It creates 5000 records, int*
*er-
spersing requests for old records in the addition calls, and ensuring their int*
*egrity.
It takes a parameter indicating a range. If you are running more than one, you
should specify parameters about 10,000 apart. I have run up to 8 of them con-
currently under SunOS, and Linux. Note that if you are running more than one
- you should provide parameters spaced at 10000 and run no test twice without
first creating an empty database, otherwise duplicate records will result and t*
*he
test code will fail. Running more than 9 processes at once causes the tests to *
*fail
due to limits within the server (which can be changed if desired).
6.7 Using the Concurrent version
One of the issues we would still like to address more thoroughly is that of er-
gonomics from a programming point of view. Ideally, we would like to minimize
the pain of the transition between a uni-database and a concurrent one, to that*
* of
linking with a separate library. We still believe that this can be achieved, bu*
*t at
the moment, it is necessary to recompile your application with the preprocessor
symbol 'MULTI' defined. You must also link with the libraries dbmulti and dia-
mond in that order, rather than just diamond. Upon execution, your application
will attempt to connect to a running server in order to perform its transactions
rather than doing the accesses directly. So, to reiterate the steps:
fflMake sure that your code includes diarel.h rather than dbase.h to get at
any database definitions.
fflPut -DMULTI into your application CFLAGS
fflLink your application with -ldbmulti -ldiamond
fflEnsure that the server process jeweler is running.
Note: This version of the code creates all rendezvous points with octal per-
mission 600, meaning that the client and server must be run by the same user.
The permissions can be edited to suit your taste. The whole area of security
management needs careful examination.
32_________________________________________________Multiuser_DiamondBase___
The ergonomics of files used to manage the connection are also less than
satisfactory at the moment. Any relation which you wish to use via a server
process must be in the server.db file in the directory in which the server proc*
*ess
jeweler is run. This file is in the same format as the relation file whose name*
* is
normally passed to the diamondBase object in the uni-version.
The server and client processes initially make their rendezvous via message
passing, and the filename used to make the connection is gibraltar. (It seemed *
*like
a solid place to build a database - sorry). Currently, both the server and clie*
*nt
library code look in the current directory for this file. It must therefore ex*
*ist -
although its contents are irrelevant for the running of the server. The server *
*itself
creates the file when it is run. It also constrains the clients to run from the*
* same
directory as the server. This is messy and we will endeavor to rectify it. If y*
*ou
have strong opinions on how databases, servers and clients should be managed
system-wide, we would love to hear from you.
So, in order to run an application/server pair:
fflcreate a file server.db listing your relations
fflrun jeweler (optionally in the background)
fflrun your application in the same directory
When you wish to shut the server down, either send it an interrupt message
(press control-C), or kill -SIGTERM the process. The server process will output
some messages on either stdout or stderr, so you may wish to log these to a fil*
*e -
they contain some useful information like diagnostics when relation names can't
be looked up.
If there are clients still attached, then you will have to wait until those*
* clients
are gone before the server will shut itself down.
6.8 Caveats - PLEASE READ THIS CAREFULLY.
This is the section where I warn you about the nasty issues/problems which are
still out there and not dealt with. As mentioned above, there are many ergonomic
issues which we will address as we write some more concurrent applications. I
will list some other problems which I am aware of. I will try to fix the more
serious ones at least - but there are just so many little potential problems th*
*at
this may take a while. There has been some pressure to get this out and testable
as soon as possible. My little list of problems follows:
fflVariable length strings and memory aren't supported yet. [I expect to ha*
*ve
these working very soon. Kev]
fflThere is no configuration management, utilities to shut the server down,
or monitor it etc. Changes to the server.db file won't take place until i*
*t is
DiamondBase_____________________________________________________________33_
restarted. At the moment, sending a suggestive signal like SIGTERM or
SIGHUP will cause it to exit when convenient.
fflThere is a static limit on the size of relation which can be handled (4K *
*at
the moment). Getting around this efficiently is a little sticky.
fflThe rendezvous point between client and server is clumsy and essentially
means they need to be run in the same directory. Configuration at in-
stallation time is one possibility, or at runtime which is probably more
acceptable.
fflThe schema compiler (DSC) is still a uni application, so it can blow away
the database from under the server's feet if you are stupid enough. This
shouldn't be permitted.
fflNon-vital commands in diaRel like verStr aren't implemented yet.
fflDeath of server will freeze the application using it.
fflSecurity is reliant on protecting the gibraltar file, and the code opens *
*the
IPC services as 600 currently, so only processes under the same uid as the
server can access it. This needs to be greatly improved. Ideally, client
instances should not be able to interfere with one another. We may have to
resort to encryption to achieve this and there will be performance penalti*
*es.
Denial of service attacks are also possible.
Just remember that if I have missed something, Use The Source Luke. The
files gib.cc and dclient.cc contain the relevant source for the server and clie*
*nt
library respectively. They aren't too complicated.
6.9 Still to do
fflReduce the size of the caveat list.
fflProfile the code to remove bottlenecks
fflReplacing the linked lists with hash tables for the client and object lis*
*ts.
fflUsing semaphores instead of message queues for efficiency
fflAn OS/2 port.
fflMore thorough test suite.
fflAdd dbString support.
fflPort/test on other unix platforms.
fflGo to bed.
34_________________________________________________________DSC_Users_Guide_
7 DSC Users Guide
The DiamondBase Schema Compiler (DSC) is a utility provided with the Di-
amondBase database package which takes a lot of the hard work out of writing
database applications. It does this by generating most of the code required to
maintain the database automatically. In addition DSC generates new, empty
databases for each schema you specify.
Throughout this section and the next, it is expected that the reader has a
good understanding of the concepts involved in relational database design[1].
7.1 A Simple Record Library
Let us begin with a small example. Consider the application of producing a
database for storing record library information. One part of such a database
would be a relation which stores information about a single artist. The informa-
tion we would like to store is the artist's name and the artist's abbreviation.*
* The
specification of this relation is given below.
relation Artist
f
// Field specification.
char artName[100]; // Artist's Name.
char abbr[10]; // Artist's Abbreviations.
long titles = 0; // Number of titles.
unique artId; // A unique number to identify
// this artist by.
// Index specification
index name on artName; // An index for artist names
index abbrev on abbr; // An index for artist
// abbreviations.
// Constructors
construct using titles,artName index name;
construct titles;
g
To make starting your DiamondBase application as easy as possible, the
specification of a schema is based on the specification of data structures in t*
*he C
or C++ languages. As such some of the schema will be obvious, however before
continuing we will look at the important parts it.
DiamondBase_____________________________________________________________35_
fflrelation Artist
The first word here is a keyword, indicating that this is the start of a new
relation specification. The second is the name for the relation. This will beco*
*me
the name of the class associated with this relation, thus becoming a type name.*
* If
you wanted to use "Author" as a variable name in your program, consider calling
the relation something else in the schema.
Note also the opening "brace" (`f'). The information belonging to this relat*
*ion
is contained within these braces. You must finish the relation with a closing
"brace" (`g').
There is no limit to the number of relations which can appear in a single
schema file. Each must begin with the keyword relation and must be enclosed
within braces.
There can be an optional "called name" after the relation and before the
brace. It allows the relation to have a different name to the underlying struct*
*ure.
7.1.1 Field Specification
fflchar artName[100];
Now we begin the business of specifying the fields for the relation. First t*
*he
type of the field is given, in this case a character (or as we shall see, an ar*
*ray
or characters). Next a name must be given to the field. In this case the name
is "artName". You will be able to access a class member called "artName" to
manipulate this data in the programs you write.
Lastly, you have the option of specifying that more than one "type" (in this
case character) should be allocated. This is the syntax for specifying an array*
* of
data items, and in the case of characters is also the method for declaring stri*
*ngs.
Default arguments can be given using an = sign. These are assigned to the
relation during construction.
The valid types for use in a relation are given in the DSC Reference Manual.
Note that each field specification must be terminated by a semi-colon (`;').
fflunique authId;
The unique type is special, in that it is never assigned a value by Diamond-
Base applications. Instead DiamondBase assigns a value to these fields when
a new record is created, guaranteeing that the value it gives is unique for that
relation at that time.
It is important to note that these unique numbers are only instantaneously
unique. If a record is deleted from the relation, the unique value it once had *
*is
then free to be re-used.
36_________________________________________________________DSC_Users_Guide_
7.1.2 Index Specification
Indicies are used as mechanisms for locating records within the relation based
on some sorting field or key. As such their creation does not involve allocati*
*ng
any additional storage within a relation. Instead DSC generates the appropriate
code to generate any keys which are required dynamically. The examples of index
specification above are similar, so the first will be used as an example.
fflindex name on artName;
The first word is the index keyword. This is followed by an identifier. Thi*
*s is
the name of the index and can be used in application code when specifying which
index should be used for record queries 1.
The name of the index is followed by another keyword on, and then a list of
one or more identifiers, seperated by commas. These identifiers are the names of
the fields which should be used as sort keys for the index and should be specif*
*ied
in the order of importance. For example,
index myindex on myfield1, myfield2;
would cause a b-tree to be created with myfield1 as a primary sorting key, and
myfield2 as a secondary sort key.
7.1.3 Constructor Specification
Constructors may be declared for your relation. You should make sure that you
do not create ambiguous constructors. Your compiler will give an error if you
do. You should also not use a constructor that takes a char* and a long in that
order. Such a constructor exists already so that you can construct using a key
and an index number.
fflconstruct using titles,artName index name;
This creates a constructor that accepts a long and char* and then does a get
using the name index. The "index name" portion is optional.
7.2 Compiling a Schema
After creating a schema you will want to compile it using DSC. This process
may generate a number of files, which by default will be created in the current
directory. For large databases which is inconvenient from a programming point
of view, if not simply messy. So as an example, you might set up a directory
structure in your project path which had subdirectories called schema (for stor*
*ing
schema descriptions) and dbfiles (for storing the generated database files).
______________________________
1In fact the name which is made available to application programmers if pref*
*ixed by
"dbIdx_", so that the index above would be refered to in code as "dbIdx_name".
DiamondBase_____________________________________________________________37_
Your application source files might go in the root directory of your project
path, while schemas are located in the schema path, and the database files
themselves (containing the data) are stored in the dbfiles path.
Let us assume that such a directory structure exists, and that the schema
file used in section 7.1 is in the schema path and is called artist.ds (the .ds
extension denotes a DiamondBase Schema file)2. To compile this schema and
place the database files in the dbfiles path, we use the following command :
Specify the database file path
z_____"______-
dsc I_schema_-z____" O dbfiles artist___-z__"
Specify the schema path The schema filename
This could generate 3 source files in the current path called artist.h, arti*
*st.cc
and artist_s.h.
It then may create a new data file for each relation in your schema (in this
case only one), named after the relation which has a .db extension. This file is
placed in the dbfiles path.
Finally DSC may create one .id file for each index specified in the schema.
This extension is suffixed by a number from 0 to 9, using a different number for
each index specified in the schema. These index files will also be located in t*
*he
dbfiles path.
The database files will only be generated if you use the -D flag and the code
described above will only be generated if you specify the -C flag. This ensures
that you do not overwrite anything.
7.3 Derived relations
It is often the case that you would want to inherit a DSC relation into another
class so that you may augment its functionality. To help with this, there is
another command line argument for DSC.
ffldsc -G base derived file
This will create "file.h" and "file.cc" files containing a class named deriv*
*ed
which has a base class base. The class will appear as a skeleton class based on
your base class.
7.4 Getting on with the Application
You now have everything that is required to start building your own application.
There are a few more command line options which may become useful down the
track as your application develops. These are explained in detail in the next
section.
______________________________
2schema files should have the extension ".ds" for clarity
38_____________________________________________________DSC_Reference_Guide_
8 DSC Reference Guide
8.1 Command Line Arguments
The command syntax for DSC is as follows :
dsc [options] <schema name>
The command line arguments to DSC are shown in figure 1.
If no extension is given for the schema file, the extension .ds is used.
_______________________________________________________________
_ Option _Action _
_________________________________________________________________
_ -C _Create generated source code _
_ _ _
_ -D _Create database files _
_ _ _
_ -G base derived fileG_enerate a derived class _
_ _ _
_ -O < path > _ Place generated database files in < path > _
_ _ _
_ -S < path > _Place generated source code in < path > _
_ _ _
_ -I < path > _Use schema files located in < path > _
_______________________________________________________________ _
Figure 1: Command-line options to DSC
8.2 DSC Data Types
The types available in DSC are similar to those which are available in C++ with
three exceptions, and several ommissions. Table 2 shows all the legal types in a
DSC schema file, their C++ equivalents, their type identifiers and gives a short
description of their properties.
8.3 DSC Syntax Definition
The syntax for schema description is modelled on the C and C++ languages.
There may be one or more relations descripted in a schema file, with each being
preceeded by the relation phrase, and each enclosed in a pair or brackets (`f g*
*').
Table 3 shows the complete syntax for DSC in BNF.
NOTE This BNF is out of date. It is 3am and I have no plans to try
to persuade LaTEXwhat the current BNF looks like. I suggest reading
dsc.y. It makes for many hours of fun for the whole family. Note that
the version below does not include default arguments and constructors
(at least).
DiamondBase_____________________________________________________________39_
___________________________________________________________________________
_ DSC Type _ C++ Type _ Id _ Decsription _
_____________________________________________________________________________
_ long _long int 0_ _ A long integer. The size of this type _
_ _ _ _ _
_ _ _ _will vary, but is usually 32 bits. *
* _
___________________________________________________________________________ *
* _
_ ulong _unsigned long int 1_ _ A long integer which may only _
_ _ _ _ _
_ _ _ _take positive values. _
___________________________________________________________________________ _
_ short _short int 2_ _ A short integer. The size of this _
_ _ _ _ _
_ _ _ _type will vary between compilers. _
___________________________________________________________________________ _
_ ushort _unsigned short int3_ _ A short integer which may only _
_ _ _ _ _
_ _ _ _contain positive values. _
___________________________________________________________________________ _
_ double _double _4 _A double precision floating point _
_ _ _ _ _
_ _ _ _number. _
____________________________________________________________________________
_ float f_loat 5_ _ A single precission floating point _
_ _ _ _ _
_ _ _ _number. _
____________________________________________________________________________
_ money _moneyType _ 6 _ A type defined to store currency _
_ _ _ _ _
_ _ _ _information. _
___________________________________________________________________________ _
_ date _dateType _7 _ A type defined to store date _
_ _ _ _ _
_ _ _ _information. _
___________________________________________________________________________ _
_ char _char _8 _A character, or 8-bit number _
___________________________________________________________________________ _
_ unique _uniqueType _9 _ A type whose value is guaranteed to be _
_ _ _ _ _
_ _ _ _instantaneously unique. _
___________________________________________________________________________ _
_ dbString _dbString _10 _A resizable string class _
___________________________________________________________________________ _
_ dbData _dbData _ 11 _ A generic resizable binary data object _
___________________________________________________________________________ _
_ ichar c_har _12 _A case insensitive char _
___________________________________________________________________________ _
Table 2: Valid types and their integer identifiers
40_____________________________________________________DSC_Reference_Guide_
_______________________________________________________
_ schemaFile ! schemaFile relation _
_ _
_ ! null _
_ _
_ _
_ _
_ relation ! relation ident structName fieldList _
_ _
_ _
_ _
_ structName ! [is] called ident _
_ _
_ ! null _
_ _
_ _
_ _
_ fieldList ! f fieldList1 indexList g _
_ _
_ _
_ _
_ fieldList1 ! fieldList1 type ident size ; _
_ _
_ ! null _
_ _
_ _
_ _
_ type ! short _
_ _
_ ! ushort _
_ _
_ ! long _
_ _
_ ! ulong _
_ _
_ ! float _
_ _
_ ! double _
_ _
_ ! char _
_ _
_ ! money _
_ _
_ ! date _
_ _
_ ! unique _
_ _
_ ! dbString _
_ _
_ ! dbData _
_ _
_ _
_ _
_ size ! [ number ] _
_ _
_ ! null _
_ _
_ _
_ _
_ indexList ! indexList indexSpec ; _
_ _
_ ! null _
_ _
_ _
_ _
_ indexSpec ! indexSpec2 , ident _
_ _
_ ! null _
________________________________________________________
Table 3: BNF Syntax for DSC
DiamondBase_____________________________________________________________41_
8.4 File Formats
DSC is responsible for creating only the header for the actual .db file associa*
*ted
with each relation - the creation of b-trees and storage of data in the .db fil*
*e are
delegated out to other parts of DiamondBase . Consequently this section deals
only with the header information for the data file.
8.4.1 List Headers
Each data file contains enough information to derive the structure of each rela*
*tion,
including the names of fields and the list of indicies. The information is brok*
*en
into two parts - the field information and the index information. This informat*
*ion
is encapsulated in two classes, the fieldList and indexList classes. These clas*
*ses
are given in figure 2 and figure 3.
struct fieldList
f
int numFields; // The number of fields for this
// relation
fieldInfo fields; // A linked list of individual field
// information
fieldList() ffields=0;numFields=0;g // Initialise class members
fieldList() fdelete fields;g // Destroy the list
friend ostream& operator o (ostream&, fieldList&)
friend ofstream& operator o (ofstream&, fieldList&)
friend ifstream& operator AE (ifstream&, fieldList&)
g
Figure 2: Class description for the Field Lists
As you will see these are only place keepers for the heads of two linked lis*
*ts.
Since they are both essentially the same only the first will be discussed.
Firstly the number of entries in the linked list is stored to make reading t*
*he
list from a file easier. The only other data member is a pointer to the head of
the list.
Function-wise there is a constructor which simply sets it's data members to
zero and a destructor which causes the memory allocated for the list to be dele*
*ted.
Finally there are three operators for input/output. The first is an inserti*
*on
operator for normal output streams. This just causes the information stored in
the linked list to be placed on the stream given. The second insertion operator
42_____________________________________________________DSC_Reference_Guide_
struct indexList
f
int numIndicies; // The number of indicies for this
// relation
indexInfo indicies; // A linked list of individual index
// information
// Initialise class members
indexList() findicies=0;numIndicies=0;g
// Destroy the list
indexList() fdelete indicies;g
friend ostream& operator o (ostream&, indexList&)
friend ofstream& operator o (ofstream&, indexList&)
friend ifstream& operator AE (ifstream&, indexList&)
g
Figure 3: Class description for the Index List
writes the binary data to an output file. Lastly there is an extraction operator
for retreiving the binary information from an input file.
8.4.2 List Information
The actual information for each field or index if stored in the structures shown
in figure 4 and figure 5. There is little similarity between the two, so both m*
*ust
be discussed.
fflThe fieldInfo Structure
Field information is located in the fieldInfo structure. These are chained
together in a linked list, the head of which is stored in the fieldList structu*
*re.
The structure contains the name of the field (as a character array), the ty*
*pe
of the data stored in that field (as an integer. The list of valid types and th*
*eir
identifiers can be found in section 8.2), the size of the type in bytes and a p*
*ointer
to the next fieldInfo structure in the list.
The constructor for the class creates a new fieldInfo object with the values
passed to it as parameters.
The destructor deletes the space allocated for the field name and then dele*
*tes
the next fieldInfo structure in the list, if it exists.
fflThe indexInfo Structure
DiamondBase_____________________________________________________________43_
The indexInfo structure really only differes in the information it stores. T*
*here
is the index type (not currently used), an array of fields which are used in the
index (listed in decreasing sort precedence), and a pointer to the next indexIn*
*fo
structure.
8.4.3 Database File Headers
The structure of the file is shown in figure 6.
struct fieldInfo
f
int fldType;
int fldSize;
char fldName;
fieldInfo nextFld;
fieldInfo(char name, int ty, int sz)
f
fldName = new char[strlen(name)+1];
strcpy(fldName,name);
fldType = ty;
fldSize = sz;
nextFld = 0;
g
fieldInfo()
f
delete fldName;
if (nextFld)
delete nextFld;
g
friend ostream& operator o (ostream&, fieldInfo&);
friend ofstream& operator o (ofstream&, fieldInfo&);
g;
Figure 4: Structures for storing field information
44_____________________________________________________DSC_Reference_Guide_
struct indexInfo
f
int idxType;
int idxFields[MAX_FIELDS_IN_INDEX];
indexInfo nextIdx;
indexInfo(int type, TIndicies indicies )
f
idxType = type;
for(int i=0;i<MAX_FIELDS_IN_INDEX;i++)
f
idxFields[i] = indicies[i];
g
nextIdx = 0;
g
indexInfo()
f
if (nextIdx)
delete nextIdx;
g
friend ostream& operator o (ostream&, indexInfo&);
friend ofstream& operator o (ofstream&, indexInfo&);
g;
Figure 5: Structures for storing index information
DiamondBase_____________________________________________________________45_
_______________________________
_ Size of header _
_ _
_ (4 bytes) _
_________________________________
_ fieldList struct _
_ _
_ (8 bytes) _
_______________________________ _
_ 1 or more fieldInfo structures _
_ _
_ (16 bytes each) _
_________________________________
_ indexList structure _
_ _
_ (8 bytes) _
_______________________________ _
_ 0 or more indexInfo structures _
_ _
_ (12 bytes each) _
_________________________________
_ Data records _
_ _
_ : _
_ _
_ : _
_ _
_ : _
_ _
_ : _
_______________________________ _
Figure 6: File structure for database files.
46______________________________________________________________File_formats_
9 File formats
9.1 Database Config file
This file just specifies the directory where the files related to a particular *
*relation
reside. It contains lines in the form:
relationName=directory-path
9.2 recServer
The record server is a generic block storage system. Upon creation, a record si*
*ze
is nominated, and space for 4096 records is allocated. The object inheriting a
recServer can then request a new record id, store the record at that location or
delete the record at that location.
The actual layout of the file is documented in the next section. The file
basically consists of a header followed by alternate bit pools and record stora*
*ge
areas. The file will expand in size to accomodate demand for records. It is
currently unable to shrink as demand for records decrease. There are ways of
implementing this but they are currently not used.
When the database becomes 80% full, the number of free slots is increased by
25% or 1 - whichever is bigger. To keep the file size to a minimum, the record
space after the last bitPool isn't grabbed from the file system until records a*
*re
actually stored there.
Since the recServer is designed to be used as a storage facility for other *
*pro-
gram modules, it has an offset which can be supplied at construction to reserve
a certain number of bytes at the beginning of the file for other purposes.
9.2.1 Header
The header keeps track of the current status of the database. Namely the size of
each record, the number of active records and the total storage capacity of the
database in records.
9.2.2 Bit Pools
Bitpools are essentially bitmaps which record which records are used in the fol-
lowing record storage area. It also has a header to record how many bits are
used. This allows a quick scan of bits to determine which record slots are empty
in the following record storage area. Bitpools are currently a fixed size of 40*
*96
slots - which are bit packed to occupy 512 bytes.
DiamondBase_____________________________________________________________47_
9.2.3 Record Pools
The record pools are a section of the file which is 4096 * lengthOfRecord long.
There is no flag on the actual record to indicate whether it is used or not - t*
*his is
simply a storage area. The preceding bitPool must be interrogated to determine
the status of a particular bitPool.
9.2.4 bTree bucket layout
The bTree is where the real magic in the database happens. For a detailed view -
we suggest reading the comments in bucket.cc. The bTree consists of a header
and a data area. The header has pointers to the previous, next and parent bucke*
*ts
as well as a flag to indicate interior or leaf node. Note that the parent poin*
*ter
may be incorrect if you were in the right hand side of the parent bucket when it
split. These pointers are repaired during tree traversal for efficiency reasons*
*. It
also stores the number of active keys in the data area.
The data area consists of alternating pointer key pairs with one more pointer
than key. The pointers are long integers which either point to child buckets (in
an interior node) or the record in the recServer at a leaf node. The key data is
long word aligned to avoid nasty problems during key comparison. This makes
the index storage larger than necessary but is faster. Intel architectures coul*
*d be
optimized as the processor is alignment friendly.
The bTree inherits a recServer to store its actual buckets. There is a global
header before the recServer header which records the unaligned length of the key
and where the root bucket is.
I refuse to describe the full bTree manipulation details. I can recommend
Smith, "Data Structures and Algorithms" for a good discussion. If you want to
know how ours works - read Kevin's code. If there are sufficient enquiries we w*
*ill
describe - when we have looked at the code again.
9.3 Variable Length Fields
The format of the .str file used by the memServer for variable length fields (t*
*ypes
dbData and dbString) are described in section 5.
48____________________________________________________________Miscellaneous_
10 Miscellaneous
10.1 ToDo
fflImplementing multi-user version.
This is very high priority - and will probably involve some use of shared
memory and/or a central locking process.
This has been implemented for a single machine, multi client model.
fflImplementing duplicate records
Not likely! We've come to the conclusion that the inclusion of a unique
field in an index gives exactly what ducplicate indexes gives you except *
*the
semantics are far better.
fflA query language
fflBtree reconstruction facility
fflRecord dump facility This now exists in the db2txt program. The recon-
struction facility will be the reverse of this once written. It is next o*
*n the
agenda.
fflMonitoring facility to count database operations
fflDatabase browser
This has been made easier by the recent addition of diaGRel.
fflInteractive relation creation
fflEliminate bTree indexing of completely unique fields
fflBetter fan-out specification in database
The bTree buckets are fixed size currently - and will take as many keys as
will fit. The number of keys should be set and the size of the bTree buck*
*et
varied.
fflefficient bTree deletion
bTree nodes aren't merged when they drop below half full. They should be
for efficiency reasons.
fflDatabase update tool
When adding fields to a database, it would be nice to be able to upgrade
the existing database automatically. This has to be done by hand currentl*
*y.
DiamondBase_____________________________________________________________49_
10.2 Profiling data
We include some sample profiling data for those speed freaks out there who want
to improve performance. We found some very interesting things after our initial
profiling experiments - like the fact that our memory checking library had 90%
of the CPU time.
The bcopy function was also overused quite a bit and this led to far too many
function calls where much better alternatives were available. Use of bcopy has
been reduced to a minimum.
We have attempted to speed up all the very frequently used functions and
eliminate unnecessary calls to them.
Kevin is to provide this bit. [Well, I will one day when I can collect all t*
*he
data and get it into some format that might be vaguely useful for something
or someone and possibly formatted in a way that could fit here, in addition to
making this sentence shorter in some way or another].
References
[1]C.J. Date, An Introduction to Database Systems, Volume 1, Fifth Edition,
Addison-Wesley, 1992, pp 245-398.