Shareware Overload

home *** CD-ROM | disk | FTP | other *** search

/ Shareware Overload / ShartewareOverload.cdr / progm / cbase.zip / CBASE10B.ZIP / GUIDE.TXT < prev next >

Wrap

Text File | 1989-11-22 | 56KB | 1,511 lines

CBASE The C Database Library Citadel Brookville, IndianaCopyright 1989 by Citadel. All rights reserved. Citadel 241 East Eleventh Street Brookville, IN 47012 317-647-4720 BBS 317-647-2403 Version 1.0b This manual is protected by United States copyright law. No part of it may be reproduced without the express written permission of Citadel. Technical Support Technical support is available between 10 a.m. and 4 p.m. EST. When calling for technical support, please have ready the following information: . product name, version, and serial number . operating system and version number . C compiler and version number . computer brand and model The Citadel BBS is available 24 hours a day. Users of Citadel products are encouraged to make use of this resource. UNIX is a trademark of AT&T. MS-DOS is a trademark of Microsoft Corporation. Turbo C is a trademark of Borland International, Inc. Contents Chapter 1. Introduction 1 Chapter 2. Database Basics 3 2.1 Sequential File Structures 2.2 Inverted Files 2.3 B-trees Chapter 3. cbase Library Functions 7 3.1 Access Control Functions 3.2 Lock Functions 3.3 Record Cursor Position Functions 3.4 Key Cursor Position Functions 3.5 Input/Output Functions 3.6 Data Import/Export Functions Chapter 4. An Example Program 15 4.1 Data Definition 4.2 Opening a cbase 4.3 Locking a cbase 4.4 Accessing a cbase 4.5 Closing a cbase 4.6 Manipulating Exported Data Appendix A. Installation Instructions 27 A1 The blkio Library A2 The btree Library A3 The cbase library A4 The lseq Library Appendix B. Defining New Data Types 31 B1 The Type Name B2 The Comparison Function B3 The Export and Import Functions B4 The Type Count Appendix C. Porting to a New Operating System 35 C1 The HOST macro C2 The File Descriptor Type C3 File Access System Calls C4 File Locking System Calls References 39 Chapter 1: Introduction cbase is a C database file management library. Records may be accessed both randomly and sequentially through indexes stored in B+-trees. Records may also be accessed sequentially in the order in which they are stored. Multiuser access is supported under any operating system with file locking capabilities. cbase is designed to be extremely portable. It is written in strict adherence to the ANSI C standard, and can still be used with K&R C compilers. All system dependent code is isolated in order to make it easy to port to new operating systems. Many of the operations performed by cbase internally represent independently useful tools, and the software has been designed with this in mind. cbase actually comprises four individual libraries, each complete and independently accessible. Figure 1.1 shows these libraries and their relationships. ┌─────────────────────────────────┐ │ cbase │ └───────┬─────────────────┬───────┘ ┌───────┴───────┐ ┌───────┴───────┐ │ lseq │ │ btree │ └───────┬───────┘ └───────┬───────┘ ┌───────┴─────────────────┴───────┐ │ blkio │ └─────────────────────────────────┘ Figure 1.1. cbase and Underlying Libraries At the foundation of cbase is the blkio (block buffered i/o) library. blkio is a buffered i/o library very similar to stdio but based on a file model more appropriate for structured files such as used in database software (see FROS89). While stdio models a file as an unstructured stream of characters, blkio models a file as a collection of blocks made up of fields. All file access and buffering is done using blkio. All operating system specific code is isolated within a small portion of this library. The lseq (linked sequential file) library provides all the facilities necessary for the creation and manipulation of doubly linked sequential files. The btree (B-tree) library provides the same for B+-tree files. The cbase library uses lseq and btree to perform all structured file management operations. The lseq library is used for record storage and the btree library for inverted file key storage. When using a particular library, all operations are performed with functions provided by that library. No references need be made to underlying libraries. When using the cbase library, it is therefore not necessary to know the functions included in the other libraries. Chapter 2: Database Basics This chapter describes some of the basic database concepts relevant to cbase. It is intended to be a brief and readily accessible introduction, and can be easily skipped by one already familiar with database fundamentals. References are given where in-depth discussions may be found. 2.1 Sequential File Structures One of the simplest file organizations is the physical sequential file. In this organization, records are simply written one after the other and sequential access is done in the order in which the records are physically stored. The physical sequential file works very well in cases where the data is static, but it is extremely inefficient when the data is dynamic. Consider a file containing 100,000 records stored in a sorted order. The insertion of a single record at the beginning of this file would result in 100,000 records being moved to make room at the beginning of the file for the new record. This identical problem also occurs when large ordered lists are stored in memory, and the same solution, the linked list, can be used for files (see pp. 106-12 of HORO76 for an explanation of linked lists in memory). In a linked list, the record order is determined by pointers stored with each record, not by the physical record locations; a record logically located at the beginning of a file can be physically located anywhere within the file. In a singly linked list each record has only a pointer to the next record and so the file can be traversed in just one direction. In a doubly linked list, each record has both a next and a previous pointer, allowing bidirectional access. 2.2 Inverted Files Assume a data file containing member records for an organization, and that the record format for this file is typedef struct { long number; char name[24]; char address[81]; char city[24]; char state[3]; char zip[11]; } member_t; Assume further that the data file has a physical sequential file structure, and that the records are sorted by the number field, therefore a binary search may be performed to quickly find a record with a given number field. The field on which the records are sorted is called the primary index. The problem arises when a query is made on another field besides the primary index, in which case a search of the entire data file is required. The problem of being able to perform queries efficiently on more than one field can be resolved through the use of inverted files. For each secondary index exists an inverted file containing an entry for each record in the data file. The inverted file for the name field, for example, would have entries containing the name field and record position, and these would be sorted by the name field. To find a record with a given value in the name field, the inverted file for the name field would be searched. The record position paired with the specified name value would then be used to locate the record in the data file. See pp. 531-533 of HORO76 and pp. 75-78 of ULLM82 for more information on inverted files. 2.3 B-trees As discussed above, the physical sequential file provides unacceptable performance for dynamic data. For static data, however, it provides efficient sequential as well as random access (using a binary search). The linked sequential file solves the problem of performance for dynamic data while preserving efficient sequential access, but the ability to randomly access records is lost. The need to store dynamic data which may be randomly accessed led to the development of the B-tree. As the name implies, the B-tree stores data in a tree structure, which allows random access. And since the tree nodes are connected by pointers, the update problem is solved in the same way as by the linked sequential file organization. The basic B-tree does not provide very efficient sequential access, however. This led to a B-tree variant called the B+-tree, which provides both efficient random and sequential access. While the B+-tree does come close to providing the best of all worlds, it has two important drawbacks. First, there is significantly more storage overhead for a B+-tree than for a sequential file. Every entry in a B+-tree is stored in a leaf node, and the rest of the tree is simply scaffolding used when the tree is searched. Second, each entry in any type of B-tree must be unique. These characteristics make B-tree file organizations inappropriate for data files containing records, which may be both large and duplicated. But for inverted files, which generally have relatively short entries that are by their very nature unique (because the file position of each record in the data file would be unique), the B+-tree is an ideal choice. cbase therefore uses the linked sequential file organization for record storage and B+-trees for inverted files. The mechanics of B-trees is beyond the scope of this chapter. A good discussion of the B-tree and its major variants may be found in COME79. Chapter 3: cbase Library Functions 3.1 Access Control Functions The cbcreate function is used to create a new cbase. int cbcreate(const char *cbname, size_t recsize, int fldc, const cbfield_t fldv[]); cbname points to a character string which is the name of the cbase. This name is used as the name of the data file which holds the records in the cbase. recsize specifies the record size to be used. fldc is the number of fields in the cbase, and fldv is an array of fldc field definition structures. Each structure in the array contains the definition for one field. field definitions must be in the order in which the fields occur in the record. The field definition structure type cbfield_t is defined in <cbase.h>. typedef struct { /* field definition */ size_t offset; /* field offset */ size_t size; /* size of field */ int type; /* type of field */ int flags; /* flags */ char filename[FILENAME_MAX + 1]; /* data file name */ } cbfield_t; offset is the location of the field within the record and size is the size of the field. type specifies the field data type, legal values for which are shown in Table 3.1. flags values are constructed by bitwise ORing together flags from the following list. CB_FKEY Field is to be a key. CB_FUNIQ Only for use with CB_FKEY. Indicates that the key is constrained to be unique. FILENAME_MAX is an ANSI C definition indicating the maximum length of a filename string. It is defined in <stdio.h>. t_char signed character t_charv signed character array t_uchar unsigned character t_ucharv unsigned character array t_short signed short integer t_shortv signed short integer array t_ushort unsigned short integer t_ushortv unsigned short integer array t_int signed integer t_intv signed integer array t_uint unsigned integer t_uintv unsigned integer array t_long signed long integer t_longv signed long integer array t_ulong unsigned long integer t_ulongv unsigned long integer array t_float floating point t_floatv floating point array t_double double precision t_doublev double precision array t_ldouble long double t_ldoublev long double array t_pointer pointer t_string character string t_binary block of binary data (e.g., graphics) Table 3.1. cbase Data Types Before a cbase that has been created may be used, it must be opened using the function cbase_t *cbopen(const char *cbname, const char *type, int fldc, const cbfield_t fldv[]); cbname, fldc, and fldv serve the same functions as for cbcreate and must be given the same values as when the cbase was created. type points to a character string specifying the type of operations for which the cbase is to be opened (as for the stdio function fopen). Legal values for type are "r" open for reading "r+" open for update (reading and writing) cbopen returns a pointer to the open cbase. The cbsync function causes any buffered data for a cbase to be written out. int cbsync(cbase_t *cbp); The cbase remains open and the buffers retain their contents. After processing is completed on an open cbase, it must be closed using the function int cbclose(cbase_t *cbp); The cbclose function causes any buffered data for the cbase to be written out, unlocks it, closes it, and frees the cbase pointer. 3.2 Lock Functions Before an open cbase can be accessed, it must be locked. The function used to control the lock status of a cbase is int cblock(cbase_t *cbp, int ltype); where cbp is a pointer to an open cbase and ltype is the lock type to be placed on the cbase. The legal values for ltype are CB_RDLCK lock cbase for reading CB_WRLCK lock cbase for reading and writing CB_RDLKW lock cbase for reading (wait) CB_WRLKW lock cbase for reading and writing (wait) CB_UNLCK unlock cbase If ltype is CB_RDLCK and the cbase is currently write locked by another process, or if ltype is CB_WRLCK and the cbase is currently read or write locked by another process, cblock will fail and set errno to EAGAIN. For the lock types which wait, cblock will not return until the lock is available. The cbgetlck function reports the lock status held by the calling process on a cbase. int cbgetlck(cbase_t *cbp); It returns one of the legal values for the ltype argument in the cblock function. 3.3 Record Cursor Position Functions Each open cbase has a record cursor. At any given time the record cursor is either positioned on a record in that cbase or on a special position called null. The record on which the cursor is located is referred to as the current record. The operations performed by most cbase functions are either on or relative to the current record, so the initial step in a transaction on a cbase is usually to position the record cursor on the desired record. When accessing the records in a cbase in the order in which they are stored, the following functions are used to move the record cursor: int cbrecfirst(cbase_t *cbp); int cbreclast(cbase_t *cbp); int cbrecnext(cbase_t *cbp); int cbrecprev(cbase_t *cbp); The cbrecfirst function positions the record cursor to the first record, and cbreclast to the last record. Before calling either of these functions cbreccnt should be used to test if the cbase is empty. unsigned long cbreccnt(cbase_t *cbp); If the cbase is empty, there is no first or last record and so these functions would return an error. The cbrecnext function advances the record cursor to the succeeding record, and cbrecprev retreats it to the preceding record. In the record ordering, null is located before the first record and after the last. There are also functions for saving the current position of the record cursor and resetting it to that position: int cbgetrcur(cbase_t *cbp, cbrpos_t *cbrpos_p); int cbsetrcur(cbase_t *cbp, const cbrpos_t *cbrpos_p); The cbgetrcur function gets the current position of the record cursor and saves it in the variable pointed to by cbrpos_p. cbrpos_t is the cbase record position type, defined in <cbase.h>. cbsetrcur can then be used later to set the record cursor back to that position. The record cursor can be positioned on null by passing cbsetrcur the NULL pointer rather than a pointer to a variable. Other than this special case, cbsetrcur should only be called with record cursor positions previously saved with cbgetrcur. The cbrcursor macro is used to test if the record cursor for a cbase is positioned on a record or on null. void *cbrcursor(cbase_t *cbp); If the record cursor of the cbase pointed to by cbp is positioned on null, cbrcursor returns the NULL pointer. If it is on a record, cbrcursor returns a value not equal to the NULL pointer. This function is useful for loops which need to test when the last (or first) record has been processed. 3.4 Key Cursor Position Functions Each open cbase also has a key cursor for each key (index) defined for that cbase. As for the record cursor, a key cursor is either positioned on a record in that cbase or on null. To access a cbase in the sort order of a certain key, the appropriate key cursor is used instead of the record cursor. Each key cursor moves independently of the others, but whenever a key cursor position is set, the record cursor is moved to the same record. The key cursors are not affected by moving the record cursor. The following functions are used to move a key cursor. int cbkeyfirst(cbase_t *cbp, int field); int cbkeylast(cbase_t *cbp, int field); int cbkeynext(cbase_t *cbp, int field); int cbkeyprev(cbase_t *cbp, int field); These perform as do the corresponding functions for the record cursor. Note the key cursor functions may be used only with fields defined to be keys. The following function is used to search for a key of a certain value. int cbkeysrch(cbase_t *cbp, int field, const void *buf); field is the key in which to search for the data item pointed to by buf. If a key of that value is not found, the key (and record) cursor is positioned to the record (possibly null) which would follow a record with that key value. Since the key cursors do not automatically follow the record cursor, the situation sometimes occurs where the record cursor is positioned to the desired record, but the cursor for the key to be used next is not. The cbkeyalign function is used to align a specified key cursor with the record cursor. int cbkeyalign(cbase_t *cbp, int field); The reason the key cursors are not updated every time the record cursor moves is not because it would be in any way difficult to do so, but because this would increase the overhead enormously. And since only one key cursor is normally used at a time, this extra overhead would almost never provide any benefit in return. As for the record cursor, each key cursor position can be tested to be positioned on a record or on null. void *cbkcursor(cbase_t *cbp, int field); If the key cursor specified by field of the cbase pointed to by cbp is positioned on null, cbkcursor returns the NULL pointer. If it is on a record, cbkcursor returns a value not equal to the NULL pointer. 3.5 Input/Output Functions To read a record from a cbase, the record cursor for that cbase is first positioned to the desired record using either the record cursor position functions or the key cursor position functions. One of the following functions is then called to read from the current record. int cbgetr(cbase_t *cbp, void *buf); int cbgetrf(cbase_t *cbp, int field, void *buf); cbp is a pointer to an open cbase and buf points to the storage area to receive the data read from the cbase. The cbgetr function reads the entire current record, while cbgetrf reads the specified field from the current record. The functions for inserting a new record into a cbase are int cbinscur(cbase_t *cbp, const void *buf); int cbinsert(cbase_t *cbp, const void *buf); where buf points to the record to be inserted. When a new record is inserted into a cbase, the position it holds relative to each key cursor is defined by the sort order for that key field. There is no predefined sort order associated with the record cursor, however, and it is up to the user whether or not to store the records for each cbase in a sorted or unsorted order. To store records in a sorted order, the record cursor is first positioned the the record after which to insert the new record. cbinscur is then called to insert the record pointed to by buf after the current record. If no sort order is desired, the step to position the record cursor is skipped. cbinsert always inserts a record after the last record in the cbase. The cbdelcur function is used to delete a record. int cbdelcur(cbase_t *cbp); The record cursor must first be positioned on the record to delete, then cbdelcur called to delete the current record. The cbputr function writes over an existing record. int cbputr(cbase_t *cbp, const void *buf); buf points to the new record contents. Writing over an existing record is equivalent to deleting the record and inserting a new one in the same position in the file. 3.6 Data Import/Export Functions cbase data can be exported to a text file using the cbexport function. int cbexport(cbase_t *cbp, const char *filename); Every record in cbase cbp is converted to a text format and written to the file filename. The export file format is defined as follows. . Each record is terminated by a newline ('\n'). . The fields in a record are delimited by vertical bars ('|'). . Each field contains only printable characters. . If a field contains the field delimiter character, that character is replaced with \F. . The individual elements of array data types are exported as individual fields. Data may be imported from a text file using the cbimport function. int cbimport(cbase_t *cbp, const char *filename); cbimport reads each record from the text file filename and inserts it into the cbase cbp. Data import/export is primarily used to move data between different database formats. This sometimes requires some slight rearranging of the text before importing. This can be most easily done using awk, a language designed specifically for manipulating records stored in text files. awk is normally included with UNIX, and is also available for MS-DOS. Chapter 4: An Example Program Included with cbase is rolodeck, an example program illustrating the use of cbase. Rolodeck is a program for storing business cards. To allow it to be compiled without requiring any additional libraries for displays, and because the purpose of the program is purely instructional, the program has been given only a simple user interface. 4.1 Data Definition The first step in writing a program using cbase is to define the data to be stored. This should be done in a separate header file for each record type to be used. Figure 4.1 shows rolodeck.h, the data definition header for the business card record type used by the rolodeck program. Starting at the top, the first thing to note is the definition of a macro to prevent multiple includes. This is a general practice applicable to any header file, cbase or otherwise, whose purpose is to allow a header to be specified for inclusion multiple times in a file while ensuring that the definitions within the file are processed only once. For a header containing nothing but macro definitions, the only effect of preventing multiple includes will be reduced compile times. But for a header containing, for instance, type definitions, multiple includes would produce compilation errors. Notice how the include macro is related to the header file name; the file name is converted to upper case and the period replaced by an underscore ('.' is not a valid character for C identifiers). These macros should be named consistently to minimize the possibility of accidentally defining the same macro elsewhere for some other purpose. At the beginning of every data definition header file, <cbase.h> should be included. Following this is defined the name of the cbase. This character string is the name used for the record file. This macro is to be used for the cbname argument when calling cbcreate and cbopen.#ifndef ROLODECK_H /* prevent multiple includes */ #define ROLODECK_H #include <cbase.h> /* cbase name */ #define RDNAME ("rolodeck.dat") /* record definition */ typedef struct { char rd_contact[41]; /* contact name */ char rd_title[41]; /* contact title */ char rd_company[41]; /* company name */ char rd_addr[2*40+1]; /* company address */ char rd_city[26]; /* city */ char rd_state[3]; /* state */ char rd_zip[11]; /* zip code */ char rd_phone[13]; /* phone number */ char rd_ext[5]; /* phone extension */ char rd_fax[13]; /* fax number */ char rd_notes[4*40+1]; /* notes */ } rolodeck_t; /* field numbers */ #define RD_CONTACT (0) #define RD_TITLE (1) #define RD_COMPANY (2) #define RD_ADDR (3) #define RD_CITY (4) #define RD_STATE (5) #define RD_ZIP (6) #define RD_PHONE (7) #define RD_EXT (8) #define RD_FAX (9) #define RD_NOTES (10) #define RDFLDC (11) /* field definition list */ static cbfield_t rd_fldv[] = { {offsetof(rolodeck_t, rd_contact[0]), sizeofm(rolodeck_t, rd_contact), t_string, CB_FKEY | CB_FUNIQ, "rd_cont.key"}, {offsetof(rolodeck_t, rd_title[0]), sizeofm(rolodeck_t, rd_title), t_string, 0, ""}, {offsetof(rolodeck_t, rd_company[0]), sizeofm(rolodeck_t, rd_company), t_string, CB_FKEY, "rd_comp.key"}, {offsetof(rolodeck_t, rd_addr[0]), sizeofm(rolodeck_t, rd_addr), t_string, 0, ""}, {offsetof(rolodeck_t, rd_city[0]), sizeofm(rolodeck_t, rd_city), t_string, 0, ""}, {offsetof(rolodeck_t, rd_state[0]), sizeofm(rolodeck_t, rd_state), t_string, 0, ""}, {offsetof(rolodeck_t, rd_zip[0]), sizeofm(rolodeck_t, rd_zip), t_string, 0, ""}, {offsetof(rolodeck_t, rd_phone[0]), sizeofm(rolodeck_t, rd_phone), t_string, 0, ""}, {offsetof(rolodeck_t, rd_ext[0]), sizeofm(rolodeck_t, rd_ext), t_string, 0, ""}, {offsetof(rolodeck_t, rd_fax[0]), sizeofm(rolodeck_t, rd_fax), t_string, 0, ""}, {offsetof(rolodeck_t, rd_notes[0]), sizeofm(rolodeck_t, rd_notes), t_string, 0, ""} }; #endif /* #ifdef ROLODECK_H */ Figure 4.1. rolodeck.h Next is the type definition for the record being defined. Each field in the record (member in the structure) begins with a character sequence derived from the cbase name, usually two characters, followed by an underscore. This character sequence should be unique across all the record types used by a given program. This type is for use when declaring record variables. Variables of this type should be used for functions such as cbgetr which require a pointer to a record. Following the record type definition is a list of macros used to identify a field when calling cbase functions. The names of these macros are constructed by taking the field names from the record type definition and converting them to uppercase. This is why the field names must be made unique. Each field name macro must be an integer indicating the number of the field in the record type definition, starting at zero. A macro for the total number of fields is also defined. This macro is to be used for the fldc argument when calling cbcreate and cbopen. The actual data definition is done in the field definition array. The members of this array are of type cbfield_t, described in section 3.1. This array specifies, for each field in the record, its offset in the record, its size, its type, some flags, and a filename to be used if the field is to be a key. The offsetof macro should be used to specify the offset. A macro sizeofm (size of member) is defined in <cbase.h> to give the size of a member of a structure. size_t sizeofm(struct_t, member); The arguments to sizeofm are the same as for offsetof. The size cannot be calculated from the difference between field offsets because padding characters may be added between structure members to maintain proper alignment (see KERN88). The field definition array is to be used as the fldv argument to cbcreate and cbopen. It should be noted that every record type should normally have at least one unique key field. This field is used to uniquely identify records. The physical record position is often used for this purpose, but, as will be seen later in this chapter, a unique key is required for multiuser applications. In addition to the data definition header, all programs using cbase normally require that the following header files also be included. #include <blkio.h> #include <cbase.h> #include <errno.h> #include <stdio.h> #include <stdlib.h> <blkio.h> is the header for the blkio library. It is included to provide the definition of the NULL pointer and the declaration of the function bexit. bexit is for use in place of exit in any program which uses the blkio library for file access. It writes out any buffered data for any open block file then calls exit (see bexit in the blkio reference manual). <cbase.h> is the cbase header file containing all the constant and type definitions and function declarations for using the cbase library. <errno.h> contains the definition of the errno macro as well as definitions of error code values; all cbase functions use errno for error reporting. <stdio.h> is the header for the standard buffered i/o library, and is included for the declarations of functions such as printf and scanf. <stdlib.h> is included for the macros EXIT_SUCCESS and EXIT_FAILURE, which are implementation defined success and failure codes to be used as the argument to exit. <string.h> is included for declarations of the string and memory functions (e.g., strcpy and memset). 4.2 Opening a cbase The first step in accessing an existing cbase is to open it. Figure 4.2 shows the code from rolodeck.c to open the rolodeck cbase. rolodeck is opened with a type argument of "r+" to allow both reading and writing. The other arguments are the cbase name, RDNAME, the field count, RDFLDC, and the field definition list, rdfldv, all defined in the data definition header file, rolodeck.h. On error cbopen returns the NULL pointer. For this program there is only one cbase, but most applications will require several. If the named cbase does not exist, cbopen will fail and set errno to ENOENT. In this example, if the rolodeck cbase does not exist, it is created and the program continues as normal. Note that the cbase must still be opened after it is created. In some cases a separate program is written to create all the cbases required by an application; in this case the main program usually interprets ENOENT as an error and exits. /* open rolodeck cbase */ cbp = cbopen(RDNAME, "r+", RDFLDC, rdfldv); if (cbp == NULL) { if (errno != ENOENT) { fprintf(stderr, "*** Error %d opening rolodeck.\n", errno); bexit(EXIT_FAILURE); } /* create rolodeck cbase */ printf("Rolodeck does not exist. Creating...\n"); if (cbcreate(RDNAME, sizeof(rolodeck_t), RDFLDC, rdfldv) == -1) { fprintf(stderr, "*** Error %d creating rolodeck.\n", errno); bexit(EXIT_FAILURE); } cbp = cbopen(RDNAME, "r+", RDFLDC, rdfldv); if (cbp == NULL) { fprintf(stderr, "*** Error %d opening rolodeck.\n", errno); bexit(EXIT_FAILURE); } } Figure 4.2. Opening a cbase 4.3 Locking a cbase Befor lock; a write locked cbase can be neither read nor write locked by any other process. Write locks are exclusive because, if one process tried to read data while it was partially modified by another, the data would probably be in an inconsistent state. Processes which will only read data, however, can safely proceed to do so concurrently. While a cbase is write locked, other processes needing to access that cbase must wait until it is unlocked so that they can in turn lock it themselves to complete their processing. While a cbase is read locked, only processes needing to write data need to wait. Using a write lock when a read lock would suffice will therefore delay processing unnecessarily. For the same reason, locks of both types should be held for the shortest time possible; a common mistake in writing multiuser applications is to pause for use input while holding a lock on a cbase, which would cause a lock to be held indefinitely. If an attempt is made to obtain a lock on a cbase, but is blocked by a lock held by another process, cblock will fail and set errno to EAGAIN. The call to cblock is therefore usually made in a loop with a predefined maximum number of tries. It is usually most convenient to place this in a function configured suitably for the application being developed. Figure 4.3 shows this function from rolodeck.c. It may also be suitable in some instances to sleep for a short (possibly random) time between attempts to lock. #define MAXLTRIES (20) /* max lock tries */ /* rdlock: rolodeck lock */ int rdlock(cbp, ltype) cbase_t *cbp; int ltype; { int i = 0; for (i = 0; i < MAXLTRIES; i++) { if (cblock(cbp, ltype) == -1) { if (errno == EAGAIN) { continue; } return -1; } else { errno = 0; return 0; } } errno = EAGAIN; return -1; } Figure 4.3. Rolodeck Locking Function There are also two lock types (CB_RDLKW and CB_WRLCKW) which, if the requested lock is blocked, will wait until it can be obtained. These are not usually used, however, because if the lock is not freed in a reasonable time, the process waiting for the lock will be hung. For an applications where there will be only a single process, the necessary locks can be set immediately after opening the cbases to be accessed. One critical concern when locking multiple cbases is the possibility of deadlock. Deadlock is an extensive subject, and there are a number of ways of ways of dealing with it. Most texts on operating systems (see CALI82) and database theory cover the subject in detail. 4.4 Accessing a cbase The gross structure of the rolodeck program is a case statement within a loop. At the start of the loop, a user request is read, the value of which is used to select the action performed in the case statement. Each individual action performed in the case statement illustrates the use of cbase to perform a basic operation, e.g., inserting a record, deleting a record, finding the next record, exporting data to a text file, etc. The operation of finding the next record serves as a good general example. The code for this from rolodeck.c is shown in figure 4.4. One of the most important points to notice in the example code is that a unique key (the contact name, here) is used to relocate the current record when a cbase is locked. This is because, when a cbase is unlocked, it may be modified by another process. A record at a given location may be deleted, and the empty slot possibly reused for a new record. Because of this, cbsetrpos cannot be used with a record position obtained during a previously held lock. As mentioned in the section 4.3, applications which do not involve multiple processes accessing the same data can simply lock a cbase and leave it locked rather than locking the cbase immediately prior to each transaction. Another central point is the use of multiple keys. In the rolodeck program, both the contact and the company names are keys. The variable sf is used to identify the current sort field (which can be changed interactively). Before using the cbkeynext function, the appropriate key cursor must first be positioned. cbkeysrch positions only the key on which the search is made, which here is the unique key. If the next card is to be found using the sort order of a different key, cbkeyalign must first be used to align that key cursor with the current record. The reason that all the key cursors do not automatically follow the record cursor is that the overhead would be considerable, and usually only one key is used at a time. case RD_REQ_NEXT_CARD: /* next card */ rdlock(cbp, CB_RDLCK); if (cbreccnt(cbp) == 0) { printf("The rolodeck is empty.\n\n"); rdlock(cbp, CB_UNLCK); continue; } /* use unique key field to set record cursor */ found = cbkeysrch(cbp, RD_CONTACT, rd.rd_contact); if (sf != RD_CONTACT) { /* align cursor of sort key */ if (cbkeyalign(cbp, sf) == -1) { fprintf(stderr, "*** Error %d aligning key.\n", errno); bexit(EXIT_FAILURE); } } if (found == 1) { /* advance key (and rec) cursor 1 position */ cbkeynext(cbp, sf); } if (cbrcursor(cbp) == NULL) { printf("End of deck.\n\n"); rdlock(cbp, CB_UNLCK); continue; } cbgetr(cbp, &rd); rdlock(cbp, CB_UNLCK); break; /* case RD_REQ_NEXT_CARD: */ Figure 4.4. Next Record Code 4.5 Closing a cbase When a program is through accessing a cbase, the cbase should be closed. Figure 4.5 shows this code from rolodeck.c. /* close cbase */ if (cbclose(cbp) == -1) { fprintf(stderr, "*** Error %d closing rolodeck.\n", errno); bexit(EXIT_FAILURE); } Figure 4.5. Closing a cbase A cbase is automatically unlocked when it is closed. A cbase is normally opened and closed only once in a program. 4.6 Manipulating Exported Data Exported data often requires some processing before it is used. For instance, consider modifying an application to add a new field to an existing cbase. If there is a considerable amount of data already stored, it would be desirable to use the import/export functions rather than manually entering the data again. Another common example is moving data between cbase and another database package, which may use different field and record separators. An ideal tool for processing records in text files is awk. Below is an example awk program for inserting a new field at position two in all the records in a text file (note that awk field numbering starts at one, not zero). BEGIN { FS = "|"; # set input and output field and record separators OFS = FS; RS = "\n"; ORS = RS; NEWFIELD = 2; # field to insert } # insfld: insert field n of current record function insfld(n) { if ((n < 1) || (n > NF + 1)) { return -1; } for (i = NF; i >= n; i--) { $(i + 1) = $i; } $n = ""; return 0; } { if (insfld(NEWFIELD) == -1) { printf "Error inserting new second field.\n"; exit 1; } print $0; } END{ exit 0; } Figure 4.6. awk Program to Insert Field The predefined variables FS and OFS are used to set the input and output field separators, respectively. The predefined variables RS and ORS are used to set the input and output record separators, respectively. Setting these variables appropriately is all that is necessary to convert between text file formats using different field and record separators. The following short program would convert a text file exported from a database using the tab character as a field separator for import by cbase. BEGIN { FS = "\t"; # set input and output field and record separators OFS = "|"; RS = "\n"; ORS = RS; } { print $0 } END{ exit 0; } Figure 4.7. awk Program to Change FS Appendix A: Installation Instructions The cbase library is distributed in MS-DOS format on either two 360K 5¼" diskettes or one 720K 3½" diskette. On the distribution diskettes is a directory for each component library containing the files for that library. There is also a directory for the manx utility used to extract the reference manual from the source code, and a directory for the example program. If cbase is being installed on an MS-DOS system, the following two commands will copy the contents of the distribution diskettes onto the hard drive. > mkdir cbase > xcopy /s /v a: cbase An operating system besides MS-DOS will require either a facility to read MS-DOS diskettes, or access to an MS-DOS machine from which files can be transferred by a serial link or network to the target machine. If the transfer process does not automatically convert the text files to the format of the target system, an additional conversion utility may be necessary. The remaining installation instructions for each library, which should be performed in the directory containing that library, are given below. The batch files provided for MS-DOS installation (install.bat) are written for Borland Turbo C; because there is so little uniformity among C compilers for this operating system, modifications will be required for other compilers. Instructions for doing this are given at the beginning of the blkio installation batch file. The necessary modifications are straightforward and the same for the other libraries. If a make utility is available, the UNIX makefiles may instead be adapted. Before proceeding to install the libraries, the manx utility should be compiled and placed in a directory in the path. manx is used to generate the reference manual for each library. After installing the libraries, the final step is to compile the example program. A1. The blkio Library UNIX 1. Install the boolean header file. $ su # cp bool.h /usr/include # ^d 2. Build the blkio library. Examine the makerec file afterward for warnings. $ make blkio > makerec 3. Install the blkio library. This will copy file blkio.h to /usr/include and the blkio library archive to /usr/lib. $ su # make install # ^d MS-DOS 1. If necessary, modify install.bat for the C compiler being used. 2. Install the blkio library. > install A2. The btree Library UNIX 1. Install the blkio library. 2. Build the btree library. Examine the makerec file afterward for warnings or errors. $ make btree > makerec 3. Install the btree library. This will copy btree.h to /usr/include and the btree library archive to /usr/lib/libbtree.a. $ su # make install # ^d MS-DOS 1. Install the blkio library. 2. If necessary, modify install.bat for the C compiler being used. 3. Install the btree library. > install A3. The cbase library UNIX 1. Install the btree and lseq libraries. 2. Build the cbase library. Examine the makerec file afterward for warnings or errors. $ make cbase > makerec 3. Install the cbase library. This will copy cbase.h to /usr/include and the cbase library archive to /usr/lib/libcbase.a. $ su # make install # ^d MS-DOS 1. Install the btree and lseq libraries. 2. If necessary, modify install.bat for the C compiler being used. 3. Install the cbase library. > install A4. The lseq Library UNIX 1. Install the blkio library. 2. Build the lseq library. Examine the makerec file afterward for warnings or errors. $ make lseq > makerec 3. Install the lseq library. This will copy lseq.h to /usr/include and the lseq library archive to /usr/lib/liblseq.a. $ su # make install # ^d MS-DOS 1. Install the blkio library. 2. If necessary, modify install.bat for the C compiler being used. 3. Install the lseq library. > install Appendix B: Defining New Data Types cbase allows custom data types to be defined by the user. Custom data types are completely integrated; they are installed in exactly the same manner as the predefined data types. A data type definition consists of a macro used as the type name (e.g., t_string), and three functions: a comparison function, an export function, and an import function. The comparison function is the most important; it determines the sort order for data of that type. The export function is used to export data of the associated type to a text file, and the import function to import data. These functions are supplied by the user when a new data type is installed, but afterwards are used only internally by cbase. Below are given step-by-step instructions for defining a new cbase data type. B1. The Type Name For each cbase data type there is a corresponding type name by which the user refers to that data type. The type names are defined in cbase.h. #define t_char (0) /* signed character */ ... #define t_string (24) /* character string */ #define t_binary (25) /* block of binary data */ Type names must be defined as integers, starting at zero and increasing in steps of one. The type name for a new data type would be added at the end of this list, and be defined as an integer one greater than the last data type in the list. To avoid possible conflict with future predefined types, user defined type names should not start with t_. #define ut_new (27) /* new data type */ B2. The Comparison Function A data type is characterized primarily by its sort order. Each data type is given a comparison function which defines this sort order. Comparison functions are of the form int cmp(const void *p1, const void *p2, size_t n); p1 and p2 are pointers to two data items to be compared, and n is the size of the data items. The value returned must be less than, equal to, or greater than zero if the data item pointed to by p1 is less than, equal to, or greater than, respectively, that pointed to by p2. The C standard library function memcmp would be a valid cbase comparison function. All cbase comparison functions are located in the file cbcmp.c. For a new data type, a comparison function would be added in this file. static int newcmp(const void *p1, const void *p2, size_t n) { ... } Comparison functions are made static because they are accessed only through an array of function pointers, cbcmpv, also defined in cbcmp.c. This array contains the comparison function for each cbase data type. The integer value of the type name is used by cbase as an index into this array, and so the comparison functions must be in the same order as the type names. A pointer to the comparison function for a new data type would be added at the end of this array. /* cbase comparison function table */ cbcmp_t cbcmpv[] = { charcmp, ... pointercmp, binarycmp, newcmp }; B3. The Export and Import Functions Each data type has an associated export function. This export function takes a data item of the associated type and writes it to a file in a text format. Export functions are of the form int exp(FILE *fp, const void *p, size_t n); p is a pointer to the data item of size n to be exported. The export function converts the data item to a text format, then writes it to the current position in file fp. Upon successful completion, a value of zero is returned. Otherwise, a value of -1 is returned. See cbexport in the cbase Programmer's Reference Manual for a description of the format of exported data. All cbase export functions are located in the file cbexp.c. For a new data type, an export function would be added in this file. static int newexp(FILE *fp, const void *p, size_t n) { ... } Just like comparison functions, export functions are accessed by cbase through an array. This array, cbexpv, is defined in cbexp.c. A pointer to the export function for the new data type would be added at the end of this array. The import function reads a data item from a text file. Import functions are of the form int imp(FILE *fp, const void *p, size_t n); The parameters and return value are the same as for the export function. Import functions are located in cbimp.c. Pointers to the import functions are stored in the array cbimpv. B4. The Type Count The macro CBTYPECNT is defined in cbase_.h as the number of data types defined. It must be incremented by one for each new data type added. After completing these steps, rebuild the cbase library (see Appendix A). The underlying libraries do not need to be rebuilt. Appendix C: Porting to a New Operating System The foundation of cbase is the blkio library. It is responsible for performing all file operations. Since the stdio library cannot be used to perform database file operations, it was necessary to use some operating system specific code in writing blkio. blkio must therefore be modified for use under a currently unsupported operating system. The remaining libraries comprising cbase utilize blkio in the same way as stdio is used to allow complete portability, and so to port cbase to a new operating system requires only that blkio be ported to that system. The steps necessary to do this are outlined below. C1. The HOST Macro In the blkio library's private header file blkio_.h, a macro is defined for each supported operating system. When installing the blkio library, the host operating system is selected by defining the HOST macro using one of these system macros. When porting to a new operating system, a new system macro must be defined in blkio_.h. #define UNIX (1) /* UNIX */ #define MSDOS (2) /* MS-DOS */ #define NEWSYS (3) /* new os */ #define HOST NEWSYS In some instances it will be necessary to take into account variations among the C compilers available for a system. For MS-DOS, the macro MSDOSC, also defined in blkio_.h, is used to select a compiler in the same way as the HOST macro is used to select the operating system. To port to a new C compiler under MS-DOS, a new compiler macro must be defined. #define TURBOC (1) /* Turbo C */ #define MSC (2) /* Microsoft C */ #define NEWDOSC (3) /* new MS-DOS C compiler */ #define MSDOSC NEWDOSC To port to multiple incompatible compilers under a new operating system, a new compiler selection macro analogous to MSDOSC would need to be defined. C2. The File Descriptor Type In most operating systems, an open file is accessed not by name, but through some type of handle, usually called a file descriptor. File descriptors are normally of type int, but this is not guaranteed. A union is therefore defined (in cbase.h) for the file descriptor. typedef union { /* file descriptor type */ char cfd; /* character fildes */ short sfd; /* short int fildes */ int ifd; /* int fildes */ long lfd; /* long int fildes */ } fd_t This union type is then used for the file descriptor of the block file control structure. typedef struct { /* block file control structure */ fd_t fd; /* file descriptor */ ... } BLKFILE; When modifying the code in subsequent sections, the appropriate member of the union fd_t would be used to access a file descriptor. If the file descriptor type for the new system is short, for instance, the file descriptor for BLKFILE *bp would be accessed as bp->fd.sfd. It will be necessary to add a member to the fd_t union if one of the required type does not already exist. C3. File Access System Calls The bulk of the operating system specific code is related to the system calls used to access the file system. These system calls perform basic operations such as opening, reading, and writing a file, and are conceptually the same on most systems. In fact, they can usually be directly translated to a corresponding call on the new system. All system calls accessing the file system are isolated in the file buops.c (blkio unbuffered operations). The HOST macro is used to separate sections of code used for different operating systems. Similarly, compiler selection macros such as MSDOSC are used to separate sections of code for different compilers under the same operating system. #if HOST == UNIX . . . #elif HOST == MSDOS #if MSDOSC == TURBOC . . . #elif MSDOSC == MSC . . . #endif #endif When porting to a new operating system (or compiler), each of these conditional compilations using HOST (or the relevant compiler selection macro) must be located and an additional #elif added for the new system (or compiler). C4. File Locking System Calls System calls are also used to perform file locking. All system calls for file locking are located in the file lockb.c. This file must be modified using the same instructions given for buops.c. If file locking will not be used on the new system, lockb.c need not be altered. References CALI82 Calingaert, P. Operating Systems Elements. Englewood Cliffs, NJ: Prentice Hall, 1982. COME79 Comer, D. The Ubiquitous B-tree. ACM Computing Surveys, June 1979. FROS89 Frost, L. A Buffered I/O Library for Structured Files. The C Users Journal, October 1989. HORO76 Horowitz, E. and S. Sahni. Fundamentals of Data Structures. Rockville, MD: Computer Science Press, 1976. KERN88 Kernighan, B. and D. Ritchie. The C Programming Language. Englewood Cliffs, NJ: Prentice Hall, 1988. KNUT68 Knuth D. The Art of Computer Programming Volume 3 / Sorting and Searching. Reading, MA: Addison-Wesley, 1968. ULLM82 Ullman, J. Principles of Database Systems. Rockville, MD: Computer Science Press, 1982.