home *** CD-ROM | disk | FTP | other *** search
Text File | 1990-06-21 | 56.1 KB | 1,538 lines |
- cbaseTM
-
- The C Database Library
-
-
-
-
-
- Citadel
- Brookville, IndianaCopyright 1989 by Citadel. All rights reserved.
-
- Citadel
- 241 East Eleventh Street
- Brookville, IN 47012
- 317-647-4720
- BBS 317-647-2403
-
- Version 1.0.1
-
- This manual is protected by United States copyright law. No part of it
- may be reproduced without the express written permission of Citadel.
-
- Technical Support
- The Citadel BBS is available 24 hours a day. Voice support is
- available between 10 a.m. and 4 p.m. EST. When calling for technical
- support, please have ready the following information:
-
- - product name and version number
- - operating system and version number
- - C compiler and version number
- - computer brand and model
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- UNIX is a trademark of AT&T. MS-DOS is a trademark of Microsoft
- Corporation. Turbo C is a trademark of Borland International, Inc.
-
-
-
-
- Contents
-
-
- Chapter 1. Introduction 1
-
- Chapter 2. Database Basics 3
- 2.1 Sequential File Structures
- 2.2 Inverted Files
- 2.3 B-trees
-
- Chapter 3. cbase Library Functions 7
- 3.1 Access Control Functions
- 3.2 Lock Functions
- 3.3 Record Cursor Position Functions
- 3.4 Key Cursor Position Functions
- 3.5 Input/Output Functions
- 3.6 Data Import/Export Functions
-
- Chapter 4. An Example Program 15
- 4.1 Data Definition
- 4.2 Opening a cbase
- 4.3 Locking a cbase
- 4.4 Accessing a cbase
- 4.5 Closing a cbase
- 4.6 Manipulating Exported Data
-
- Appendix A. Installation Instructions 27
- A1 The blkio Library
- A2 The btree Library
- A3 The lseq Library
- A4 The cbase library
-
- Appendix B. Defining New Data Types 33
- B1 The Type Name
- B2 The Comparison Function
- B3 The Export and Import Functions
- B4 The Type Count
-
- Appendix C. Porting to a New Operating System 37
- C1 The HOST macro
- C2 The File Descriptor Type
- C3 File Access System Calls
- C4 File Locking System Calls
-
- References 41
-
-
-
-
- Chapter 1: Introduction
-
-
- cbase is a C database file management library. Records may be
- accessed both randomly and sequentially through indexes stored in
- B+-trees. Records may also be accessed sequentially in the order in
- which they are stored. Multiuser access is supported under any
- operating system with file locking capabilities.
-
- cbase is designed for portability. It is written in strict adherence
- to the ANSI C standard while retaining compatibility with with K&R C
- compilers. All system dependent code is isolated in order to make
- porting easy.
-
- Many of the operations performed by cbase internally represent
- independently useful tools, and the software has been designed with this
- in mind. cbase actually comprises four individual libraries, each
- complete and independently accessible. Figure 1.1 shows these libraries
- and their relationships.
-
- ┌─────────────────────────────────┐
- │ cbase │
- └───────┬─────────────────┬───────┘
- ┌───────┴───────┐ ┌───────┴───────┐
- │ lseq │ │ btree │
- └───────┬───────┘ └───────┬───────┘
- ┌───────┴─────────────────┴───────┐
- │ blkio │
- └─────────────────────────────────┘
-
- Figure 1.1. cbase and Underlying Libraries
-
- At the foundation of cbase is the blkio (block buffered I/O) library.
- blkio is a buffered I/O library similar to stdio, but based on a file
- model more appropriate for structured files such as used in database
- software. While stdio models a file as an unstructured stream of
- characters, blkio models a file as a collection of blocks made up of
- fields (see FROS89 for a complete description of blkio).
-
- The lseq (linked sequential file) library provides all the facilities
- necessary for the creation and manipulation of doubly linked sequential
- files. The btree (B-tree) library provides the same for B+-tree files.
- The cbase library uses lseq and btree to perform all structured file
- management operations. The lseq library is used for record storage and
- the btree library for inverted file key storage.
-
- When using a particular library, all operations are performed with
- functions provided by that library. No references need be made to
- underlying libraries. When using the cbase library, it is therefore not
- necessary to know the functions included in the other libraries.
-
-
-
-
- Chapter 2: Database Basics
-
-
- This chapter describes some of the basic database concepts relevant
- to cbase. It is intended to be a brief and readily accessible
- introduction, and can be easily skipped by one already familiar with
- database fundamentals. References are given where in-depth discussions
- may be found.
-
-
- 2.1 Sequential File Structures
-
- One of the simplest file organizations is the physical sequential
- file. In this organization, records are simply written one after the other
- and sequential access is done in the order in which the records are
- physically stored.
-
- The physical sequential file works very well in cases where the
- data is static, but it is extremely inefficient when the data is dynamic.
- Consider a file containing 100,000 records stored in a sorted order.
- The insertion of a single record at the beginning of this file would
- result in 100,000 records being moved to make room at the beginning
- of the file for the new record.
-
- This identical problem also occurs when large ordered lists are
- stored in memory, and the same solution, the linked list, can be used
- for files (see pp. 106-12 of HORO76 for an explanation of linked lists
- in memory). In a linked list, the record order is determined by pointers
- stored with each record, not by the physical record locations; a record
- logically located at the beginning of a file can be physically located
- anywhere within the file. In a singly linked list each record has only a
- pointer to the next record and so the file can be traversed in just one
- direction. In a doubly linked list, each record has both a next and a
- previous pointer, allowing bidirectional access.
-
-
- 2.2 Inverted Files
-
- Assume a data file containing member records for an organization,
- and that the record format for this file is
-
- typedef struct {
- long number;
- char name[24];
- char address[81];
- char city[24];
- char state[3];
- char zip[11];
- } member_t;
-
- Assume further that the data file has a physical sequential file structure,
- and that the records are sorted by the number field, therefore a binary
- search may be performed to quickly find a record with a given number
- field. The field that the records are sorted on is called the primary
- index. The problem arises when a query is made on another field
- besides the primary index, in which case a search of the entire data file
- is required.
-
- The problem of being able to perform queries efficiently on more
- than one field can be resolved through the use of inverted files. For
- each secondary index exists an inverted file containing an entry for each
- record in the data file. The inverted file for the name field, for
- example, would have entries containing the name field and record
- position, and these would be sorted by the name field. To find a
- record with a given value in the name field, the inverted file for the
- name field would be searched. The record position paired with the
- specified name value would then be used to locate the record in the
- data file. See pp. 531-533 of HORO76 and pp. 75-78 of ULLM82 for
- more information on inverted files.
-
-
- 2.3 B-trees
-
- As discussed above, the physical sequential file provides
- unacceptable performance for dynamic data. For static data, however, it
- provides efficient sequential and passable random access (using a binary
- search). The linked sequential file solves the problem of performance
- for dynamic data while preserving efficient sequential access, but the
- ability to randomly access records is lost. The need to store dynamic
- data which may be randomly accessed led to the development of the
- B-tree.
-
- As the name implies, the B-tree stores data in a tree structure,
- which allows random access. And since the tree nodes are connected
- by pointers, the update problem is solved in the same way as by the
- linked sequential file organization. The basic B-tree does not provide
- very efficient sequential access, however. This led to a B-tree variant
- called the B+-tree. The B+-tree incorporates a linked list into its
- structure to achieve efficient sequential as well as random access.
-
- While the B+-tree does come close to providing the best of all
- worlds, it has two important drawbacks. First, there is significantly
- more storage overhead for a B+-tree than for a sequential file. Every
- entry in a B+-tree is stored in a leaf node, and the rest of the tree is
- simply scaffolding used when the tree is searched. Second, each entry
- in any type of B-tree must be unique. These characteristics make
- B-tree file organizations inappropriate for data files containing records,
- which may be both large and duplicated. But for inverted files,
- generally having relatively short entries that are by their very nature
- unique (because the file position of each record in the data file would
- be unique), the B+-tree is an ideal choice. cbase therefore uses the
- linked sequential file organization for record storage and B+-trees for
- inverted files.
-
- The mechanics of B-trees is beyond the scope of this chapter. A
- good discussion of the B-tree and its major variants may be found in
- COME79.
-
-
-
-
- Chapter 3: cbase Library Functions
-
-
- 3.1 Access Control Functions
-
- The cbcreate function is used to create a new cbase.
-
- int cbcreate(const char *cbname, size_t recsize,
- int fldc, const cbfield_t fldv[]);
-
- cbname points to a character string which is the name of the cbase.
- This name is used as the name of the data file containing the records in
- the cbase. recsize specifies the record size to be used. fldc is the
- number of fields in the cbase, and fldv is an array of fldc field
- definition structures. Each structure in the array contains the definition
- for one field. field definitions must be in the order the fields occur in
- the record. The field definition structure type cbfield_t is defined
- in <cbase.h>.
-
- typedef struct { /* field definition */
- size_t offset; /* field offset */
- size_t size; /* size of field */
- int type; /* type of field */
- int flags; /* flags */
- char filename[FILENAME_MAX + 1];
- /* data file name */
- } cbfield_t;
-
- offset is the location of the field within the record and size is the
- size of the field. type specifies the field data type, legal values for
- which are shown in Table 3.1. flags values are constructed by
- bitwise ORing together flags from the following list.
-
- CB_FKEY Field is to be a key.
- CB_FUNIQ Only for use with CB_FKEY.
- Indicates that the key is
- constrained to be unique.
-
- FILENAME_MAX is an ANSI C definition indicating the maximum
- length of a filename string. It is defined in <stdio.h>.
-
- t_char signed character
- t_charv signed character array
- t_uchar unsigned character
- t_ucharv unsigned character array
- t_short signed short integer
- t_shortv signed short integer array
- t_ushort unsigned short integer
- t_ushortv unsigned short integer array
- t_int signed integer
- t_intv signed integer array
- t_uint unsigned integer
- t_uintv unsigned integer array
- t_long signed long integer
- t_longv signed long integer array
- t_ulong unsigned long integer
- t_ulongv unsigned long integer array
- t_float floating point
- t_floatv floating point array
- t_double double precision
- t_doublev double precision array
- t_ldouble long double
- t_ldoublev long double array
- t_pointer pointer
- t_string character string
- t_cistring case-insensitive character string
- t_binary block of binary data (e.g., graphics)
-
- Table 3.1. cbase Data Types
-
- The first step in accessing an existing cbase is to open it, which is
- done using the function
-
- cbase_t *cbopen(const char *cbname, const char
- *type, int fldc, const cbfield_t fldv[]);
-
- cbname, fldc, and fldv are the same as for cbcreate, and must
- be given the same values as when the cbase was created. type points
- to a character string specifying the type of access for which the cbase
- is to be opened (as for the stdio function fopen). Legal values for
- type are
-
- "r" open for reading
- "r+" open for update (reading and writing)
-
- cbopen returns a pointer to the open cbase.
-
- The cbsync function causes any buffered data for a cbase to be
- written out.
-
- int cbsync(cbase_t *cbp);
-
- The cbase remains open and the buffers retain their contents.
-
- After processing is completed on an open cbase, it must be closed
- using the function
-
- int cbclose(cbase_t *cbp);
-
- The cbclose function causes any buffered data for the cbase to be
- written out, unlocks it, closes it, and frees the cbase pointer.
-
-
- 3.2 Lock Functions
-
- Before an open cbase can be accessed, it must be locked. The
- function used to control the lock status of a cbase is
-
- int cblock(cbase_t *cbp, int ltype);
-
- where cbp is a pointer to an open cbase and ltype is the lock type
- to be placed on the cbase. The legal values for ltype are
-
- CB_RDLCK lock cbase for reading
- CB_WRLCK lock cbase for reading and writing
- CB_RDLKW lock cbase for reading (wait)
- CB_WRLKW lock cbase for reading and writing (wait)
- CB_UNLCK unlock cbase
-
- If ltype is CB_RDLCK and the cbase is currently write locked by
- another process, or if ltype is CB_WRLCK and the cbase is currently
- read or write locked by another process, cblock will fail and set
- errno to EAGAIN. For the wait lock types, cblock will not return
- until the lock is available.
-
- The cbgetlck function reports the lock status held by the calling
- process on a cbase.
-
- int cbgetlck(cbase_t *cbp);
-
- It returns one of the legal values for the ltype argument in the
- cblock function.
-
-
- 3.3 Record Cursor Position Functions
-
- Each open cbase has a record cursor. At any given time the
- record cursor is positioned either on a record in that cbase or on a
- special position called null. The record on which the cursor is located
- is referred to as the current record. The operations performed by most
- cbase functions are either on or relative to the current record, so the
- initial step in a transaction on a cbase is usually to position the record
- cursor on the desired record. When accessing the records in a cbase in
- the order that they are stored, the following functions are used to move
- the record cursor.
-
- int cbrecfirst(cbase_t *cbp);
- int cbreclast(cbase_t *cbp);
- int cbrecnext(cbase_t *cbp);
- int cbrecprev(cbase_t *cbp);
-
- The cbrecfirst function positions the record cursor to the first
- record, and cbreclast to the last record. Before calling either of
- these functions cbreccnt should be used to test if the cbase is empty.
-
- unsigned long cbreccnt(cbase_t *cbp);
-
- If the cbase is empty, there is no first or last record and so these
- functions would return an error. The cbrecnext function advances
- the record cursor to the succeeding record, and cbrecprev retreats it
- to the preceding record. In the record ordering, null is located before
- the first record and after the last.
-
- There are also functions for saving the current position of the
- record cursor and resetting it to that position.
-
- int cbgetrcur(cbase_t *cbp, cbrpos_t *cbrposp);
- int cbsetrcur(cbase_t *cbp, const
- cbrpos_t*cbrposp);
-
- The cbgetrcur function gets the current position of the record cursor
- and saves it in the variable pointed to by cbrposp. cbrpos_t is the
- cbase record position type, defined in <cbase.h>. cbsetrcur can
- then be used later to set the record cursor back to that position. The
- record cursor can be positioned on null by passing cbsetrcur the
- NULL pointer rather than a pointer to a variable. Other than this
- special case, cbsetrcur should only be called with record cursor
- positions previously saved with cbgetrcur.
-
- The cbrcursor macro is used to test if the record cursor for a
- cbase is positioned on a record or on null.
-
- void *cbrcursor(cbase_t *cbp);
-
- If the record cursor of the cbase pointed to by cbp is positioned on
- null, cbrcursor returns the NULL pointer. If it is on a record,
- cbrcursor returns a value not equal to the NULL pointer. This
- function is useful for loops needing to test when the last (or first)
- record has been reached.
- The cbrecalign function aligns the record cursor with a
- specified key cursor.
-
- int cbrecalign(cbase_t *cbp, int field);
-
- field is the key with which to align the record cursor. The
- relationship between the key cursors and the record cursor is explained
- in the next section.
-
-
- 3.4 Key Cursor Position Functions
-
- In addition to a record cursor, each open cbase also has a key
- cursor for each key defined for that cbase. Like the record cursor, a
- key cursor is positioned either on a record in that cbase or on null. To
- access a cbase in the sort order of a certain key, the appropriate key
- cursor is used instead of the record cursor. Each key cursor moves
- independently of the others, but whenever a key cursor position is set,
- the record cursor is moved to the same record. The key cursors are
- not affected by moving the record cursor.
-
- The following functions are used to move a key cursor.
-
- int cbkeyfirst(cbase_t *cbp, int field);
- int cbkeylast(cbase_t *cbp, int field);
- int cbkeynext(cbase_t *cbp, int field);
- int cbkeyprev(cbase_t *cbp, int field);
-
- These perform as do the corresponding functions for the record cursor.
- Note that the key cursor functions can be used only with fields defined
- to be keys (see cbcreate in section 3.1).
-
- The following function is used to search for a key of a certain
- value.
-
- int cbkeysrch(cbase_t *cbp, int field,
- const void *buf);
-
- field is the key to search for the data item pointed to by buf. If
- the key is found, 1 is returned and the key and record cursors
- positioned to the record having that key. If there is no record with that
- key, 0 is returned and the key and record cursor positioned to the
- record (possibly null) that would follow a record with that key value.
-
- Since the key cursors do not automatically follow the record
- cursor, the situation sometimes occurs where the record cursor is
- positioned to the desired record, but the cursor for the key to be used
- next is not. The cbkeyalign function is used to align a specified
- key cursor with the record cursor.
-
- int cbkeyalign(cbase_t *cbp, int field);
-
- The reason the key cursors are not updated every time the record cursor
- moves is not because it would be in any way difficult to do so, but
- because this would increase the overhead enormously. And since only
- one key cursor is normally used at a time, this extra overhead would
- almost never provide any benefit in return.
-
- As for the record cursor, each key cursor position can be tested to
- be positioned on a record or on null.
-
- void *cbkcursor(cbase_t *cbp, int field);
-
- If the key cursor specified by field of the cbase pointed to by cbp is
- positioned on null, cbkcursor returns the NULL pointer. If it is on a
- record, cbkcursor returns a value not equal to the NULL pointer.
-
-
- 3.5 Input/Output Functions
-
- To read a record from a cbase, the record cursor for that cbase is
- first positioned to the desired record using either the record cursor
- position functions or the key cursor position functions. One of the
- following functions is then called to read from the current record.
-
- int cbgetr(cbase_t *cbp, void *buf);
- int cbgetrf(cbase_t *cbp, int field, void *buf);
-
- cbp is a pointer to an open cbase and buf points to the storage area
- to receive the data read from the cbase. The cbgetr function reads
- the entire current record, while cbgetrf reads the specified field
- from the current record.
-
- The function for inserting a new record into a cbase is
-
- int cbinsert(cbase_t *cbp, const void *buf);
-
- where buf points to the record to be inserted. When a new record is
- inserted into a cbase, the position it holds relative to each key cursor is
- defined by the sort order for that key field. There is no predefined sort
- order associated with the record cursor, however, and it is up to the
- user whether or not to store the records for each cbase in a sorted or
- unsorted order. To store records in a sorted order, the record cursor is
- first positioned the the record after which to insert the new record.
- cbinsert is then called to insert the record pointed to by buf after
- the current record. If no sort order is desired, the step to position the
- record cursor is skipped.
-
- The cbdelcur function is used to delete a record.
-
- int cbdelcur(cbase_t *cbp);
-
- The record cursor must first be positioned on the record to delete, then
- cbdelcur called to delete the current record.
-
- The cbputr function writes over an existing record.
-
- int cbputr(cbase_t *cbp, const void *buf);
-
- buf points to the new record contents. Writing over an existing record
- is equivalent to deleting the record and inserting a new one in the same
- position in the file. If the new record contains an illegal duplicate key,
- this will cause the insert to fail, resulting in the record having been
- deleted from the cbase. The exact behaviour that a program should
- have in such a circumstance is different for different applicatons, and so
- it is often desirable to use cbdelcur and cbinsert directly rather
- than cbputr.
-
-
- 3.6 Data Import/Export Functions
-
- cbase data can be exported to a text file using the cbexport
- function.
-
- int cbexport(cbase_t *cbp, const char *filename);
-
- Every record in cbase cbp is converted to a text format and written to
- the file filename. The export file format is defined as follows.
-
- - Each record is terminated by a newline ('\n').
- - The fields in a record are delimited by vertical
- bars ('|').
- - Each field contains only printable characters.
- - If a field contains the field delimiter
- character, that character is replaced with \F.
- - The individual elements of array data types are
- exported as individual fields.
-
- Data may be imported from a text file using the cbimport
- function.
-
- int cbimport(cbase_t *cbp, const char *filename);
-
- cbimport reads each record from the text file filename and inserts
- it into the cbase cbp.
-
- Data import/export is primarily used to move data between
- different database formats. This sometimes requires some slight
- rearranging of the text before importing. This can be most easily done
- using awk, a language designed specifically for manipulating records
- stored in text files. awk is normally included with UNIX, and is also
- available for MS-DOS.
-
-
-
- Chapter 4: An Example Program
-
-
- Included with cbase is rolodeck, an example program illustrating
- the use of cbase. Rolodeck is a program for storing business cards.
- To allow it to be compiled without requiring any additional libraries for
- displays, and because the purpose of the program is purely instructional,
- the program has been given only a simple user interface.
-
-
- 4.1 Data Definition
-
- The first step in writing a program using cbase is to define the
- data to be stored. This should be done in a separate header file for
- each record type to be used. Figure 4.1 shows rolodeck.h, the data
- definition header for the business card record type used by the rolodeck
- program.
-
- Starting at the top, the first thing to note is the definition of a
- macro to prevent multiple includes. This is a general practice
- applicable to any header file, cbase or otherwise, whose purpose is to
- allow a header to be specified for inclusion multiple times in a file
- while ensuring that the definitions within the file are processed only
- once. For a header containing nothing but macro definitions, the only
- effect of preventing multiple includes will be reduced compile times.
- But for a header containing, for instance, type definitions, multiple
- includes would produce compilation errors. Notice how the include
- macro is related to the header file name; the file name is converted to
- upper case and the period replaced by an underscore ('.' is not a valid
- character for C identifiers). These macros should be named consistently
- to minimize the possibility of accidentally defining the same macro
- elsewhere for some other purpose.
-
- At the beginning of every data definition header file, <cbase.h>
- should be included. Following this is defined the name of the cbase.
- This character string is the name used for the data file. This macro is
- to be used for the cbname argument when calling cbcreate and
- cbopen.#ifndef ROLODECK_H /* prevent multiple includes */
- #define ROLODECK_H
-
- #include <cbase.h>
-
- /* cbase name */
- #define ROLODECK ("rolodeck.dat")
-
- /* record definition */
- typedef struct {
- char rd_contact[41]; /* contact name */
- char rd_title[41]; /* contact title */
- char rd_company[41]; /* company name */
- char rd_addr[2*40+1]; /* company address */
- char rd_city[26]; /* city */
- char rd_state[3]; /* state */
- char rd_zip[11]; /* zip code */
- char rd_phone[13]; /* phone number */
- char rd_ext[5]; /* phone extension */
- char rd_fax[13]; /* fax number */
- char rd_notes[4*40+1]; /* notes */
- } rolodeck_t;
-
- /* field names */
- #define RD_CONTACT (0)
- #define RD_TITLE (1)
- #define RD_COMPANY (2)
- #define RD_ADDR (3)
- #define RD_CITY (4)
- #define RD_STATE (5)
- #define RD_ZIP (6)
- #define RD_PHONE (7)
- #define RD_EXT (8)
- #define RD_FAX (9)
- #define RD_NOTES (10)
- #define RDFLDC (11)
-
- /* field definition list */
- extern const cbfield_t rdfldv[RDFLDC];
-
- #endif /* #ifndef ROLODECK_H */
-
- Figure 4.1. rolodeck.h
-
-
- #ifndef ROLODECK_I /* prevent multiple includes */
- #define ROLODECK_I
-
- #include <cbase.h>
- #include <stddef.h>
- #include "rolodeck.h"
-
- /* field definition list */
- static cbfield_t rd_fldv[] = {
- {offsetof(rolodeck_t, rd_contact),
- sizeofm(rolodeck_t, rd_contact),
- t_string, CB_FKEY | CB_FUNIQ, "rdcont.ndx"},
- {offsetof(rolodeck_t, rd_title),
- sizeofm(rolodeck_t, rd_title),
- t_string, 0, ""},
- {offsetof(rolodeck_t, rd_company),
- sizeofm(rolodeck_t, rd_company),
- t_string, CB_FKEY, "rdcomp.ndx"},
- {offsetof(rolodeck_t, rd_addr),
- sizeofm(rolodeck_t, rd_addr),
- t_string, 0, ""},
- {offsetof(rolodeck_t, rd_city),
- sizeofm(rolodeck_t, rd_city),
- t_string, 0, ""},
- {offsetof(rolodeck_t, rd_state),
- sizeofm(rolodeck_t, rd_state),
- t_string, 0, ""},
- {offsetof(rolodeck_t, rd_zip),
- sizeofm(rolodeck_t, rd_zip),
- t_string, 0, ""},
- {offsetof(rolodeck_t, rd_phone),
- sizeofm(rolodeck_t, rd_phone),
- t_string, 0, ""},
- {offsetof(rolodeck_t, rd_ext),
- sizeofm(rolodeck_t, rd_ext),
- t_string, 0, ""},
- {offsetof(rolodeck_t, rd_fax),
- sizeofm(rolodeck_t, rd_fax),
- t_string, 0, ""},
- {offsetof(rolodeck_t, rd_notes),
- sizeofm(rolodeck_t, rd_notes),
- t_string, 0, ""}
- };
-
- #endif /* #ifdef ROLODECK_I */
-
- Figure 4.2. rolodeck.i
-
- Next is the type definition for the record being defined. Each
- field in the record (member in the structure) begins with a character
- sequence derived from the cbase name, usually two characters, followed
- by an underscore. This character sequence should be unique across all
- cbase record types used by a given program. This type is for use when
- declaring record variables. Variables of this type should be used for
- functions such as cbgetr requiring a pointer to a record.
-
- Following the record type definition is a list of macros used to
- identify a field when calling cbase functions. The names of these
- macros are constructed by taking the field names from the record type
- definition and converting them to uppercase; this is why the field names
- must be made unique. Each field name macro must be an integer
- indicating the number of the field in the record type definition, starting
- at zero. A macro for the total number of fields is also defined. This
- macro is to be used for the fldc argument when calling cbcreate
- and cbopen. Finally, the field definition array is declared. The
- elements of the field definition list are of type cbfield_t, defined in
- <cbase.h>.
-
- Since more than one source file will in general need to include the
- same header, the actual definition of the field definition array must be
- placed in a separate file. This file is given a .i extension to indicate
- that it contains data or code as opposed to a .h file containing only
- definitions, and it is included by only one source file, normally the one
- containing main. rolodeck.i is shown in figure 4.2.
-
- The field definition array specifies, for each field in the record,
- that field's offset in the record, size, type, some flags, and a filename
- to be used if the field is to be a key. The offsetof macro should
- be used to specify the offset. A macro sizeofm (size of member) is
- defined in <blkio.h> to give the size of a member of a structure.
-
- size_t sizeofm(struct_t, member);
-
- The arguments to sizeofm are the same as for offsetof. The size
- cannot be calculated from the difference between field offsets because
- padding characters may be added between structure members to
- maintain proper alignment (see KERN88). The field definition array is
- to be used as the fldv argument to cbcreate and cbopen.
-
- It should be noted that every record type should normally have at
- least one unique key field. This field is used to uniquely identify
- records. The physical record position is often used for this purpose,
- but, as will be discussed later in this chapter, a unique key is required
- for multiuser applications.
-
- In addition to the data definition header, all programs using cbase
- normally require that the following header files also be included.
-
- #include <blkio.h>
- #include <cbase.h>
- #include <errno.h>
-
- <blkio.h> is the header for the blkio library. It is included to
- provide the definition of the NULL pointer and the declaration of the
- function bexit. bexit is for use in place of exit in any program
- using the blkio library for file access. It writes out any buffered data
- for any open block file then calls exit (see bexit in the blkio
- reference manual). <cbase.h> is the cbase header file containing all
- the constant and type definitions and function declarations for using the
- cbase library. <errno.h> contains the declaration of errno as well
- as definitions of error code values; all cbase functions use errno for
- error reporting.
-
-
- 4.2 Opening a cbase
-
- The first step in accessing an existing cbase is to open it. Figure
- 4.3 shows the code from rolodeck.c to open the rolodeck cbase.
- rolodeck is opened with a type argument of "r+" to allow both
- reading and writing. The other arguments are the cbase name,
- ROLODECK, the field count, RDFLDC, and the field definition list,
- rdfldv, all defined in the data definition header file, rolodeck.h.
- On error cbopen returns the NULL pointer. For this program there is
- only one cbase, but most applications will require more.
-
- If the named cbase does not exist, cbopen will fail and set errno
- to ENOENT. In this example, if the rolodeck cbase does not exist, it is
- created and the program continues as normal. Note that the cbase must
- still be opened after it is created. In some cases a separate program is
- written to create all the cbases required by an application; in this case
- the main program would probably interpret ENOENT as an error.
-
- /* open rolodeck cbase */
- cbp = cbopen(ROLODECK, "r+", RDFLDC, rdfldv);
- if (cbp == NULL) {
- if (errno != ENOENT) {
- bexit(EXIT_FAILURE);
- }
- /* create rolodeck cbase */
- printf("Rolodeck does not exist. Creating...\n");
- if (cbcreate(ROLODECK, sizeof(rolodeck_t), RDFLDC, rdfldv)
- == -1) {
- bexit(EXIT_FAILURE);
- }
- cbp = cbopen(ROLODECK, "r+", RDFLDC, rdfldv);
- if (cbp == NULL) {
- bexit(EXIT_FAILURE);
- }
- }
-
- Figure 4.3. Opening a cbase
-
-
- 4.3 Locking a cbase
-
- Before accessing an open cbase, it must first be locked. If data is
- to be written to the cbase, it must be write locked, otherwise only a
- read lock is required. A cbase can be read locked by more than one
- process at the same time, and read locks are therefore also called
- shared locks. A write lock, on the other hand, is an exclusive lock; a
- write locked cbase can be neither read nor write locked by any other
- process. Write locks are exclusive because, if one process tried to read
- data while it was partially modified by another, the data would probably
- be in an inconsistent state. Processes that will only read data, however,
- can safely do so concurrently.
-
- While a cbase is write locked, other processes needing to access
- that cbase must wait until it is unlocked so that they can in turn lock it
- themselves to complete their processing. While a cbase is read locked,
- only processes needing to write must wait. Using a write lock when a
- read lock would suffice will therefore delay other processes
- unnecessarily. Locks of either type should be held for the shortest time
- possible; a common mistake in writing multiuser applications is to
- pause for use input while holding a lock, causing that lock to be held
- indefinitely.
-
- If an attempt is made to obtain a lock on a cbase, but is blocked
- by a lock held by another process, cblock will fail and set errno to
- EAGAIN. The call to cblock is therefore usually made in a loop with a
- predefined maximum number of tries. It is convenient to place this in
- a function configured for the application being developed. Figure 4.4
- shows this function from rolodeck.c. It may also be suitable in
- some instances to sleep for a short (possibly random) time between
- attempts to lock.
-
- #define LMAXTRIES (20) /* max lock tries */
-
- /* rdlock: rolodeck lock */
- int rdlock(cbase_t *cbp, int ltype)
- {
- for (int i = 0; i < LMAXTRIES; ++i) {
- if (cblock(cbp, ltype) == -1) {
- if (errno == EAGAIN) {
- continue;
- }
- return -1;
- } else {
- errno = 0;
- return 0;
- }
- }
- errno = EAGAIN;
- return -1;
- }
-
- Figure 4.4. Rolodeck Locking Function
-
- There are also two lock types (CB_RDLKW and CB_WRLCKW)
- which, if the requested lock is blocked, will wait until it can be
- obtained. These are not usually used, however, because if the lock
- does not become free in a reasonable time, the process waiting for the
- lock will be hung.
-
- For an applications where there will be only a single process, the
- necessary locks can be set immediately after opening the cbases to be
- accessed and left locked.
-
- One critical concern when locking multiple cbases is the possibility
- of deadlock. Deadlock is an extensive subject, and there are a number
- of ways of dealing with it. Most texts on operating systems (see
- CALI82) and database theory cover the subject in detail.
-
-
- 4.4 Accessing a cbase
-
- The gross structure of the rolodeck program is a case statement
- within a loop. At the start of the loop a user request is read and used
- to select the action performed in the case statement. Each individual
- action performed in the case statement illustrates the use of cbase to
- perform a basic operation, e.g., inserting a record, deleting a record,
- finding the next record, exporting data to a text file, etc. The operation
- of finding the next record serves as a good general example. The code
- for this from rolodeck.c is shown in figure 4.5.
-
- One of the most important points to notice in the example code is
- that a unique key (the contact name, here) is used to relocate the
- current record when a cbase is locked. This is because, when a cbase
- is unlocked, it may be modified by another process. A record at a
- given location may be deleted, and the empty slot possibly reused for a
- new record. Because of this, cbsetrpos cannot be used with a
- record position obtained during a previously held lock. As mentioned
- in the section 4.3, applications that do not involve multiple processes
- accessing the same data can simply lock a cbase and leave it locked
- rather than locking the cbase immediately prior to each transaction.
-
- Another central point is the use of multiple keys. In the rolodeck
- program, both the contact and the company names are keys. A variable
- sf is used in rolodeck.c to identify the current sort field, which
- can be changed interactively. Before using the cbkeynext function,
- the appropriate key cursor must first be positioned. cbkeysrch
- positions only the key being searched, here being the unique key. If
- the next card is to be found using the sort order of a different key,
- cbkeyalign must first be used to align that key cursor with the
- current record.
-
- case REQ_NEXT_CARD: /* next card */
- rdlock(cbp, CB_RDLCK);
- if (cbreccnt(cbp) == 0) {
- printf("The rolodeck is empty.\n\n");
- rdlock(cbp, CB_UNLCK);
- continue;
- }
- /* use unique key field to set rec cursor */
- found=cbkeysrch(cbp,RD_CONTACT,rd.rd_contact);
- if (sf != RD_CONTACT) {
- /* align cursor of sort key */
- cbkeyalign(cbp, sf);
- }
- if (found == 1) {
- /* advance key (and rec) cursor 1 pos */
- cbkeynext(cbp, sf);
- }
- if (cbrcursor(cbp) == NULL) {
- printf("End of deck.\n\n");
- rdlock(cbp, CB_UNLCK);
- continue;
- }
- cbgetr(cbp, &rd);
- rdlock(cbp, CB_UNLCK);
- break; /* case REQ_NEXT_CARD: */
-
- Figure 4.5. Next Rolodeck Record
-
-
- 4.5 Closing a cbase
-
- When a program is through accessing a cbase, the cbase should be
- closed. Figure 4.6 shows this code from rolodeck.c.
-
- /* close cbase */
- if (cbclose(cbp) == -1) {
- bexit(EXIT_FAILURE);
- }
-
- Figure 4.6. Closing a cbase
-
- A cbase is automatically unlocked when it is closed. A cbase is
- normally, but not necessarily, opened and closed only once in a
- program.
-
-
- 4.6 Manipulating Exported Data
-
- Exported data often requires some processing before it is used.
- For instance, consider modifying an application to add a new field to an
- existing cbase. If there is a considerable amount of data already stored,
- it would be desirable to use the import/export functions rather than
- manually entering the data again. Another common example is moving
- data between cbase and another database system, which may use
- different field and record separators for exported data.
-
- An ideal tool for processing records in text files is awk. Figure
- 4.7 shows an awk program for inserting a new field at position two in
- all the records in a text file (note that awk field numbering starts at
- one, not zero).
-
- The predefined variables FS and OFS are used to set the input and
- output field separators, respectively. The predefined variables RS and
- ORS are used to set the input and output record separators, respectively.
- Setting these variables appropriately is all that is necessary to convert
- between text file formats using different field and record separators.
- The awk program in figure 4.8 converts text files exported from a
- database using the tab character as a field separator for import by
- cbase.
-
- BEGIN {
- FS = "|"; # set input and output field
- OFS = FS; # and record separators
- RS = "\n";
- ORS = RS;
- NEWFIELD = 2; # field to insert
- }
-
- # insfld: insert field n of current record
- function insfld(n)
- {
- if (n < 1 || n > NF + 1) {
- return -1;
- }
-
- for (i = NF; i >= n; --i) {
- $(i + 1) = $i;
- }
- $n = "";
-
- return 0;
- }
-
- {
- if (insfld(NEWFIELD) == -1) {
- printf "Error inserting new 2nd fld.\n";
- exit 1;
- }
- print $0;
- }
-
- END {
- exit 0;
- }
-
- Figure 4.7. awk Program to Insert Field
-
-
- BEGIN {
- FS = "\t"; # set input and output field
- OFS = "|"; # and record separators
- RS = "\n";
- ORS = RS;
- }
-
- {
- print $0
- }
-
- END {
- exit 0;
- }
-
- Figure 4.8. awk Program to Change Field Separator
-
-
-
-
- Appendix A: Installation Instructions
-
- The cbase library is distributed in MS-DOS format on either two
- 360K 5.25" diskettes or one 720K 3.5" diskette. On the distribution
- diskettes is a directory containing the files for each library and for
- rolodeck, the example program. There is also a directory for manx, the
- utility used to extract an on-line copy of the reference manual for each
- library, and a directory containing installation batch files for different
- MS-DOS compilers.
-
- blkio blkio library
- btree btree library
- lseq lseq library
- cbase cbase library
- manx manx utility
- rolodeck example program
- bats batch files for different compilers
-
- If cbase is being installed on an MS-DOS system, the following
- two commands will copy the contents of the distribution diskettes onto
- the main drive.
-
- > mkdir cbase
- > xcopy a:\ cbase /s /v
-
- An operating system besides MS-DOS will require either a facility to
- read MS-DOS diskettes or access to an MS-DOS machine from which
- files can be transferred by a serial link or network to the target
- machine. If the transfer process does not automatically convert the text
- files to the format of the target system, an additional conversion utility
- may be necessary.
-
- The second step in the installation is set a switch specifying the
- host operating system. This switch is the HOST macro in the blkio file
- blkio_.h. If the host operating system is MS-DOS, the MSDOSC
- macro in the same file must also be set to specify the C compiler being
- used.
-
- If the compiler being used is not completely ANSI compatible,
- some additional switches must also be set. These are a set of macros
- located in the blkio header file blkio.h, each of which specifies a
- particular ANSI feature. For instance, the macro AC_PROTO is used to
- indicate function prototype support.
-
- The remaining installation instructions for each library, which
- should be performed in the directory containing the files for that library,
- are given below. Before proceeding, the manx utility should be
- compiled and placed in a directory in the path. After all the libraries
- have been installed, the final step is to compile the example program;
- the instructions for doing this are given in the example program's
- readme file.
-
- The MS-DOS installation batch files, install.bat, each take
- two arguments. The first specifies the memory model, legal values for
- which are s, m, c, l, and h; the library file is named libm.lib,
- where lib would be the library name and m would correspond to the
- memory model of the library. The second, if present, causes the
- reference manual to be extracted from the source code into the file
- lib.man, where lib would again be the library name. The main
- batch file included with each library is written for Borland Turbo C.
- Because there is so little uniformity among C compilers for MS-DOS,
- modifications will be required for other compilers. Instructions for
- making these straightforward modifications are given at the beginning of
- each install.bat. Some batch files modified for other compilers
- can be found in the bats directory. Also, if a make utility is
- available, the UNIX makefiles may instead be adapted.
-
-
- A1. The blkio Library
-
- UNIX
-
- 1. Set the HOST macro in blkio_.h to UNIX.
- 2. Install the boolean header file.
- $ su
- # cp bool.h /usr/include
- # ^d
- 3. Extract the on-line reference manual.
- $ make man
- 4. Build the blkio library.
- $ make blkio
- 5. Install the blkio library. This will copy the blkio header
- file blkio.h to /usr/include and the blkio library archive
- to /usr/lib.
- $ su
- # make install
- # ^d
-
-
- MS-DOS
-
- 1. Set the HOST macro in blkio_.h to MS_DOS.
- 2. Set the MSDOSC macro in blkio_.h to the C compiler being
- used.
- 3. If necessary, modify install.bat for the C compiler being
- used.
- 4. Extract the reference manual and build and install the blkio
- library.
- > install s x
- Run again for each additional memory model desired.
-
-
- A2. The btree Library
-
- UNIX
-
- 1. Install the blkio library.
- 2. Extract the on-line reference manual.
- $ make man
- 3. Build the btree library.
- $ make btree
- 4. Install the btree library. This will copy btree.h to
- /usr/include and the btree library archive to
- /usr/lib.
- $ su
- # make install
- # ^d
-
-
- MS-DOS
-
- 1. Install the blkio library.
- 2. If necessary, modify install.bat for the C compiler
- being used.
- 3. Install the btree library.
- > install s x
- Run again for each additional memory model desired.
-
-
- A3. The lseq Library
-
- UNIX
-
- 1. Install the blkio library.
- 2. Extract the on-line reference manual.
- $ make man
- 3. Build the lseq library.
- $ make lseq
- 4. Install the lseq library. This will copy lseq.h to
- /usr/include and the lseq library archive to
- /usr/lib.
- $ su
- # make install
- # ^d
-
-
- MS-DOS
-
- 1. Install the blkio library.
- 2. If necessary, modify install.bat for the C compiler
- being used.
- 3. Install the lseq library.
- > install
- Run again for each additional memory model desired.
-
-
- A4. The cbase library
-
- UNIX
-
- 1. Install the btree and lseq libraries.
- 2. Extract the on-line reference manual.
- $ make man
- 3. Build the cbase library.
- $ make cbase
- 4. Install the cbase library. This will copy cbase.h to
- /usr/include and the cbase library archive to
- /usr/lib.
- $ su
- # make install
- # ^d
-
-
- MS-DOS
-
- 1. Install the btree and lseq libraries.
- 2. If necessary, modify install.bat for the C compiler
- being used.
- 3. Install the cbase library.
- > install s x
- Run again for each additional memory model desired.
-
-
-
-
-
-
-
- Appendix B: Defining New Data Types
-
-
- cbase is designed to allow custom data types to be defined by the
- user. Custom data types are completely integrated, becoming
- indistinguishable from those predefined. A data type definition consists
- of a macro used as the type name (e.g., t_string), and three
- functions: a comparison function, an export function, and an import
- function. The comparison function is the most important; it determines
- the sort order for data of that type. The export function is used to
- export data of the associated type to a text file, and the import function
- to import data. Below are given step-by-step instructions for defining a
- new cbase data type.
-
-
- B1. The Type Name
-
- For each cbase data type there is a corresponding type name by
- which the user refers to that data type. The type names are defined in
- cbase.h.
-
- #define t_char (0) /* signed character */
- ...
- #define t_binary (26) /* binary data */
-
- Type names must be defined as integers, starting at zero and increasing
- in steps of one. The type name for a new data type would be added at
- the end of this list, and be defined as an integer one greater than the
- last data type in the list. To avoid possible conflict with future
- predefined types, user defined type names should not start with t_.
- The prefix ut_ is recommended for user defined types.
-
- #define ut_new (27) /* new data type */
-
-
- B2. The Comparison Function
-
- A data type is characterized primarily by its sort order. Each data
- type is given a comparison function defining this sort order.
- Comparison functions are of the form
-
- int cmp(const void *p1, const void *p2, size_t n);
-
- p1 and p2 are pointers to two data items to be compared, and n is the
- size of the data items. The value returned must be less than, equal to,
- or greater than zero if the data item pointed to by p1 is less than,
- equal to, or greater than, respectively, that pointed to by p2. The C
- standard library function memcmp would be a valid cbase comparison
- function.
-
- All cbase comparison functions are located in the file cbcmp.c.
- For a new data type, a comparison function would be added in this file.
-
- static int newcmp(const void *p1, const void *p2,
- size_t n)
- {
- ...
- }
-
- Comparison functions are made static because they are accessed by
- cbase only through an array of function pointers, cbcmpv, also defined
- in cbcmp.c. This array contains the comparison function for each
- cbase data type. The integer value of the type name is used by cbase
- as an index into this array, and so the comparison functions must be in
- the same order as the type names. A pointer to the comparison
- function for a new data type would be added at the end of this array.
-
- /* cbase comparison function table */
- cbcmp_t cbcmpv[] = {
- charcmp,
- ...
- binarycmp,
- newcmp
- };
-
-
- B3. The Export and Import Functions
-
- Each data type has an associated export function. This export
- function takes a data item of the associated type and writes it to a file
- in a text format. Export functions are of the form
-
- int exp(FILE *fp, const void *p, size_t n);
-
- p is a pointer to the data item of size n to be exported. The export
- function converts the data item to a text format, then writes it to the
- current position in file fp. Upon successful completion, a value of
- zero is returned. Otherwise, a value of -1 is returned. See cbexport
- in the cbase Programmer's Reference Manual for a description of the
- format of exported data.
-
- All cbase export functions are located in the file cbexp.c. For a
- new data type, an export function would be added in this file.
-
- static int newexp(FILE *fp, const void *p, size_t n)
- {
- ...
- }
-
- Just like comparison functions, export functions are accessed by
- cbase through an array. This array, cbexpv, is defined in cbexp.c.
- A pointer to the export function for the new data type would be added
- at the end of this array.
-
- The import function reads a data item from a text file. Import
- functions are of the form
-
- int imp(FILE *fp, const void *p, size_t n);
-
- The parameters and return value are the same as for the export
- function. Import functions are located in cbimp.c. Pointers to the
- import functions are stored in the array cbimpv.
-
-
- B4. The Type Count
-
- The macro CBTYPECNT is defined in cbase_.h as the number
- of data types defined. It must be incremented by one for each new
- data type added.
-
-
- After completing these steps, rebuild the cbase library (see
- Appendix A). The underlying libraries do not need to be rebuilt.
-
-
-
-
- Appendix C: Porting to a New Operating System
-
-
- The blkio library provides a means for portable access to
- structured files just as the stdio library does for text files. blkio is thus
- the only library requiring modification to port to a new operating
- system. The steps necessary to perform this port are outlined below.
-
-
- C1. The HOST Macro
-
- In the blkio library's private header file blkio_.h, a macro is
- defined for each supported operating system. When installing the blkio
- library, the host operating system is selected by defining the HOST
- macro using one of these system macros. When porting to a new
- operating system, a new system macro must be defined in blkio_.h.
-
- #define UNIX (1) /* UNIX */
- #define MS_DOS (2) /* MS-DOS */
- #define NEWOS (3) /* new os */
- #define HOST NEWOS
-
- In some instances it will be necessary to take into account
- variations among the C compilers available for a system. For
- MS-DOS, the macro MSDOSC, also defined in blkio_.h, is used to
- select a compiler in the same way as the HOST macro is used to select
- the operating system. To port to a new C compiler under MS-DOS, a
- new compiler macro must be defined.
-
- #define TURBOC (1) /* Turbo C */
- #define MSC (2) /* Microsoft C */
- #define NEWC (3) /* new C compiler */
- #define MSDOSC NEWC
-
- To port to multiple incompatible compilers under a new operating
- system, a new compiler selection macro analogous to MSDOSC would
- need to be defined.
-
-
- C2. The File Descriptor Type
-
- In most operating systems, an open file is accessed not by name,
- but through some type of handle, usually called a file descriptor. File
- descriptors are normally of type int, but this is not guaranteed. A
- union is therefore defined (in blkio.h) for the file descriptor.
-
- typedef union { /* fildes type */
- char cfd; /* character fildes */
- short sfd; /* short int fildes */
- int ifd; /* int fildes */
- } fd_t;
-
- This type is used to define the fd member of the BLKFILE structure.
-
- typedef struct { /* block file ctl struct */
- fd_t fd; /* file descriptor */
- ...
- } BLKFILE;
-
- When modifying the code in subsequent sections, the appropriate
- member of the union fd_t would be used to access a file descriptor.
- If the file descriptor type for the new system is short, for instance,
- the file descriptor for BLKFILE *bp would be accessed as bp->fd.sfd.
- It will be necessary to add a member to the fd_t union if one of the
- required type does not already exist.
-
-
- C3. File Access System Calls
-
- The bulk of the operating system specific code is related to the
- system calls used to access the file system. These system calls perform
- basic operations such as opening, reading, and writing a file, and are
- conceptually the same on most systems. In fact, they can usually be
- directly translated to a corresponding call on the new system.
-
- All system calls accessing the file system are isolated in the file
- buops.c (blkio unbuffered operations). The HOST macro is used to
- separate sections of code used for different operating systems.
- Similarly, compiler selection macros such as MSDOSC are used to
- separate sections of code for different compilers under the same
- operating system.
-
- #if HOST == UNIX
- /* code for UNIX */
- .
- .
- #elif HOST == MS_DOS
- /* code for MS-DOS */
- #if MSDOSC == TURBOC
- /* code for Turbo C */
- .
- .
- #elif MSDOSC == MSC
- /* code for Microsoft C */
- .
- .
- #endif
- #endif
-
- When porting to a new operating system (or compiler), each of these
- conditional compilations using HOST (or the relevant compiler selection
- macro) must be located and an additional #elif added for the new
- system (or compiler).
-
-
- C4. File Locking System Calls
-
- System calls are also used to perform file locking. All system
- calls for file locking are located in the file lockb.c. This file must
- be modified as was buops.c. If file locking will not be used on the
- new system, lockb.c need not be altered.
-
-
-
-
- References
-
-
- CALI82 Calingaert, P. Operating System Elements. Englewood
- Cliffs, NJ: Prentice Hall, 1982.
-
- COME79 Comer, D. The Ubiquitous B-tree. ACM Computing
- Surveys, June 1979.
-
- FROS89 Frost, L. A Buffered I/O Library for Structured Files.
- The C Users Journal, October 1989.
-
- HORO76 Horowitz, E. and S. Sahni. Fundamentals of Data
- Structures. Rockville, MD: Computer Science Press, 1976.
-
- KERN88 Kernighan, B. and D. Ritchie. The C Programming
- Language. Englewood Cliffs, NJ: Prentice Hall, 1988.
-
- KNUT68 Knuth D. The Art of Computer Programming Volume 3 /
- Sorting and Searching. Reading, MA: Addison-Wesley, 1968.
-
- ULLM82 Ullman, J. Principles of Database Systems. Rockville,
- MD: Computer Science Press, 1982.