- Tech Chapter 7 - Multi-length fields MetalBase 5.1
- -------------------------------------------------------------------------------
-
-
- MetalBase 5.1 is the first release to support multi-length fields. The primary
- reason for delaying the introduction of such a useful ability is that it was
- difficult to convince myself that there is no better way to work with field
- data than in temporary files. Hell, I may still be wrong; but it's in
- now.
-
-
- /*
- * THE GENERAL IDEA -----------------------------------------------------------
- *
- */
-
- If you've read the chapter on relation format, you know that the inside of the
- .REL files is very structured; there's a header followed by any number of
- records, each exactly the same length. So there's the first real problem--
- if the fields in a relation don't take the same amount of space, then the
- records won't be the same length. Bad.
-
- That's solved by moving the actual field data off to a separate file; MB uses
- a .DAT extension for these. The .DAT file is created by mb_create() if
- the design contains multi-length fields, and isn't created otherwise.
- Inside the .REL, each record has an eight-byte structure wherever a multi-
- length field should be... there are two pieces of information stored in
- that eight-byte block:
-
- dataptr  pos......pointer to field data within .DAT file
- char    *name.....file name for data transfer
-
- While it's stored, the only relevant information in the .REL for multi-length
- fields is the 'pos' field. The other one is used because the same 8-byte
- structure is returned to the user, and has to contain everything needed to
- keep track of multi-length fields. The 'name' field is maintained as a
- character pointer so we won't have a 32- or 64-byte array of wasted space
- in the .REL files; instead, MB will malloc() space for the names when it
- needs 'em.
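-
- As a rough sketch of that eight-byte block (not MB's actual declaration): the
- type name 'mfield', the 'long' typedef for dataptr, and both helper functions
- are made up for illustration. The struct only comes out at eight bytes where
- long and char* are four bytes each.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef long dataptr;      /* assumption: a plain file offset */

typedef struct {
    dataptr  pos;          /* where the field's data starts in the .DAT file */
    char    *name;         /* temp file name; NULL until one is assigned     */
} mfield;                  /* hypothetical name, not from MetalBase          */

/* Give the field its own malloc()ed copy of 'path'.  Returns 0 on success,
 * -1 if memory runs out.  Any earlier name is released first. */
int mfield_set_name(mfield *f, const char *path)
{
    char *copy;

    assert(f != NULL && path != NULL);
    copy = malloc(strlen(path) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, path);
    free(f->name);         /* free(NULL) is a no-op */
    f->name = copy;
    return 0;
}

/* Release the name again, leaving only 'pos' meaningful. */
void mfield_clear_name(mfield *f)
{
    assert(f != NULL);
    free(f->name);
    f->name = NULL;
}
```

- Keeping 'name' as a pointer means the record itself stays small no matter how
- long the temp file's path turns out to be.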
-
- So before you can look up data, you pass a record (or part of it) to recInit(),
- to assign temporary files to each multi-length field. Those files are then
- actually created, so later on you'll have to use a cleanup function,
- recFree(), to delete them, or you'll have billions of temp files lying
- around everywhere.
-
- Then, when you retrieve a record, the data is copied into the temporary file
- that was chosen for it. If you instead add or update records, data is read
- from those temporary files and stored in a convenient place in the .DAT
- file.
-
-
- /*
- * THREADED-HEAP FORMAT, FREE SPACE -------------------------------------------
- *
- */
-
- The problem is finding that "convenient place". In order to ensure the fastest
- access to chunks of free space, MetalBase only maintains a single chain
- through the heap--each free-space block has a header indicating its size,
- and a pointer to the next free-space block. Maintaining this thing is a
- real bitch, let me tell you. But it's quick, so it's probably worth it.
-
- When searching for a place to put data, MetalBase uses a first-fit strategy to
- keep speed up. In trials, best-fit didn't really reduce fragmentation:
- because it shrinks the left-over free-space chunk, the number of small,
- unusable blocks goes up. Besides, it's a bit more work--which means more
- chance for errors.
-
- Adding data to the heap is easy--the data is placed at the front of the
- free block, and a new link in the free-space chain is created just past the
- end of the data, to take the place of the one we wrote over. If no single
- free block in the chain is big enough, the data is appended to the end of
- the file.
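-
- The first-fit walk can be sketched against an in-memory model of the chain.
- This is not MB's code: the 'heap' and 'freeblk' types, the fixed-size table
- and heap_alloc() are all assumptions, and the sketch ignores the space a
- free-block header takes up on disk.

```c
#include <assert.h>
#include <stddef.h>

#define MAXFREE 16
#define NIL     (-1)

typedef struct {
    long pos;              /* offset of the free block in the .DAT file */
    long size;             /* bytes available in it                     */
    int  next;             /* index of the next free block, or NIL      */
} freeblk;

typedef struct {
    freeblk blk[MAXFREE];  /* stand-in for the headers stored on disk   */
    int     head;          /* first link in the chain, or NIL           */
    long    eof;           /* current end of the .DAT file              */
} heap;

/* Return the offset at which 'need' bytes can be written. */
long heap_alloc(heap *h, long need)
{
    int  prev = NIL, cur;
    long at;

    assert(h != NULL && need > 0);
    for (cur = h->head; cur != NIL; prev = cur, cur = h->blk[cur].next) {
        freeblk *b = &h->blk[cur];

        if (b->size < need)
            continue;                  /* too small, keep walking       */
        at = b->pos;
        if (b->size == need) {         /* exact fit: drop the link      */
            if (prev == NIL)
                h->head = b->next;
            else
                h->blk[prev].next = b->next;
        } else {                       /* data in front, link moves up  */
            b->pos  += need;
            b->size -= need;
        }
        return at;
    }
    at = h->eof;                       /* nothing fits: append          */
    h->eof += need;
    return at;
}
```

- The search stops at the first block that is large enough, so it never has to
- visit the whole chain unless nothing fits.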
-
- The trouble comes about when we have to delete a chunk of data. There are
- four scenarios which have to be dealt with separately (' ' == a free block,
- '=' == a used block, 'X' == the block we're deleting):
-
- [     ] -- In this scenario, the first free block's size is simply
- [XXXXX]    increased, to encompass the block we're deleting. This is
- [=====]    the easiest case.
-
- [=====] -- In this scenario, a new link in the free-space chain is
- [XXXXX]    created to encompass the datablock we're deleting.
- [=====]
-
- [=====] -- In this scenario, the trailing free block is expanded and
- [XXXXX]    moved backwards, to encompass both the existing block and
- [     ]    the block we're deleting.
-
- [     ] -- In this scenario, the first free block's size is increased
- [XXXXX]    to encompass the second free block, as well as the size of
- [     ]    the block we're deleting.
-
- As I said, it's a real bitch... there should be a simpler way. But, it will
- do.
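-
- For the curious, the four scenarios can be sketched on a simplified model: a
- small array of free extents kept sorted by position. That ordering, the
- 'freelist' type and heap_free() are assumptions for illustration; MB itself
- threads the chain through headers inside the .DAT file. The return value is
- the scenario number, counting from the top of the list above.

```c
#include <assert.h>
#include <string.h>

#define MAXFREE 16

typedef struct { long pos, size; } extent;

typedef struct {
    extent f[MAXFREE];     /* free extents, sorted by pos (assumption) */
    int    n;              /* how many are in use                      */
} freelist;

/* Return the block [pos, pos+size) to the free list. */
int heap_free(freelist *fl, long pos, long size)
{
    int i = 0, before, after;

    assert(fl != NULL && size > 0);
    while (i < fl->n && fl->f[i].pos < pos)
        i++;               /* f[i] is the first free extent past the block */

    before = i > 0     && fl->f[i - 1].pos + fl->f[i - 1].size == pos;
    after  = i < fl->n && pos + size == fl->f[i].pos;

    if (before && after) { /* 4: free on both sides, merge all three */
        fl->f[i - 1].size += size + fl->f[i].size;
        memmove(&fl->f[i], &fl->f[i + 1],
                (size_t)(fl->n - i - 1) * sizeof(extent));
        fl->n--;
        return 4;
    }
    if (before) {          /* 1: grow the free block in front */
        fl->f[i - 1].size += size;
        return 1;
    }
    if (after) {           /* 3: pull the trailing free block backwards */
        fl->f[i].pos   = pos;
        fl->f[i].size += size;
        return 3;
    }
    assert(fl->n < MAXFREE);   /* 2: used on both sides, new link */
    memmove(&fl->f[i + 1], &fl->f[i],
            (size_t)(fl->n - i) * sizeof(extent));
    fl->f[i].pos  = pos;
    fl->f[i].size = size;
    fl->n++;
    return 2;
}
```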
-
-
- /*
- * THREADED-HEAP FORMAT, USED SPACE -------------------------------------------
- *
- */
-
- Segments within the .DAT file which contain data are in the following format:
-
- pos.....4 bytes: overall size of this used-block. This field is the
-                  location pointed to by the .REL file; it is not used
-                  during queries, but is used when returning the block
-                  to the free-space chain.
-
- sig.....1 byte:  signature; '+' indicates data follows, '-' indicates
-                  an end-of-chain marker.
-
- size....4 bytes: number of bytes in the upcoming page.
-
- page....(size):  actual data; see the chapter on data compression for
-                  format. Each page will be at most 1k or so.
-
- After each 'page', the stream continues with 'sig' until '-' is reached,
- indicating the end of data.
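-
- Walking one used-block might look roughly like the sketch below, which reads
- from a memory buffer rather than the .DAT file. The little-endian byte order
- and the names get32() and read_block() are assumptions, and the pages are
- copied as-is; no decompression is attempted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: 4-byte integers are stored low byte first. */
static unsigned long get32(const unsigned char *p)
{
    return  (unsigned long)p[0]        | ((unsigned long)p[1] << 8) |
           ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* Copy every page of the used-block at 'blk' (len bytes available) into
 * 'out' (room for cap bytes).  Returns the number of bytes copied, or -1
 * if the block is malformed or 'out' is too small. */
long read_block(const unsigned char *blk, size_t len,
                unsigned char *out, size_t cap)
{
    size_t at = 4;                     /* skip 'pos'; queries ignore it */
    size_t total = 0;

    assert(blk != NULL && out != NULL);
    for (;;) {
        unsigned long size;

        if (at >= len)
            return -1;                 /* ran off the end               */
        if (blk[at] == '-')
            return (long)total;        /* end-of-chain marker           */
        if (blk[at] != '+')
            return -1;                 /* not a valid signature         */
        at++;
        if (len - at < 4)
            return -1;
        size = get32(blk + at);
        at += 4;
        if (size > len - at || size > cap - total)
            return -1;
        memcpy(out + total, blk + at, size);   /* one page              */
        at    += size;
        total += size;
    }
}
```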
-
-