Text File | 1993-09-04 | 5.5 KB | 123 lines

Tech Chapter 7 - Multi-length fields                              MetalBase 5.1
-------------------------------------------------------------------------------

MetalBase 5.1 is the first release to support multi-length fields. The primary
reason for delaying the introduction of such a useful ability is that it was
difficult to convince myself that there is no better way to work with field
data than in temporary files. Hell, I may still be wrong; but it's in now.


/*
 * THE GENERAL IDEA -----------------------------------------------------------
 *
 */

If you've read the chapter on relation format, you know that the inside of the
.REL files is very structured; there's a header followed by any number of
records, each exactly the same length. So there's the first real problem--
if the fields in a relation don't take the same amount of space, then the
records won't be the same length. Bad.

That's solved by moving the actual field data off to a separate file; MB uses
a .DAT extension for these. The .DAT file is created by mb_create() if the
design contains multi-length fields, and isn't created otherwise.

Inside the .REL, each record has an eight-byte structure wherever a
multi-length field should be... there are two pieces of information stored in
that eight-byte block:

   dataptr  pos.....pointer to field data within .DAT file
   char    *name....file name for data transfer

While it's stored, the only relevant information in the .REL for multi-length
fields is the 'pos' field. The other one is used because the same 8-byte
structure is returned to the user, and has to contain everything needed to
keep track of multi-length fields.

The 'name' field is maintained as a character pointer so we won't have a 32-
or 64-byte array of wasted space in the .REL files.. instead, MB will malloc()
space for the names when it needs 'em.

So before you can look up data, you pass a record (or part of it) to recInit(),
to assign temporary files to each multi-length field.
Those files are then actually created, so later on you'll have to use a
cleanup function, recFree(), to delete them, or you'll have billions of temp
files lying around everywhere.

Then, when you retrieve a record, the data is copied into the temporary file
that was chosen for it. If you instead add or update records, data is read
from those temporary files and stored in a convenient place in the .DAT file.


/*
 * THREADED-HEAP FORMAT, FREE SPACE -------------------------------------------
 *
 */

The problem is finding that "convenient place". In order to ensure the fastest
access to chunks of free space, MetalBase only maintains a single chain through
the heap--each free-space block has a header indicating its size, and a pointer
to the next free-space block. Maintaining this thing is a real bitch, let me
tell you. But it's quick, so it's probably worth it.

When searching for a place to put data, MetalBase uses a first-fit strategy to
keep speed up. In trials, using best-fit doesn't really affect the amount of
fragmentation... it turns out that, by reducing the size of the left-over
free-space chunk, the number of small, unusable blocks increases. Besides,
it's a bit more work--which means more chance for errors.

Adding data to the heap is easy--the data is placed at the start of the free
block, and a new link in the free-space chain is created at the end of the
data, to take the place of the one we wrote over. If there's not enough
contiguous free space within the chain, the data is appended to the end of the
file.

The trouble comes about when we have to delete a chunk of data. There are four
scenarios which have to be dealt with separately (' ' == a free block, '=' == a
used block, 'X' == the block we're deleting):

   [     ]  -- In this scenario, the first free block's size is simply
   [XXXXX]     increased, to encompass the block we're deleting. This is
   [=====]     the easiest case.

   [=====]  -- In this scenario, a new link in the free-space chain is
   [XXXXX]     created to encompass the datablock we're deleting.
   [=====]

   [=====]  -- In this scenario, the trailing free block is expanded and
   [XXXXX]     moved backwards, to encompass both the existing block and
   [     ]     the block we're deleting.

   [     ]  -- In this scenario, the first free block's size is increased
   [XXXXX]     to encompass the second free block, as well as the size of
   [     ]     the block we're deleting.

As I said, it's a real bitch... there should be a simpler way. But, it will do.


/*
 * THREADED-HEAP FORMAT, USED SPACE -------------------------------------------
 *
 */

Segments within the .DAT file which contain data are in the following format:

   pos.....4 bytes:  overall size of this used-block. This section is the
                     location pointed to by the .REL file; it is not used
                     during queries, but is used when returning the block to
                     the free-space chain
   sig.....1 byte:   signature; '+' indicates data follows, '-' indicates an
                     end-of-chain marker.
   size....4 bytes:  amount of data required to read entire upcoming page.
   page....(size):   actual data; see the chapter on data compression for
                     format. Each page will be at most 1k or so.

After each 'page', the stream continues with 'sig' until '-' is reached,
indicating the end of data.
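

To make the used-block layout a little more concrete, here is a minimal sketch
in C of walking one used block that has already been read into memory. It is
not MetalBase source: read_used_block() and read_u32le() are hypothetical
names, and the sketch assumes the 4-byte integers are stored little-endian,
which the format description does not state. The 'pos' header is simply
skipped, and page bytes are copied out raw, with no attempt at the
decompression covered in the data compression chapter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: 4-byte integers in the .DAT file are little-endian. */
static uint32_t read_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper, not part of the MetalBase API.
 *
 * buf[0..len) holds one used block, starting at its 4-byte 'pos' header.
 * Each '+' page is appended to out[0..outcap); a '-' ends the chain.
 * Returns the number of page bytes copied, or -1 if the block is
 * truncated, has an unknown signature, or does not fit in 'out'.
 */
long read_used_block(const unsigned char *buf, size_t len,
                     unsigned char *out, size_t outcap)
{
    size_t off = 4;      /* skip 'pos'; it is not needed for reading */
    size_t total = 0;

    assert(buf != NULL && out != NULL);
    if (len < 4)
        return -1;

    for (;;) {
        unsigned char sig;
        uint32_t size;

        if (off >= len)
            return -1;               /* ran out before seeing '-' */
        sig = buf[off++];
        if (sig == '-')
            return (long)total;      /* end-of-chain marker */
        if (sig != '+')
            return -1;               /* unknown signature */

        if (len - off < 4)
            return -1;               /* no room for the 'size' field */
        size = read_u32le(buf + off);
        off += 4;

        if (size > len - off || size > outcap - total)
            return -1;               /* page truncated or 'out' too small */
        memcpy(out + total, buf + off, size);
        off += size;
        total += size;
    }
}
```

Fed a block holding two pages, "hi" and "!", followed by '-', the sketch
returns 3 and leaves "hi!" in the output buffer. Working on an in-memory copy
keeps the bounds checks in one place; a reader working straight from the .DAT
file would do the same walk with seeks and reads.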