PARADOX 4.x FILE FORMATS
Revision 1
May 11, 1996
Author: Kevin Mitchell
Preface
-------
This document is released to the public domain.
You may do anything you want with it.
IMPORTANT: YOU USE THE INFORMATION CONTAINED IN THIS DOCUMENT
AT YOUR OWN RISK. I CANNOT BE HELD RESPONSIBLE FOR
ANY DAMAGES THAT RESULT FROM YOUR USE OF THIS
INFORMATION.
If you modify the document, please remove my name from it.
There are a couple of illustrations that would have looked better
if I had used line draw characters. I avoided using line draw
characters because non-US character sets do not display the
characters correctly.
Kevin Mitchell
May 11, 1996
Send questions and comments via E-Mail.
My Compuserve Id is 70717,475
I specialize in programming for Paradox/DOS.
Introduction
------------
This document describes the internal formats of the Paradox data
and index files:
DB - Contains the data records for a table. The header contains
the field names and types and other useful information.
PX - Contains the primary index for a table. If the table is
unkeyed, there is no PX file for the table.
MB - Contains BLOB (Binary Large Object) data. For example, type
M and B fields are blobs and the MB file contains the data for M
and B fields.
X** - Contains a secondary index and has the same basic format as
a DB file.
Y** - Contains the index for a secondary index (X**) and has the
same basic format as a PX file.
Some of the information in this document may apply to tables
created by other Paradox versions, but the information was
derived by examining version 4.5 tables.
Standard reverse engineering techniques were used to obtain the
information, i.e., view a file with a hex editor, make a change
in Paradox, and then look at the file again to see what changed.
Note: I tried about eight different shareware hex editors. My
favorite was Hex Workshop 2.10 from BreakPoint Software. It
is a Windows application and is available in 16 and 32-bit
versions. It can be found in Lib 4 of the PCUTIL forum. The
filename is HW16V210.ZIP. It's about 300k so it only takes
about 3 minutes to download at 14.4 kbps. I am in no way
affiliated with the authors of Hex Workshop.
Not all of the DB and PX header fields are documented. However,
there is enough information to perform the following tasks.
* Write a program to retrieve a record and its memo fields, if
any.
* Write a program to browse the table in forward or reverse
primary key sequence.
* Write a program to use the primary index to locate a record.
* Write a program to use the secondary index to locate
records.
* If Tutility can't rebuild a table, you might be able to use
a hex editor to make enough repairs so that Tutility can
proceed.
It would be foolhardy to attempt to write a program to update a
table based on the information in this document. It might be
possible to do it but it would be dangerous when you consider the
number of undocumented header fields. Paradox might be quite
sensitive to the content of some of the undocumented fields.
Paradox Field Types and Lengths
-------------------------------
The following table shows the Paradox 4.x field types and the
number of bytes occupied by each type in a data record.
Type   Length
----   ------
N      8
$      8
D      4
S      2
Ann    nn       1 <= nn <= 255
Mnn    nn+10    1 <= nn <= 240
Bnn    nn+10    0 <= nn <= 240
Unn    nn+10    0 <= nn <= 240 (Only created during IMPORT)
N and $ fields are stored as double-precision floating point
numbers and are identical except that Paradox displays $ fields
differently. Paradox automatically rounds $ fields to 2 decimal
places, uses separators (commas) between three-digit groups, and
puts parentheses around negative numbers.
IMPORTANT: the rounding for $ fields is for display only. The
stored value is NOT rounded.
D fields are stored as a signed long integer.
Ann fields are stored as fixed-length character strings. Unused
positions on the right are filled with nulls (binary zero).
M, B, and U fields are variable length and are called BLOB
fields. The length shown here is the fixed-length leader that is
stored in the record in the DB file.
A blob uses an extra 10 bytes in the DB record. The extra space
is used to hold the length of the blob, the location of the blob
in the MB file, and a modification number (used internally by
Paradox).
Blob data is stored in the MB file. The leader in the DB record
contains a copy of the first part of the data in MB.
A special case occurs when the entire Blob will fit in the
leader. In this case the blob is stored in the leader and is not
written to the MB file.
In general, memo fields should be defined as M1 to minimize
record size. An M1 field takes 11 bytes in the record. A larger
memo field, Mnn, can improve performance if all of the following
conditions are met.
- Most records have non-blank memos
- Most memos have a length less than or equal to nn
- Most time-critical operations look at the memo fields
Numeric Formats Supported by the 80x86/7
----------------------------------------
A byte can store a value between 0 and 255 if the value is
treated as unsigned. If the byte is treated as a signed value,
the value can range from -128 to 127.
Short integers are 16 bits long (2 bytes). Unsigned range: 0 to
65,535. Signed range: -32768 to 32767.
Long integers are 32 bits long (4 bytes). Unsigned range: 0 to
4,294,967,295. Signed range: -2,147,483,648 to 2,147,483,647.
Double precision floating point numbers are 64 bits long (8
bytes). Floating point numbers are always interpreted as signed
values. They provide approximately 15 decimal digits of precision
with a decimal exponent in the range -307 to 308.
Note: Single precision (32-bits) and extended precision (80-bits)
numbers are also supported but are not used by Paradox.
Single precision is pretty useless because it only provides
for 6 decimal digits of precision. Extended precision is not
used because it is actually intended only for intermediate
results during computations. Internally, the 80x86/7 uses
extended precision for all computations.
Floating Point
--------------
Paradox uses double precision floating point (64 bits) for type N
and $ fields.
This floating point format has a sign bit, 11 bits for the
exponent, and 52 bits for the significand.
In order to avoid having two sign bits (one for the number and
one for the exponent), the 80x86 (or 80x87) uses a bias for the
exponent. The bias is subtracted from the exponent value to
obtain the true exponent. The true exponent is the power of 2
that the significand must be multiplied by.
The largest unsigned number that can be expressed with 11 bits is
2047. The smallest is zero. The 80x86 disallows exponents with
all bits set to either 0 or 1 (there are a few exceptions to this
rule). Thus, the 11-bit exponent can range from 1 to 2046. The
80x86 uses a bias of 1023. This means that numbers with binary
exponents between -1022 and +1023 can be represented. This is a
big enough range for most applications.
The significand is always normalized. This means that the binary
point (binary equivalent of a decimal point) is placed
immediately to the right of the most significant "1" bit in the
number. Since normalization forces all numbers to have a 1 to the
left of the binary point, the 1 is not stored in the field - it
is assumed to be there. Effectively, this increases the
significand length to 53.
Consider the following floating point number (the "h" after the
number denotes hexadecimal notation):
4059200000000000h
The sign bit is zero.
The exponent is 405h = 1029. We subtract the bias to get 6 as the
true binary exponent.
The significand is 92h (We can ignore trailing zeros). In binary
this is: 10010010.
When we put back the assumed 1 and the binary point we get:
1.10010010
We multiply this by 2 raised to the 6th power. This is the same
as shifting the binary point 6 places to the right.
1100100.10
In decimal this is 100 (1100100) + .5 (.10) = 100.5
Note: Binary digits to the right of the binary point correspond
to negative powers of 2. The following table shows some examples.
Binary   Decimal
------   ----------
.1       .5    (1/2)
.01      .25   (1/4)
.001     .125  (1/8)
.0001    .0625 (1/16)
For example, decimal .75 is written in binary as .11
A floating point number is treated as zero if its exponent and
significand bits are all zero. This is an exception to the rule
that the exponent cannot be all "0" or all "1".
Certain operations can cause the exponent to be all "0" or all
"1". The special handling that the 80x86/7 uses for these numbers
is beyond the scope of this document.
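The worked example above can be checked with a few lines of Python. This is only a sketch: the function name decompose_double is made up for the illustration, and it handles normalized numbers only (exponent not all "0" or all "1").

```python
import struct

def decompose_double(hex_digits):
    """Split a 64-bit double, given as 16 hex digits, into its parts."""
    bits = int(hex_digits, 16)
    sign = bits >> 63                      # 1 bit
    biased_exponent = (bits >> 52) & 0x7FF # 11 bits
    fraction = bits & ((1 << 52) - 1)      # 52 stored significand bits
    # Put back the assumed leading 1 and subtract the bias of 1023.
    significand = 1 + fraction / 2**52
    value = (-1) ** sign * significand * 2.0 ** (biased_exponent - 1023)
    return sign, biased_exponent, significand, value

sign, exponent, significand, value = decompose_double("4059200000000000")
print(sign, exponent, exponent - 1023, significand, value)

# Cross-check: let the struct module interpret the same 8 bytes.
check = struct.unpack(">d", bytes.fromhex("4059200000000000"))[0]
print(check)
```

Both print statements should end with the value 100.5 derived by hand above.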
Date Format
-----------
Paradox stores a date as a long integer. The integer contains the
date expressed as the number of days since January 1, 1 (the year
1 A.D.).
Although dates are expressed (internally) as the number of days
since 1/1/1, the lowest year that Paradox allows you to enter is
100. If a value less than 100 is entered, it is treated as 19xx,
where xx is the value.
The internal representation of 1/1/100 (the lowest valid date) is
36,160 (00008D40h). The internal representation of January 2, 100
is 36,161. The internal representation of May 4, 1996 is decimal
728,783 (000B1ECFh).
Paradox accepts dates between Jan 1, 100 and Dec 31, 9999.
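The day numbers quoted above agree with the ordinal numbering used by Python's datetime module (January 1 of year 1 is day 1), so a conversion can be sketched as follows. The function names are invented for this example, and the sketch assumes the day number has already been converted from the stored format to an ordinary integer.

```python
from datetime import date

def paradox_day_to_date(day_number):
    """Convert a Paradox day number to a calendar date.

    Assumption: the Paradox numbering matches Python's proleptic
    Gregorian ordinal, which is consistent with the sample values
    given in the text.
    """
    return date.fromordinal(day_number)

def date_to_paradox_day(d):
    """Convert a calendar date to a Paradox day number."""
    return d.toordinal()

print(paradox_day_to_date(36160))
print(paradox_day_to_date(728783))
print(date_to_paradox_day(date(100, 1, 2)))
```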
Blob Fields
-----------
Type M (Memo) and B (Binary) fields are blob fields.
In a DB record, a blob field is stored as a fixed-length data
field (called the leader) followed by 10 bytes with the following
fields:
* An unsigned long integer (32 bits) that contains the offset
of the blob's data block in the MB file and an index value.
* An unsigned long integer that contains the length of the
blob.
* An unsigned short integer (16 bits) that contains the
modification number from the MB file header.
The length of the leader may be zero for type B fields. The
leader for a type M field must be at least one byte.
If you define a memo field as M40, then the length of the leader
is 40. If the memo data is over 40 bytes long, then the entire
memo is stored in the MB file and the leader contains a copy of
the first 40 bytes. If the memo data is less than 41 bytes long,
then the leader contains all of the data and nothing is stored in
the MB file.
The MB file is described later in this document.
Although the numeric data in a DB record is stored in
modified big endian format, the 10 bytes of blob information are
stored in little endian (native 80x86) format.
We'll refer to the first four bytes after the leader as
MB_Offset. MB_Offset is used to locate the blob data.
If MB_Offset = 0 then the entire blob is contained in the leader.
Take the low-order byte from MB_Offset and call it MB_Index.
Change the low-order byte of MB_Offset to zero.
If MB_Index is FFh, then MB_Offset contains the offset of a type
02 block in the MB file.
Otherwise, MB_Offset contains the offset of a type 03 block in
the MB file. MB_Index contains the index of an entry in the Blob
Pointer Array in the type 03 block.
Refer to the MB file description for block formats.
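The steps for interpreting the 10 bytes after the leader can be sketched in Python as follows. The function name locate_blob and the dictionary keys are invented for this illustration; the layout (two unsigned longs and one unsigned short, little endian) is taken from the description above.

```python
import struct

def locate_blob(field_bytes, leader_length):
    """Split a blob field from a DB record into leader and location info.

    field_bytes is the whole field: the leader followed by 10 bytes.
    """
    leader = bytes(field_bytes[:leader_length])
    mb_offset, blob_length, mod_number = struct.unpack_from(
        "<IIH", field_bytes, leader_length)

    if mb_offset == 0:
        # The entire blob is contained in the leader.
        where, mb_index = "leader", None
    else:
        mb_index = mb_offset & 0xFF      # low-order byte
        mb_offset &= 0xFFFFFF00          # low-order byte set to zero
        where = "type 02 block" if mb_index == 0xFF else "type 03 block"

    return {"leader": leader, "where": where, "mb_offset": mb_offset,
            "mb_index": mb_index, "length": blob_length,
            "mod_number": mod_number}

# Hypothetical M1 field whose blob lives in a type 03 block at 2000h,
# entry 3Fh of the blob pointer array.
field = b"H" + struct.pack("<IIH", 0x203F, 10, 1)
print(locate_blob(field, 1))
```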
Big and Little Endians
----------------------
80x86 processors normally store numeric fields in "little endian"
format. This means the least significant byte of the number has
the lowest address (the little end comes first).
For example, a short integer containing decimal 10 has the
following hexadecimal representation: 000Ah (the lower case h
after the number means it is a hex number). An 80x86 processor
would store this as 0A00. The least significant byte has the
lowest address.
Many processors (like the Motorola processors used in Macintosh
computers) store numbers in "big endian" format. The most
significant byte has the lowest address (the big end comes
first).
Processors that use big endian format store 000Ah as 000A.
Little endian format causes some complications when you attempt
to sort a number as a string. 256, expressed as a short integer,
is 0100h. This is stored (little endian) as 0001. If you sort a
file that contains 10 and 256, then 10 (0A00) sorts after 256
(0001).
Big endian notation is a partial solution to this problem. 10
(000A) will clearly sort before 256 (0100). Sorts using big
endian notation fail to work correctly when signed numbers are
used. -1 is expressed as FFFFh and will sort higher than any
other number.
If all numbers are signed and all floating point numbers are
normalized, then there is a simple modification to big endian
format that makes sorting work correctly. The sign bit (leftmost
bit) is complemented (reversed) when numbers are stored. (It is
complemented again before the numbers are used in computations).
-2 becomes 7FFE
-1 becomes 7FFF
1 becomes 8001
2 becomes 8002
Note that negative numbers will sort before positive numbers.
"Larger" negative numbers will sort before "smaller" negative
numbers.
Since double-precision floating point numbers on the 80x86/7 are
ALWAYS signed and normalized, they will also sort correctly.
Because it is important, I will repeat that this modification to
big endian format only makes sorting work correctly if all
numbers are signed. This is probably why Paradox doesn't support
unsigned fields.
Paradox uses little endian format (the natural 80x86 format) for
control structures like file headers and block headers. It uses
modified big endian format for numeric data, i.e., type N, $, D,
and S fields in data records.
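For the integer types (S and D), the modified big endian format can be sketched in Python as shown here. The function names are made up for this example. How blank fields are stored is not covered by this description, so the sketch does not handle them, and floating point (N and $) fields are left out.

```python
import struct

def decode_short(raw):
    """Decode a 2-byte S field stored in modified big endian format."""
    (stored,) = struct.unpack(">H", raw)
    value = stored ^ 0x8000              # complement the sign bit again
    return value - 0x10000 if value & 0x8000 else value

def encode_short(value):
    """Encode a signed 16-bit integer the way an S field is stored."""
    return struct.pack(">H", (value & 0xFFFF) ^ 0x8000)

def decode_long(raw):
    """Decode a 4-byte D field stored in modified big endian format."""
    (stored,) = struct.unpack(">I", raw)
    value = stored ^ 0x80000000
    return value - 0x100000000 if value & 0x80000000 else value

for n in (-2, -1, 1, 2):
    print(n, encode_short(n).hex().upper())

# Sorting the stored bytes as strings gives the numeric order.
values = [2, -1, 256, 10, -2]
print(sorted(values, key=encode_short))
```

The last line sorts by the raw stored bytes, which is the property the format is designed to provide.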
The DB File
-----------
The DB file contains the data records for a table. The first
block in the DB file is the table header. The table header is
followed by the data blocks.
The DB file has the following logical structure.
+--------------+
| Table Header |
|              |
|First Block   |-------------------+
|Last Block    |----+              |
|Free Blocks   |--+ |              |
+--------------+  | |              |
                  | |              |
    +-------------+ |              V
    V               |          +-------------+ Next
 +------+ Next      |  +------>|Data block 1 |-------+
 | Free |-----+     |  |       +-------------+       |
 +------+     |     |  |                             |
              |     |  |       +-------------+       |
 +------+     |     |  +-------|Data block 2 |<------+
 | Free |<----+     |   Prev   +-------------+
 +------+           |
    |               |                ...
    |Next           |  +--->
    |               |  |
    V               |  |       +-------------+
                    |  +-------|Data block n |
   ...              |   Prev   +-------------+
                    |              ^
                    |              |
                    +--------------+
The table header contains the block number of the first data
block, the last data block, and the first free block.
Blocks are numbered. The first block after the table header is
block 1.
All blocks (except the table header) are the same size. This
means the block number can be used to compute the offset of the
block within the file:
block offset = block length * ( block number - 1 ) + table header length
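The offset formula translates directly into code. A small Python sketch follows; the function and parameter names are chosen for this example only.

```python
def block_offset(block_number, block_length, header_length):
    """Return the file offset of a block (blocks are numbered from 1)."""
    if block_number < 1:
        raise ValueError("block numbers start at 1")
    return block_length * (block_number - 1) + header_length

# With a 2k header and 2k blocks, block 1 starts right after the header.
print(block_offset(1, 2048, 2048))
# With a 2k header and 1k blocks, block 3 starts 2 blocks further on.
print(block_offset(3, 1024, 2048))
```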
Data blocks are organized as a bi-directional linked list, i.e.,
the block header in each block contains the block number of the
next and previous blocks in the linked list. The blocks are
linked in ascending key sequence based on the first record in
each block.
Within a block, records are stored in ascending sequence. The
block header contains the offset of the last record in the block.
When a record is deleted, any records that follow it "move up" to
overwrite it and the record length is subtracted from the last
record offset in the block header.
Free blocks are organized as a linked list. A free block contains
the block number of the next free block but does not contain the
block number of the previous free block.
A block is added to the free block list when all of the records
in the block are deleted. When records are inserted into the
table and a new block must be allocated, the first free block is
plucked from the list. The next free block becomes the new first
free block.
If a new block is needed and there are no free blocks, then a new
block is added to the end of the file.
Note: Block 1 is never allowed to be a free block. If block 1 is
emptied, then the data from the next block in the linked
list is copied to block 1. The block whose data was copied
to block 1 is then added to the free list.
When you add a record, Paradox tries to put it in the data block
that contains the record that is just before it in key sequence.
If there is no room in the block, then:
* If you are inserting a record in the last data block, a new
block is allocated and the existing block does not split.
* Otherwise, the block is split. Half of the records remain in
the original block. A new block is created for the remaining
records.
Of course, linked list pointers are adjusted whenever a new block
is inserted. The indexes will also be updated. The primary index
contains one record for each data block. The key value stored in
the index is the key of the first record in the block. (More
about this later.)
Table Header
------------
The table header is the first block in the DB file. Some of the
fields in the header are described below. (The list is far from
complete.)
Field type codes are: UB = Unsigned byte. US = Unsigned Short
integer. UL = Unsigned Long integer.
Hex     Field
Offset  Type   Description
------  -----  --------------------------------------------------
000000  US     Record length
000002  US     Length of the header block.
               Usually 2k (even if the data block is not 2k).
               The size may increase if there are a lot of fields
               with long field names.
               Worst case: 10k for 255 fields with 25-character
               names.
000004  UB     File type
               00 - DB file for a keyed table
               02 - DB file for an unkeyed table
000005  UB     Data block size code
               01 - Block size is 1k
               02 - Block size is 2k
               03 - Block size is 3k (not used in 4.5)
               04 - Block size is 4k
000006  UL     Number of records in DB
00000A  US     Number of blocks in use
00000C  US     Total blocks in file
00000E  US     First data block (always 1)
000010  US     Last block in use
000021  UB     Number of fields
000023  UB     Number of key fields
00004D  US     Block number of first free block
000078         Start of field description array
The field description array contains two bytes for each field.
The first byte contains the field type code.
Code  Field Type
----  ------------
01    A
02    D
03    S
05    $
06    N
0C    M
0D    B
The second byte contains the field length.
The length for $ and N is always 8.
The length for D is always 4.
The length for S is always 2.
The length for A ranges from 1 to 255 (01h to FFh).
The length for M ranges from 11 to 250 (0Bh to FAh).
The length for B ranges from 10 to 250 (0Ah to FAh).
The length for a B or M (blob) field includes 10 bytes used to
hold the blob's length and its location in the MB file.
Field names start at offset 120 (78h) plus 83 plus 6 times the
number of fields. Field names are in field number sequence. Each
field name is a null-terminated string (00h marks the end of the
string).
No data records are stored in the table header block.
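As an illustration, the documented header fields could be decoded with a Python sketch like the one below. The function name, the dictionary keys, and the use of code page 437 for field names are assumptions made for this example; only the offsets and type codes come from the tables above.

```python
import struct

# Field type codes from the table above.
FIELD_TYPES = {0x01: "A", 0x02: "D", 0x03: "S", 0x05: "$",
               0x06: "N", 0x0C: "M", 0x0D: "B"}

def parse_db_header(header):
    """Decode the documented fields of a DB table header."""
    record_length, header_length, file_type, size_code = \
        struct.unpack_from("<HHBB", header, 0x00)
    (record_count,) = struct.unpack_from("<I", header, 0x06)
    blocks_in_use, total_blocks, first_block, last_block = \
        struct.unpack_from("<HHHH", header, 0x0A)
    field_count = header[0x21]
    key_field_count = header[0x23]
    (first_free_block,) = struct.unpack_from("<H", header, 0x4D)

    # Field description array: type code and length for each field.
    types = []
    for i in range(field_count):
        type_code = header[0x78 + 2 * i]
        length = header[0x78 + 2 * i + 1]
        types.append((FIELD_TYPES.get(type_code, "?"), length))

    # Null-terminated field names, in field number sequence.
    names = []
    pos = 0x78 + 83 + 6 * field_count
    for _ in range(field_count):
        end = header.index(0, pos)
        names.append(bytes(header[pos:end]).decode("cp437"))  # assumed code page
        pos = end + 1

    return {
        "record_length": record_length,
        "header_length": header_length,
        "keyed": file_type == 0x00,
        "block_size": size_code * 1024,
        "record_count": record_count,
        "blocks_in_use": blocks_in_use,
        "total_blocks": total_blocks,
        "first_block": first_block,
        "last_block": last_block,
        "key_field_count": key_field_count,
        "first_free_block": first_free_block,
        "fields": [(n, t, l) for n, (t, l) in zip(names, types)],
    }
```

A caller would read the first 2k of the DB file, pass it in, and then read the rest of the header if header_length turns out to be larger.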
DB Data Blocks
--------------
The following table describes the format of a DB data block.
Hex     Field
Offset  Type   Description
------  -----  --------------------------------------------------
000000  US     Next block number (Zero if last block)
000002  US     Previous block number (Zero if first block)
000004  US     Offset of last record in block.
000006         First data record.
Records are stored contiguously. There are no gaps between
records. Records contain no slack bytes between fields.
The last record offset is relative to the end of the header. Add
6 to calculate the offset from the start of the block.
If the block is empty, the offset is set to 0 minus record
length.
A zero in offset means that the block contains one record.
Since Paradox knows the record length and the block length, it
can use the last record offset to compute the number of records
in the block and the amount of free space in the block.
If you use a hex editor to look at the data, remember that the
fields in the block header are in little endian format but any
numeric data fields are in modified big endian format.
If a record is deleted, records after it move up in the block and
the record length is subtracted from the last record offset in
the block header.
Records are stored in key sequence. If a record is inserted in
the block, then records with higher keys "move down" to make room
for the new record. Record length is added to the last record
offset in the block header.
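The record count and free space computation described above can be sketched in Python as follows. The function name is invented for the example. One assumption is made loudly here: the last record offset is read as a signed 16-bit value so that "0 minus record length" in an empty block comes out negative.

```python
import struct

def read_data_block(block, record_length):
    """Return (next, prev, records, free_bytes) for one DB data block."""
    # Header fields are little endian. The offset is read as signed
    # (assumption) so that an empty block yields -record_length.
    next_block, prev_block, last_offset = struct.unpack_from("<HHh", block, 0)

    # An offset of zero means one record, so add 1 after dividing.
    record_count = last_offset // record_length + 1

    records = []
    for i in range(record_count):
        start = 6 + i * record_length      # 6 = length of block header
        records.append(bytes(block[start:start + record_length]))

    free_bytes = len(block) - 6 - record_count * record_length
    return next_block, prev_block, records, free_bytes

# Hypothetical 1k block holding three 12-byte records.
block = bytearray(1024)
struct.pack_into("<HHh", block, 0, 2, 0, 24)
block[6:42] = b"A" * 12 + b"B" * 12 + b"C" * 12
nxt, prev, records, free_bytes = read_data_block(block, 12)
print(nxt, prev, len(records), free_bytes)
```

The records returned are raw bytes; the numeric fields inside them are still in modified big endian format.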
The PX File
-----------
The PX file contains the primary index records for a keyed table.
The first block in the PX file is the index header. The index
header is followed by the index data blocks.
The format of the PX file is very similar to the format of the DB
file. Index blocks are chained in a bi-directional linked list
and free blocks are chained in a linked list.
In addition to the list structure, the index blocks are organized
into a hierarchical (tree) structure. The tree structure is the
primary structure used when accessing the index.
A tree is a typical index structure. Almost any book about data
bases will describe the creation and maintenance of a tree-
structured index, so I won't go into too many details here.
However, I will show a simple example.
In this example we will assume that we have a set of 10,000
records sorted in key sequence. The key is an integer between 1
and 10,000.
We will also assume that a data block holds 10 data records and
an index block holds 10 index records. (This is unrealistic - the
index block would actually hold many more index records. However,
assuming 10 indexes per block simplifies the example.)
There is one index record per data block and the index record
contains the key of the first record in the data block.
When we insert the first data record in the DB file, an index
record is created in the PX file. The index record contains the
block number of the first data block (block 1 in DB) and the key
of the first record in that data block (key = 1).
When records 2 through 10 are inserted, they are placed in the
first data block in DB. No additional index records are
generated.
When record 11 is inserted, a new data block is created. We must
insert an index record containing the key of the first record in
the block (key = 11) and the block number of the second data
block. The index now contains two records.
If we continue inserting records, then we will eventually insert
the record whose key is 101.
When data record 101 is inserted, there is no room for the index
record in the first index block. A new index block is created to
hold the index record for 101. We now have two index blocks.
An index block is created to index the two index blocks. We will
refer to this as the level 2 index and refer to the first two
index blocks as the level 1 index. The level 2 index contains the
first key from each level 1 index block. We now have a structure
that looks like this:
Level 2 Index           Level 1 Index
Key                     Keys in Block
----                    ----
   1  ----------->      1, 11, 21, 31, ..., 91
 101  ----------->      101
The level 2 record points to a block in the level 1 index. The
"pointer" is the block number of the level 1 index block.
When record 1001 is inserted, a second level 2 index block will
be created and a level 3 index block will be created to index the
level 2 blocks.
After 10,000 records have been inserted, we will have a structure
like the figure below.
Index Structure for 10,000 Records
----------------------------------
+-----+                                           Level 3 (Index root)
|    1|--+
| 1001|  |
| 2001|  |
| 3001|  |
| 4001|  |
| 5001|  |
| 6001|  |
| 7001|  |
| 8001|  |
| 9001|--|--------------------+
+-----+  |                    |
         V                    V
      +-----+      ...      +-----+               Level 2
      |    1|--+            | 9001|
      |  101|--|----------+ | 9101|
      |  201|  |          | | 9201|
      |  301|  |          | | 9301|
      |  401|  |          | | 9401|
      |  501|  |          | | 9501|
      |  601|  |          | | 9601|
      |  701|  |          | | 9701|
      |  801|  |          | | 9801|
      |  901|  |          | | 9901|------+
      +-----+  |          | +-----+      |
               V          V              V
            +-----+    +-----+  ...   +-----+     Level 1
            |    1|--+ |  101|        | 9901|
            |   11|  | |  111|        | 9911|
            |   21|  | |  121|        | 9921|
            |   31|  | |  131|        | 9931|
            |   41|  | |  141|        | 9941|
            |   51|  | |  151|        | 9951|
            |   61|  | |  161|        | 9961|
            |   71|  | |  171|        | 9971|
            |   81|  | |  181|        | 9981|
            |   91|  | |  191|        | 9991|----+
            +-----+  | +-----+        +-----+    |
                     |                           |
                     |                           |
                     V                           V
              Data block that             Data block that
             contains records            contains records
               1 through 10            9,991 through 10,000
To find a record via the index, we start at the root (level 3)
index and, at each index level, pick the entry that has the
highest key that is less than or equal to the key we are trying
to locate. Keep in mind that the block number we get from index
records above level 1 will refer to a block in the PX file. The
block number we get from the level 1 index refers to a block in
the DB file.
For example, to find the data record with key 185 we proceed as
follows:
* From the root index we pick 1. This points to the level 2
index that contains 1, 101, 201, ...
* From the level 2 index we pick 101. This points to the
level 1 index that contains 101, 111, 121, ...
* From the level 1 index we pick 181. This points to the DB
block that contains data records 181 through 190.
* Find the record with key 185 in the DB block. If the record
has not been deleted, we will find it.
Since the entries in all blocks (PX and DB) are stored in key
sequence, a binary search can be used to locate the desired index
or data record in a block.
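The search procedure can be tried out on an in-memory model of the 10,000-record example. The Python sketch below does not read a real PX file: the block numbering, the dictionary layout, and the function names are all invented for the illustration. It uses the standard bisect module for the binary search within a block.

```python
from bisect import bisect_right

def build_example_index(record_count=10_000, per_block=10):
    """Build the example index. Returns (px_blocks, root, levels)."""
    px_blocks = {}   # hypothetical PX block number -> list of (key, pointer)
    # Level 1 entries: first key of each data block and its DB block number.
    entries = [(first, n + 1)
               for n, first in enumerate(range(1, record_count + 1, per_block))]
    levels = 0
    while True:
        levels += 1
        groups = [entries[i:i + per_block]
                  for i in range(0, len(entries), per_block)]
        next_entries = []
        for group in groups:
            block_number = len(px_blocks) + 1
            px_blocks[block_number] = group
            next_entries.append((group[0][0], block_number))
        if len(groups) == 1:
            return px_blocks, block_number, levels
        entries = next_entries

def find_data_block(px_blocks, root, levels, key):
    """Walk from the root down to level 1. Returns (DB block, keys picked)."""
    block_number, path = root, []
    for _ in range(levels):
        entries = px_blocks[block_number]
        keys = [k for k, _ in entries]
        # Highest key that is less than or equal to the search key.
        i = bisect_right(keys, key) - 1
        if i < 0:
            return None, path        # key is lower than every index key
        picked_key, block_number = entries[i]
        path.append(picked_key)
    return block_number, path

px_blocks, root, levels = build_example_index()
print(levels, find_data_block(px_blocks, root, levels, 185))
```

For key 185 the keys picked on the way down are 1, 101, and 181, matching the walk-through above.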
Statistics
----------
Paradox keeps statistics that are (probably) used to optimize
queries. The statistics are stored in the index and are updated
as records are inserted and deleted.
A record in the level 1 index contains the number of records in
the DB block that it points to.
A record in the level 2 index contains the sum of the statistics
(DB record counts) from the level 1 index block that it points
to.
A record in the level n index contains the sum of the statistics
from the level n-1 index block that it points to.
For example, in the sample structure each level 1 index record
contains 10. Each level 2 record contains 100. Each level 3
record contains 1000.
If the record with key = 128 is deleted, then the level 1 record
with key 121 contains 9. The level 2 record with key 101 contains
99. The level 3 index with key 1 contains 999.
Index Header
------------
The index header is the first block in the PX file. Some of the
fields in the header are described below. (The list is far from
complete.)
Hex     Field
Offset  Type   Description
------  -----  --------------------------------------------------
000000  US     Index record length
000002  US     Length of the index header (2k)
000004  UB     File type
               01 - PX file
000005  UB     Index block size code
               01 - Block size is 1k
               02 - Block size is 2k
               03 - Block size is 3k (not used in 4.5)
               04 - Block size is 4k
000006  UL     Number of records in PX
00000A  US     Number of blocks in use
00000C  US     Total blocks in file
00000E  US     First index data block (always 1)
000010  US     Last block in use
00001E  US     Block number of index root
000020  UB     Number of index levels
000021  UB     Number of fields in index
The index record length is six greater than the sum of the
lengths of the key fields.
The number of fields in the index is the same as the number of
key fields for the table.
Most of the block is filled with nulls.
No index records are stored in this block.
Index Blocks
------------
Within each PX block the records (primary key values) are stored
in ascending sequence. Records following a deleted record "move
up".
The block header is just like the one used in the DB file.
Hex     Field
Offset  Type   Description
------  -----  --------------------------------------------------
000000  US     Next block number (Zero if last block)
000002  US     Previous block number (Zero if first block)
000004  US     Offset of last record in block.
000006         First index record.
An index record has the format:
<primary key fields> followed by six bytes used as follows:
Bytes  Contents
-----  -------------------------------------------------
1-2    Unsigned short integer that contains the block number
       associated with the key field.
       For a level 1 block, this is a DB file block number.
       For a block above level 1, this is a PX block number.
3-4    Unsigned short integer containing statistics.
5-6    Unsigned short integer.
       Purpose unknown. Usually contains zero.
If you use a hex editor to look at the data, remember that the
fields in the block header are in little endian format.
An index record is treated as data. Numeric fields, including the
data block number and statistics, are stored in modified big
endian format.
The MB File
-----------
The MB file is used to store BLOB (Binary Large Object) data.
The format of the MB file is very different from the format of
the DB and PX files. The blocks are not chained and the block
length varies. The header is always 4k (1000h) long. Blocks that
follow the header have a length that is a multiple of 4k.
Each block has the following information in the first three
bytes.
UB - Record type
     00 - Header block
     02 - Single blob block
     03 - Suballocated block
     04 - Free block
US - Number of 4k chunks in this block.
     The maximum size is FFFFh x 1000h = 65,535 x 4096.
     This is 256 megabytes (the maximum length of a blob).
Two methods are used to allocate space for blobs.
* A separate block (record type 02) is allocated for a blob
over 2k bytes long. The length of the data block is the
smallest multiple of 4k that is larger than the blob.
* One 4k block may be suballocated (record type 03) to hold up
to 64 small (under 2k) blobs.
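The three bytes at the start of each block are enough to step through an MB file block by block. The Python sketch below is an illustration with invented function names. It assumes that blocks are stored back to back, so that the next block starts where the previous one ends; that follows from the block lengths but is not stated outright above.

```python
import struct

MB_BLOCK_TYPES = {0x00: "header", 0x02: "single blob",
                  0x03: "suballocated", 0x04: "free"}

def read_mb_block_header(data, offset=0):
    """Return (block type name, block length in bytes) for one block."""
    block_type, chunks = struct.unpack_from("<BH", data, offset)
    return MB_BLOCK_TYPES.get(block_type, "unknown"), chunks * 4096

def walk_mb_blocks(data):
    """Yield (offset, type name, length) for each block in turn."""
    offset = 0
    while offset + 3 <= len(data):
        kind, length = read_mb_block_header(data, offset)
        if length == 0:
            break                # avoid looping forever on bad data
        yield offset, kind, length
        offset += length         # assumption: blocks are contiguous

def make_block(block_type, chunks):
    """Build an otherwise empty block for demonstration purposes."""
    return struct.pack("<BH", block_type, chunks) + bytes(chunks * 4096 - 3)

sample = make_block(0, 1) + make_block(2, 2) + make_block(3, 1)
for entry in walk_mb_blocks(sample):
    print(entry)
```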
The Blob Header
---------------
The blob header is the first block in the MB file. It is 4096
(4k) bytes long. The block contains the following fields.
Hex     Field
Offset  Type   Description
------  -----  --------------------------------------------------
000000  UB     Record type = 00h (Header block)
000001  US     Size of block divided by 4k
               1 because the header is 4k
000003  US     Modification count
               This is reset to 1 by a table restructure.
               Every time a blob is updated, this field is
               incremented. I don't know why this is done.
               The mod number is stored with the blob data.
               Again, I don't know why.
*** ALL OF THE FOLLOWING ARE GUESSES ***
00000B  US     Base size of data blocks (1000h).
00000D  US     Size of suballocated data blocks (1000h).
000010  UB     Suballocation chunk size (10h)
000011  US     Number of suballocations per block (0040h)
000013  US     Suballocation threshold (0800h)
               The border line between "big" and "small" blobs.
               Big blobs get their own blocks.
               Several small blobs may be stored in one block.
Note: I can't even guess at the rest of the header block's
fields. I assume that there's some kind of garbage
collection scheme that saves data in the header block, but I
haven't been able to figure it out.
Single Blob Block (Type 02)
---------------------------
A single blob block can appear anywhere in the MB file.
A "long" blob is stored in this kind of block.
The block length is the smallest multiple of 4k that is greater
than or equal to the length of the blob.
The block header contains the following fields.
Hex     Field
Offset  Type   Description
------  -----  --------------------------------------------------
000000  UB     Record type = 02h (block contains one blob)
000001  US     Size of block divided by 4k
000003  UL     Length of the blob.
000007  US     Modification number
               This is reset to 1 by a table restructure.
000009         Blob data starts here
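Reading a blob from a type 02 block then amounts to unpacking the 9-byte header and slicing out the data. A Python sketch follows, with an invented function name. Here mb_data stands for the contents of the MB file and mb_offset for the block offset taken from the DB record with its low-order byte set to zero.

```python
import struct

def read_single_blob(mb_data, mb_offset):
    """Return (blob bytes, modification number) from a type 02 block."""
    block_type, chunks, length, mod_number = struct.unpack_from(
        "<BHIH", mb_data, mb_offset)
    if block_type != 0x02:
        raise ValueError("not a single blob block")
    start = mb_offset + 9                  # blob data follows the header
    return bytes(mb_data[start:start + length]), mod_number

# Hypothetical MB image: a 4k header followed by one type 02 block.
blob = b"x" * 3000
block = struct.pack("<BHIH", 0x02, 1, len(blob), 1) + blob
image = bytes(4096) + block + bytes(4096 - len(block))
data, mod_number = read_single_blob(image, 4096)
print(len(data), mod_number)
```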
Suballocated Block (Type 03)
----------------------------
A suballocated block can appear anywhere in the MB file.
Up to 64 short blobs may be stored in this type of block.
A suballocated block is 4k bytes long. It has a 12-byte header
followed by an array of up to 64 5-byte blob pointers.
The DB field that "owns" a blob contains the offset of the block
(from the start of the MB file) and the index of one of the
entries in the blob pointer array. The array entry points to the
blob data.
This method of using indirect pointers (a pointer to a pointer)
is quite common. It simplifies garbage collection. Paradox can
move data around within the block to consolidate non-contiguous
chunks of free space. All it has to do is update the pointer
array within the block.
The 12-byte block header contains the following fields.
Hex Field
Offset Type Description
------ ----- --------------------------------------------------
000000 UB Record type = 03h (Suballocated block)
000001 US Size of block divided by 4k
1 because the block size is 4k
Note: There are nine more bytes in the header. I have no idea
what they contain.
The blob pointer array follows the header. The array has 64
entries numbered from 00h to 3Fh. Entries are used in reverse
order. The 3Fh entry is used first, then the 3Eh entry, then 3Dh,
and so on.
The offset (from the start of the block) of the entry indexed by
i is calculated as: offset = 12 + ( 5 * i )
Each entry is 5 bytes long and has the following format.
Hex Field
Offset Type Description
------ ----- --------------------------------------------------
000000 UB Data offset divided by 16
The offset is measured from the start of the 4k block.
If this is zero, then the blob was deleted and
the space has been reused for another blob (which
is associated with another entry in the array).
000001 UB Data length divided by 16 (rounded up)
000002 US Modification number from blob header
This is reset to 1 by a table restructure.
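000004 UB Number of data bytes in the last 16-byte chunk
(data length modulo 16, except that a full last
chunk is stored as 10h rather than 00h).
If this is zero, then the associated blob has been
deleted and the space can be reused.
For an active blob, this value will be between
01h and 10h.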
Note: Suballocations are made in 16-byte chunks. The first
available chunk is at offset 0150h in the block. Multiply
the first byte of the pointer array entry by 16 to get the
offset. The next byte is the number of chunks. The last byte
tells you how many bytes of data there are in the last
chunk. I don't know the purpose of the modification number.
For example, if an array entry looks like: 25030F0007 then the
data associated with the entry starts at offset 0250h (25h times
10h) and has 10h times 03h bytes allocated (48 bytes). The actual
data length is 27h (39 bytes) because there are only 7 bytes of
data in the last 16 byte chunk. The modification number is in
little endian format and is 000Fh (15).
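The arithmetic in this example can be checked with a short Python sketch. The function names are invented for illustration. The sketch assumes that the last byte of an entry counts the bytes used in the final chunk (01h to 10h), so that the data length is 16 * (chunks - 1) + last byte.

```python
import struct

SUB_HEADER_SIZE = 12   # 12-byte block header
ENTRY_SIZE = 5         # each blob pointer is 5 bytes

def entry_offset(i: int) -> int:
    """Offset of pointer array entry i from the start of the block."""
    return SUB_HEADER_SIZE + ENTRY_SIZE * i

def decode_entry(entry: bytes):
    """Decode one 5-byte entry; return None for a deleted blob."""
    off16, chunks, mod_number, last = struct.unpack("<BBHB", entry)
    if off16 == 0 or last == 0:
        return None  # blob was deleted
    return {
        "offset": off16 * 16,             # from the start of the 4k block
        "allocated": chunks * 16,
        "length": (chunks - 1) * 16 + last,
        "mod_number": mod_number,
    }

def read_small_blob(block: bytes, index: int):
    """Return the data for entry `index` of a type 03 block, or None."""
    start = entry_offset(index)
    info = decode_entry(block[start:start + ENTRY_SIZE])
    if info is None:
        return None
    return block[info["offset"]:info["offset"] + info["length"]]

example = decode_entry(bytes.fromhex("25030F0007"))
# offset 0250h, 48 bytes allocated, 39 bytes of data, mod number 15
```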
Free Block (Type 04)
--------------------
A free block can appear anywhere in the MB file.
The block length is a multiple of 4k.
If there are several contiguous free blocks, then the combined
length is placed in the first one.
The block header contains the following fields.
Hex Field
Offset Type Description
------ ----- --------------------------------------------------
000000 UB Record type = 04h (free block)
000001 US Size of block divided by 4k
If the blob in a type 02h block (single blob block) is deleted,
then its block becomes a free block. If all of the blobs in a
type 03h block (suballocated block) are deleted, then the block
becomes a free block.
Deletions happen more often than you might imagine. Whenever you
modify a blob, the original blob is deleted and the modified
version is saved as a new blob.
Paradox never updates a blob "in-place".
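Because every block type described here starts with a record type byte followed by the block size divided by 4k, it should be possible to walk an MB file from block to block. The Python sketch below does that. The name iter_mb_blocks is hypothetical, little-endian byte order is assumed, and the sketch relies on the first of several contiguous free blocks holding the combined length.

```python
import struct

BLOCK_UNIT = 4096  # block sizes are stored as multiples of 4k

def iter_mb_blocks(mb: bytes):
    """Yield (file offset, record type, length in bytes) for each block."""
    pos = 0
    while pos + 3 <= len(mb):
        rec_type, size_4k = struct.unpack_from("<BH", mb, pos)
        if size_4k == 0:
            raise ValueError("zero block size at offset %06Xh" % pos)
        length = size_4k * BLOCK_UNIT
        yield pos, rec_type, length
        pos += length

def count_free_bytes(mb: bytes) -> int:
    """Total size of all type 04 (free) blocks."""
    return sum(n for _, t, n in iter_mb_blocks(mb) if t == 0x04)
```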
X** File
--------
An X** file contains the data records for a secondary index.
There is one record for each record in the DB file.
An X** file has the same logical format as a DB file.
The X** data record contains the secondary index fields followed
by the primary index fields. An additional type S field named
"Hint" is the last field in the record. All fields except "Hint"
are included in the record key.
For example, if your data record has the key "Custid" and you
define a compound secondary index on "Last Name" and "First
Name", then the X** record contains four fields: [Last Name],
[First Name], [Custid], and [Hint]. The first three fields are in
the primary index for the X** file.
[Hint] contains the block number of the DB file block that
contains the record associated with the index record. This means
that the DB record can be retrieved directly. It doesn't have to be
located via the primary index in the PX file.
Although [Hint] is defined as a type S field, it is treated as an
unsigned integer by Paradox. Paradox knows it's a block number.
Note: If you specify more than 16 secondary index fields, then
only the first 16 fields are included in the index. Primary
index fields may be included in the index but the first
primary index field may not be the first secondary index
field.
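The record layout rules above can be summarized in a small Python sketch. The function name is invented, and the sketch assumes that no primary key field also appears among the secondary index fields, since the result in that case is not described here.

```python
MAX_SECONDARY_FIELDS = 16  # only the first 16 fields are indexed

def xnn_record_fields(secondary, primary):
    """Return (field names, number of key fields) for an X** record.

    secondary: names of the secondary index fields, in index order
    primary:   names of the primary key fields of the DB table
    """
    used = list(secondary)[:MAX_SECONDARY_FIELDS]
    fields = used + list(primary) + ["Hint"]
    # Every field except "Hint" is part of the X** record key.
    return fields, len(fields) - 1

fields, keys = xnn_record_fields(["Last Name", "First Name"], ["Custid"])
# fields: Last Name, First Name, Custid, Hint; keys: 3
```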
X** File Header
---------------
The X** file header is the first block in the X** file. It has
the same format as the Table Header (DB File Header).
Hex Field
Offset Type Description
------ ----- --------------------------------------------------
000000 US Record length
000002 US Length of the header block.
Usually 2k (even if the data block is not 2k).
000004 UB File type
08 - Secondary index data file
000005 UB Data block size code
01 - Block size is 1k
02 - Block size is 2k
03 - Block size is 3k (not used in 4.5)
04 - Block size is 4k
000006 UL Number of records in X**
00000A US Number of blocks in use
00000C US Total blocks in file
00000E US First data block (always 1)
000010 US Last block in use
000021 UB Number of fields
000023 UB Number of key fields
00004D US Block number of first free block
000078 Start of field description array
The field description array contains two bytes for each field.
(See the Table Header description.)
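The fixed part of the X** header can be decoded as in the Python sketch below. The name parse_xnn_header is hypothetical and little-endian byte order is assumed. The two bytes of each field description are returned raw, because their meaning is given in the Table Header description rather than here.

```python
import struct

def parse_xnn_header(header: bytes) -> dict:
    """Decode the fixed X** header fields listed in the table above."""
    (rec_len, hdr_len, file_type, size_code, n_records, blocks_used,
     total_blocks, first_block, last_block) = struct.unpack_from(
        "<HHBBLHHHH", header, 0x00)
    if file_type != 0x08:
        raise ValueError("not a secondary index data file")
    if size_code not in (1, 2, 3, 4):
        raise ValueError("unknown block size code %02Xh" % size_code)
    n_fields = header[0x21]
    n_key_fields = header[0x23]
    (first_free,) = struct.unpack_from("<H", header, 0x4D)
    # Two bytes per field, starting at offset 78h (left undecoded here).
    descs = [(header[0x78 + 2 * i], header[0x78 + 2 * i + 1])
             for i in range(n_fields)]
    return {
        "record_length": rec_len,
        "header_length": hdr_len,
        "block_size": size_code * 1024,   # code 01 = 1k ... 04 = 4k
        "records": n_records,
        "blocks_in_use": blocks_used,
        "total_blocks": total_blocks,
        "first_data_block": first_block,
        "last_block_in_use": last_block,
        "fields": n_fields,
        "key_fields": n_key_fields,
        "first_free_block": first_free,
        "field_descriptions": descs,
    }
```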
Field names start at offset 120 (78h) plus 83 plus 6 times the
number of fields. Field names are in field number sequence. Each
field name is a null-terminated string (00h marks the end of the
string).
For a compound index (more than one secondary index field), the
fields have the same names that they have in the DB file.
For a simple index (only one secondary index field), the name of
the index field is replaced by the name "Sec Key". This is really
stupid!
The primary key field names follow the secondary index field
names. "Hint" follows the primary key field names.
Immediately after the terminating null for [Hint], there is a
series of n unsigned short integers, where n is the number of
fields in the record. The first m of these integers are the field
numbers (in DB) of the secondary index fields, where m is the
number of fields in the secondary index.
Immediately after the integers, there is a null-terminated string
that contains the name of the sort order, e.g., "ascii".
The name of the index follows the sort order string. The index
name (a.k.a., label) is a null-terminated string.
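The variable-length tail of the header (field names, field numbers, sort order, index name) can be read sequentially, as in this Python sketch. The function names are made up, little-endian byte order is assumed for the integers, and the strings are decoded as Latin-1 only so that every byte value decodes; the real code page may differ.

```python
import struct

def read_cstring(buf: bytes, pos: int):
    """Read a null-terminated string; return (text, next position)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("latin-1"), end + 1

def parse_xnn_names(header: bytes) -> dict:
    """Decode field names, field numbers, sort order, and index name."""
    n_fields = header[0x21]
    # Field names start at 78h + 83 + 6 * (number of fields).
    pos = 0x78 + 83 + 6 * n_fields
    names = []
    for _ in range(n_fields):
        name, pos = read_cstring(header, pos)
        names.append(name)
    # n unsigned shorts follow the terminating null of "Hint". Only the
    # first m are described: the DB field numbers of the m secondary
    # index fields.
    numbers = list(struct.unpack_from("<%dH" % n_fields, header, pos))
    pos += 2 * n_fields
    sort_order, pos = read_cstring(header, pos)
    index_name, pos = read_cstring(header, pos)
    return {"field_names": names, "field_numbers": numbers,
            "sort_order": sort_order, "index_name": index_name}
```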
No secondary index data records are stored in the X** header
block.
X** Data Blocks
---------------
The data blocks have the same format as a DB data block.
Y** File
--------
A Y** file is the primary index for an X** file.
Its logical format is identical to the format of the PX file.
Y** Header
----------
The Y** header is the first block in the Y** file. It has the
same format as the PX file header. Some of the fields in the
header are described below. (The list is far from complete.)
Hex Field
Offset Type Description
------ ----- --------------------------------------------------
000000 US Index record length
000002 US Length of the index header (2k)
000004 UB File type
05 - Y** file
000005 UB Index block size code
01 - Block size is 1k
02 - Block size is 2k
03 - Block size is 3k (not used in 4.5)
04 - Block size is 4k
000006 UL Number of records in Y**
00000A US Number of blocks in use
00000C US Total blocks in file
00000E US First index data block (always 1)
000010 US Last block in use
00001E US Block number of index root
000020 UB Number of index levels
000021 UB Number of fields in index
The index record length is six greater than the sum of the
lengths of the key fields.
No index records are stored in this block.
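A Python sketch of reading the Y** header fields listed above follows. The function names are hypothetical and little-endian byte order is assumed. The second function simply restates the record length rule, so a reader can compare it against the value stored at offset 000000.

```python
import struct

def parse_ynn_header(header: bytes) -> dict:
    """Decode the Y** header fields listed in the table above."""
    (rec_len, hdr_len, file_type, size_code, n_records, blocks_used,
     total_blocks, first_block, last_block) = struct.unpack_from(
        "<HHBBLHHHH", header, 0x00)
    if file_type != 0x05:
        raise ValueError("not a Y** file (file type %02Xh)" % file_type)
    (root_block,) = struct.unpack_from("<H", header, 0x1E)
    return {
        "index_record_length": rec_len,
        "header_length": hdr_len,
        "block_size": size_code * 1024,   # code 01 = 1k ... 04 = 4k
        "records": n_records,
        "blocks_in_use": blocks_used,
        "total_blocks": total_blocks,
        "first_index_block": first_block,
        "last_block_in_use": last_block,
        "root_block": root_block,
        "index_levels": header[0x20],
        "index_fields": header[0x21],
    }

def expected_index_record_length(key_field_lengths) -> int:
    """Six greater than the sum of the key field lengths."""
    return 6 + sum(key_field_lengths)
```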
Y** Index Blocks
----------------
Same format as the index blocks in the PX file.