
Anatomy of a unit file

  A unit file consists of five parts:

  1. A unit header.
  2. A file references part. This contains the references to used units and sources with name, checksum and time stamps.
  3. A definition part. This contains all type and procedure definitions.
  4. A symbol part. This contains all symbol names and references to their definitions.
  5. A list of the units that are used in the implementation part.

The header consists of a sequence of 20 bytes that together give some information about the unit file, such as the compiler version that was used to generate it. The complete layout can be found in table (A.1). The header is generated by the compiler and changes only when the compiler changes. The up-to-date header definition can be found in the files.pas source file of the compiler; look in this file for the unitheader constant declaration.

  

Byte     What is stored
0..3     The letters 'PPU' in upper case. This acts as a check.
4..6     The unit format as a 3-letter sequence, e.g. '0','1','2' for format 12.
7,8      The compiler version and release numbers as bytes.
9        The target OS number.
10       Unit flags.
11..14   Checksum (as a longint).
15,16    Unused (equal to 255).
17..20   Marks start of unit file.
Table A.1: Unit header structure.
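
The layout in Table A.1 can be decoded with a handful of slices. What follows is a minimal Python sketch, not the compiler's own code. The names UnitHeader and parse_unit_header are hypothetical, little-endian byte order for the checksum is an assumption, and because the table lists offsets 0 to 20 the sketch reads 21 bytes.

```python
import struct
from typing import NamedTuple

# Assumption: Table A.1 lists offsets 0..20 inclusive, so 21 bytes are read.
HEADER_SIZE = 21


class UnitHeader(NamedTuple):
    unit_format: int
    compiler_version: int
    compiler_release: int
    target_os: int
    flags: int
    checksum: int


def parse_unit_header(data: bytes) -> UnitHeader:
    """Decode the fields of Table A.1 from the start of a unit file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("unit header is truncated")
    # The table gives three letters four bytes (0..3); how the spare byte is
    # used is not stated, so only the presence of 'PPU' is checked.
    if b"PPU" not in data[0:4]:
        raise ValueError("missing 'PPU' signature")
    # Bytes 4..6: format number as three ASCII digits, e.g. b"012" -> 12.
    unit_format = int(data[4:7].decode("ascii"))
    # Bytes 11..14: checksum as a longint (assumed little-endian, signed).
    (checksum,) = struct.unpack_from("<i", data, 11)
    return UnitHeader(unit_format, data[7], data[8], data[9], data[10], checksum)
```

Bytes 15..20 (the unused bytes and the start marker) are skipped here, since the table does not say what values the marker holds.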

After the header, the second part starts with the list of all source files for the unit. Each name is written as a direct copy of the string in memory, i.e. a length byte followed by all characters of the string. This list includes any file that was included in the unit source with the {$i file} directive. The list is terminated with a $ff byte marker. After this, the list of units in the uses clause is written, together with their checksums. The name is written as a string, the checksum as a longint (i.e. four bytes). This list is again terminated with a $ff byte marker.
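
The two lists described above can be read with one small helper for length-prefixed strings. This is a rough Python sketch under stated assumptions: the function names are hypothetical, the checksum is taken to be a little-endian signed 32-bit value, and names are decoded as Latin-1 because the text does not name a character set.

```python
import io
import struct

END_MARKER = 0xFF  # the $ff byte that terminates each list


def read_pascal_string(stream):
    """Return the next length-prefixed string, or None at the $ff marker."""
    first = stream.read(1)
    if not first:
        raise EOFError("missing $ff end marker")
    length = first[0]
    if length == END_MARKER:
        return None
    raw = stream.read(length)
    if len(raw) != length:
        raise EOFError("string is truncated")
    return raw.decode("latin-1")  # assumption: one byte per character


def read_source_files(stream):
    """Read source file names up to the $ff marker."""
    names = []
    while True:
        name = read_pascal_string(stream)
        if name is None:
            return names
        names.append(name)


def read_used_units(stream):
    """Read (name, checksum) pairs up to the $ff marker."""
    units = []
    while True:
        name = read_pascal_string(stream)
        if name is None:
            return units
        raw = stream.read(4)
        if len(raw) != 4:
            raise EOFError("checksum is truncated")
        (checksum,) = struct.unpack("<i", raw)  # assumption: little-endian
        units.append((name, checksum))
```

Calling read_source_files and then read_used_units on the same stream, positioned just after the header, consumes the second part in the order the text gives.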

After that, in the third part, the definitions of all types, variables, constants, procedures and functions are written to the unit file.

They are written in the following manner: first a byte is written, which determines the kind of definition that follows. Then follows, as a series of bytes, a type-dependent description of the definition. The exact byte order for each type can be found in table (A.2).

  

Type             Start byte  Size      Stored fields
Pointer          3           4         Reference to the type pointer points to.
Base type        2           9         1 byte to indicate base type.
                                       4-byte start range
                                       4-byte end range
Array type       5           16        4-byte reference to element type.
                                       4-byte reference to range type.
                                       4-byte start range (longint)
                                       4-byte end range (longint)
Procedure        6           ?         4-byte reference to the return type definition.
                                       2 byte Word containing modifiers.
                                       2 byte Word containing number of parameters.
                                       5 bytes per parameter.
                                       1 byte : used registers.
                                       String containing the mangled name.
                                       8 bytes.
Procedural type  21          ?         4-byte reference to the return type definition.
                                       2 byte Word containing modifiers.
                                       2 byte Word containing number of parameters.
                                       5 bytes per parameter.
String           9           1         1 byte containing the length of the string.
Record           15          variable  Longint indicating record length
                                       list of fields, to be read as unit in itself.
                                       $ff end marker.
Class            18          variable  Longint indicating data length
                                       String with mangled name of class.
                                       4 byte reference to ancestor class.
                                       list of fields, to be read as unit in itself.
                                       $ff end marker.
file             16          1(+4)     1 byte for type of file.
                                       4-byte reference to type of typed file.
Enumeration      19          4         Biggest element.
set              20          5         4-byte reference to set element type.
                                       1 byte flag.
Table A.2: Description of definition fields

This list of definitions is again terminated with a $ff byte marker.
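
The kind byte followed by a type-dependent payload suggests a simple dispatch loop. The Python sketch below handles only the fixed-size entries of Table A.2 and stops at the $ff marker. It is an illustration rather than the compiler's reader: the names are hypothetical, little-endian byte order is an assumption, and so is the signedness of the range fields.

```python
import io
import struct

# Start byte -> (name, struct layout of the payload). The "<" prefix means
# little-endian with no padding (byte order is an assumption).
FIXED_DEFINITIONS = {
    2: ("base type", "<Bii"),   # base type byte, start range, end range
    3: ("pointer", "<I"),       # reference to the pointed-to type
    5: ("array", "<IIii"),      # element ref, range ref, start, end
    9: ("string", "<B"),        # length of the string
    19: ("enumeration", "<I"),  # biggest element
    20: ("set", "<IB"),         # element type ref, flag
}


def read_definitions(stream):
    """Read definitions until the $ff marker; fixed-size kinds only."""
    definitions = []
    while True:
        first = stream.read(1)
        if not first:
            raise EOFError("missing $ff end marker")
        kind = first[0]
        if kind == 0xFF:
            return definitions
        if kind not in FIXED_DEFINITIONS:
            # Procedures, records, classes and files have variable size.
            raise NotImplementedError(f"definition kind {kind}")
        name, layout = FIXED_DEFINITIONS[kind]
        size = struct.calcsize(layout)
        raw = stream.read(size)
        if len(raw) != size:
            raise EOFError("definition is truncated")
        definitions.append((name, struct.unpack(layout, raw)))
```

The payload sizes computed by struct.calcsize for these layouts match the Size column of the table (9, 4, 16, 1, 4 and 5 bytes), which suggests that the column does not count the start byte itself.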

After that, a list of symbols is given, together with a reference to a definition. This represents the names of the declarations, and the definition they refer to.

A reference consists of two words: the first word indicates the unit number (as it appears in the uses clause), and the second word is the number of the definition in that unit. A nil reference is stored as $ffffffff.
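
Decoding such a reference takes only a few lines. In this Python sketch the function name decode_reference is hypothetical, and little-endian 16-bit words are an assumption; the text fixes only the order of the two words and the nil value.

```python
import struct


def decode_reference(raw: bytes):
    """Return (unit_number, definition_number), or None for a nil reference."""
    if len(raw) != 4:
        raise ValueError("a reference is exactly four bytes")
    # A nil reference is stored as $ffffffff, whatever the byte order.
    if raw == b"\xff\xff\xff\xff":
        return None
    # Assumption: each word is an unsigned little-endian 16-bit value.
    unit_number, definition_number = struct.unpack("<HH", raw)
    return unit_number, definition_number
```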

After this follows another $ff-terminated list of file names: the names of the units in the uses clause of the implementation section.



Michael Van Canneyt
Tue Mar 31 16:50:06 CEST 1998