home *** CD-ROM | disk | FTP | other *** search
- RPM File Format
- ===============
-
- This document describes the RPM file format version 3.0, which is used
- by RPM versions 2.1 and greater. The format is subject to change, and
- you should not assume that this document is kept up to date with the
- latest RPM code. That said, the 3.0 format should not change for
- quite a while, and when it does, it will not be 3.0 anymore :-). In
- any case, THE PROPER WAY TO ACCESS THESE STRUCTURES IS THROUGH THE RPM
- LIBRARY!!
-
- The RPM file format covers both source and binary packages. An RPM
- package file is divided in 4 logical sections:
-
- . Lead -- 96 bytes of "magic" and other info
- . Signature -- collection of "digital signatures"
- . Header -- holding area for all the package information
- . Archive -- compressed archive of all the files in the package
-
- All 2 and 4 byte "integer" quantities (int16 and int32) are stored in
- network byte order. When data is presented, the first number is the
- byte number, or address, in hex, followed by the byte values in hex,
- followed by character "translations" (where appropriate).
-
- Lead
- ----
-
- The Lead is basically for file(1). All the information contained in
- the Lead is duplicated or superceded by information in the Header.
- Much of the info in the Lead was used in old versions of RPM but is
- now ignored. The Lead is stored as a C structure:
-
- struct rpmlead {
- unsigned char magic[4];
- unsigned char major, minor;
- short type;
- short archnum;
- char name[66];
- short osnum;
- short signature_type;
- char reserved[16];
- };
-
- and is illustrated with one pulled from the rpm-2.1.2-1.i386.rpm
- package:
-
- 00000000: ed ab ee db 03 00 00 00
-
- The first 4 bytes (0-3) are "magic" used to uniquely identify an RPM
- package. It is used by RPM and file(1). The next two bytes (4, 5)
- are int8 quantities denoting the "major" and "minor" RPM file format
- version. This package is in 3.0 format. The following 2 bytes (6-7)
- form an int16 which indicates the package type. As of this writing
- there are only two types: 0 == binary, 1 == source.
-
- 00000008: 00 01 72 70 6d 2d 32 2e ..rpm-2.
-
- The next two bytes (8-9) form an int16 that indicates the architecture
- the package was built for. While this is used by file(1), the true
- architecture is stored as a string in the Header. See, lib/misc.c for
- a list of architecture->int16 translations. In this case, 1 == i386.
- Starting with byte 10 and extending to byte 75, are 65 characters and
- a null byte which contain the familiar "name-version-release" of the
- package, padded with null (0) bytes.
-
- 00000010: 31 2e 32 2d 31 00 00 00 1.2-1...
- 00000018: 00 00 00 00 00 00 00 00 ........
- 00000020: 00 00 00 00 00 00 00 00 ........
- 00000028: 00 00 00 00 00 00 00 00 ........
- 00000030: 00 00 00 00 00 00 00 00 ........
- 00000038: 00 00 00 00 00 00 00 00 ........
- 00000040: 00 00 00 00 00 00 00 00 ........
- 00000048: 00 00 00 00 00 01 00 05 ........
-
- Bytes 76-77 ("00 01" above) form an int16 that indicates the OS the
- package was built for. In this case, 1 == Linux. The next 2 bytes
- (78-79) form an int16 that indicates the signature type. This tells
- RPM what to expect in the Signature. For version 3.0 packages, this
- is 5, which indicates the new "Header-style" signatures.
-
- 00000050: 04 00 00 00 68 e6 ff bf ........
- 00000058: ab ad 00 08 3c eb ff bf ........
-
- The remaining 16 bytes (80-95) are currently unused and are reserved
- for future expansion.
-
- Signature
- ---------
-
- A 3.0 format signature (denoted by signature type 5 in the Lead), uses
- the same structure as the Header. For historical reasons, this
- structure is called a "header structure", which can be confusing since
- it is used for both the Header and the Signature. The details of the
- header structure are given below, and you'll want to read them so the
- rest of this makes sense. The tags for the Signature are defined in
- lib/signature.h.
-
- The Signature can contain multiple signatures, of different types.
- There are currently only three types, each with its own tag in the
- header structure:
-
- Name Tag Header Type
- ---- ---- -----------
- SIZE 1000 INT_32
- MD5 1001 BIN
- PGP 1002 BIN
-
- The MD5 signature is 16 bytes, and the PGP signature varies with
- the size of the PGP key used to sign the package.
-
- As of RPM 2.1, all packages carry at least SIZE and MD5 signatures,
- and the Signature section is padded to a multiple of 8 bytes.
-
- Header
- ------
-
- The Header contains all the information about a package: name,
- version, file list, etc. It uses the same "header structure" as the
- Signature, which is described in detail below. A complete list of the
- tags for the Header would take too much space to list here, and the
- list grows fairly frequently. For the complete list see lib/rpmlib.h
- in the RPM sources.
-
- Archive
- -------
-
- The Archive is currently a gzipped cpio archive. The cpio
- archive type used is SVR4 with a CRC checksum.
-
- The Header Structure
- --------------------
-
- The header structure is a little complicated, but actually performs a
- very simple function. It acts almost like a small database in that it
- allows you to store and retrieve arbitrary data with a key called a
- "tag". When a header structure is written to disk, the data is
- written in network byte order, and when it is read from disk, is is
- converted to host byte order.
-
- Along with the tag and the data, a data "type" is stored, which indicates,
- obviously, the type of the data associated with the tag. There are
- currently 9 types:
-
- Type Number
- ---- ------
- NULL 0
- CHAR 1
- INT8 2
- INT16 3
- INT32 4
- INT64 5
- STRING 6
- BIN 7
- STRING_ARRAY 8
-
- One final piece of information is a "count" which is stored with each
- tag, and indicates the number of items of the associated type that are
- stored. As a special case, the STRING type is not allowed to have a
- count greater than 1. To store more than one string you must use a
- STRING_ARRAY.
-
- Altogether, the tag, type, count, and data are called an "Entry" or
- "Header Entry".
-
- 00000000: 8e ad e8 01 00 00 00 00 ........
-
- A header begins with 3 bytes of magic "8e ad e8" and a single byte to
- indicate the header version. The next four bytes (4-7) are reserved.
-
- 00000008: 00 00 00 20 00 00 07 77 ........
-
- The next four bytes (8-11) form an int32 that is a count of the number
- of entries stored (in this case, 32). Bytes 12-15 form an int32 that
- is a count of the number of bytes of data stored (that is, the number
- of bytes made up by the data portion of each entry). In this case it
- is 1911 bytes.
-
- 00000010: 00 00 03 e8 00 00 00 06 00 00 00 00 00 00 00 01 ................
-
- Following the first 16 bytes is the part of the header called the
- "index". The index is made of up "index entries", one for each entry
- in the header. Each index entry contains four int32 quantities. In
- order, they are: tag, type, offset, count. In the above example, we
- have tag=1000, type=6, offset=0, count=1. By looking up the the tag
- in lib/rpmlib.h we can see that this entry is for the package name.
- The type of the entry is a STRING. The offset is an offset from the
- start of the data part of the header to the data associated with this
- entry. The count indicates that there is only one string associated
- with the entry (which we really already knew since STRING types are
- not allowed to have a count greater than 1).
-
- In our example there would be 32 such 16-byte index entries, followed
- by the data section:
-
- 00000210: 72 70 6d 00 32 2e 31 2e 32 00 31 00 52 65 64 20 rpm.2.1.2.1.Red
- 00000220: 48 61 74 20 50 61 63 6b 61 67 65 20 4d 61 6e 61 Hat Package Mana
- 00000230: 67 65 72 00 31 e7 cb b4 73 63 68 72 6f 65 64 65 ger.1...schroede
- 00000240: 72 2e 72 65 64 68 61 74 2e 63 6f 6d 00 00 00 00 r.redhat.com....
- ...
- 00000970: 6c 69 62 63 2e 73 6f 2e 35 00 6c 69 62 64 62 2e libc.so.5.libdb.
- 00000980: 73 6f 2e 32 00 00 so.2..
-
- The data section begins at byte 528 (4 magic, 4 reserved, 4 index
- entry count, 4 data byte count, 16 * 32 index entries). At offset 0,
- bytes 528-531 are "rpm" plus a null byte, which is the data for the
- first index entry (the package name). Following is is the data for
- each of the other entries. Each string is null terminated, the strings
- in a STRING_ARRAY are also null terminated and are place one after
- another. The integer types are aligned to appropriate byte boundaries,
- so that the data of INT64 type starts on an 8 byte boundary, INT32
- type starts on a 4 byte boundary, and an INT16 type starts on a 2 byte
- boundary. For example:
-
- 00000060: 00 00 03 ef 00 00 00 06 00 00 00 28 00 00 00 01 ................
- 00000070: 00 00 03 f1 00 00 00 04 00 00 00 40 00 00 00 01 ................
- ...
- 00000240: 72 2e 72 65 64 68 61 74 2e 63 6f 6d 00 00 00 00 r.redhat.com....
- 00000250: 00 09 9b 31 52 65 64 20 48 61 74 20 4c 69 6e 75 ....Red Hat Linu
-
- Index entry number 6 is the BUILDHOST, of type STRING. Index entry
- number 7 is the SIZE, of type INT32. The corresponding data for entry
- 6 end at byte 588 with "....redhat.com\0". The next piece of data
- could start at byte 589, byte that is an improper boundary for an INT32.
- As a result, 3 null bytes are inserted and the date for the SIZE actually
- starts at byte 592: "00 09 9b 31", which is 629553).
-
- Tools
- -----
-
- The tools directory in the RPM sources contains a number of small
- programs that use the RPM library to pick apart packages. These
- tools are mostly used for debugging, but can also be used to help
- you understand the internals of the RPM package format.
-
- rpmlead - extracts the Lead from a package
- rpmsignature - extracts the Signature from a package
- rpmheader - extracts the Header from a package
- rpmarchive - extracts the Archive from a package
- dump - displays a header structure in readable format
-
- Given a package foo.rpm you might try:
-
- rpmlead foo.rpm | od -x
- rpmsignature foo.rpm | dump
- rpmheader foo.rpm | dump
- rpmarchive foo.rpm | zcat | cpio --list
-