-
- Definition of the data types used in the file format list
-
- In the file format list, several short mnemonics are used to describe
- the structure of the stored data. Here I describe these types and,
- where it applies, the conversion between them. As some types have
- different sizes across platforms, the byte order and bit size are
- given for most types.
-
- ASCIIZ A sequence of characters (->char), terminated
- by the special character with the value 0.
- Note that ASCIIZ strings, like most structures on
- Intel machines, should not be larger than
- 64KB due to the ancient segmentation used.
- BCD Binary coded decimal
- Each decimal digit is stored in one nybble, so the
- hexadecimal notation of the stored value shows the same
- digits as the decimal number.
- (10d becomes 10h, 21d becomes 21h)
- Bitmap If a value is declared as bitmapped, that means that
- every bit in this value might have a different meaning.
- The bits are numbered from right to left; the least
- significant bit has the number 0. After the bit number,
- there are either two statements, separated by a
- slash ("/"), which are the two meanings if the bit is
- set / not set, or one single statement, which is the
- meaning of this bit if it is set.
- Byte 8 bit unsigned number. Smallest unit a record
- consists of. All offsets are in the unit bytes.
- (0-255)
- Char Synonym for byte, most values are between 32 and
- 255. (#0-#255)
- DWord 32 bit signed number. Well, maybe some of the
- formats use a DWord which is a 32 bit unsigned
- number, but as files tend not to be greater than
- 2GB, this won't be my concern. To convert
- between Intel and Motorola format, you have to
- swap bytes #2 & #3 and bytes #1 & #4.
- (-2147483648-+2147483647)
- Int Integer. Signed 16-bit number.
- (-32768-+32767)
- LString A string which is preceded by its length. Also
- named "counted" string. Used by most Pascal
- implementations. Maximum length is 255 bytes, but it can
- contain any char.
- Nybble The upper or lower four bits of a byte. A nybble
- is a single hex digit and can have values from
- 0 to 15. A signed nybble can have values from
- -8 to 7 with bit 3 being the sign bit.
- Paragraph A multiple of 16 bytes. A paragraph is the granularity
- with which the 64K segments of the Intel chips can be placed in memory.
- Word 16 bit unsigned number. Note that the byte order
- matters and depends on whether you have a Motorola
- machine or an Intel one. Conversion between the two
- formats is simply done by swapping byte #1 with byte #2.
- (0-65535)
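-
- The numeric types above can be sketched in Python as follows. This is only
- an illustration, not part of the list itself: all function names are made up
- for this example, signed values are assumed to be two's complement, and a
- multi-byte BCD value is assumed to store its most significant digits first.

```python
import struct


def swap_word(value):
    """Swap byte #1 with byte #2 of a 16 bit Word (Intel <-> Motorola)."""
    return ((value & 0x00FF) << 8) | ((value & 0xFF00) >> 8)


def swap_dword(value):
    """Swap bytes #1 & #4 and bytes #2 & #3 of a 32 bit DWord."""
    return int.from_bytes((value & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def bcd_to_int(data):
    """Decode BCD bytes; each nybble holds one decimal digit (21h -> 21d).

    Assumption: the most significant digits come first.
    """
    result = 0
    for byte in data:
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError("not a BCD byte: 0x%02X" % byte)
        result = result * 100 + high * 10 + low
    return result


def split_nybbles(byte):
    """Return the (upper, lower) nybble of a byte."""
    return byte >> 4, byte & 0x0F


def signed_nybble(nybble):
    """Interpret a nybble as signed (-8 to 7), bit 3 being the sign bit."""
    return nybble - 16 if nybble & 0x8 else nybble


def bit_is_set(value, bit):
    """Test one bit of a bitmapped value; bit 0 is the least significant."""
    return bool((value >> bit) & 1)


# A Word, a DWord and an Int stored in Intel (little-endian) byte order.
raw = bytes([0x34, 0x12, 0x78, 0x56, 0x34, 0x12, 0xFE, 0xFF])
word, dword, integer = struct.unpack("<Hih", raw)

# The same first two bytes read in Motorola (big-endian) byte order.
word_motorola = struct.unpack(">H", raw[:2])[0]
```

- In practice, struct.unpack with "<" (Intel) or ">" (Motorola) reads the value
- in the right byte order directly, so the swap functions are only needed when
- a value has already been read the wrong way round.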
-
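- The two string types described above, ASCIIZ and LString, might be read from
- a record like this. Again this is a sketch only; the function names and the
- sample record are invented for the example and do not come from any format in
- the list.

```python
def read_asciiz(data, offset=0):
    """Return the bytes of an ASCIIZ string, without the terminating 0.

    Raises ValueError if no 0 byte follows the given offset.
    """
    end = data.index(0, offset)
    return data[offset:end]


def read_lstring(data, offset=0):
    """Return the bytes of an LString (one length byte, then the chars)."""
    length = data[offset]
    body = data[offset + 1:offset + 1 + length]
    if len(body) != length:
        raise ValueError("LString is truncated")
    return body


# Hypothetical record: an ASCIIZ string followed by an LString.
record = b"HELLO\x00" + b"\x05World"

name = read_asciiz(record)        # stops at the 0 byte
title = read_lstring(record, 6)   # length byte 5, then five chars
```

- Because an LString carries its length, it may contain a 0 byte, whereas an
- ASCIIZ string ends at the first one.
-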
- How to identify different files
-
- While searching for different file formats, I found the following programs
- helpful for gathering information about different files. They are all DOS programs
- since I'm not familiar with other platforms (except Windows). Most of them
- should be available on SimTel CDs or via FTP at ftp.cdrom.com, except for my
- program TF, which is still in beta.
-
- LIST.COM v9.0a by Vernon Buerg
- List is a file lister which supports both text and hex-view.
-
- HIEW.EXE v4.18 by Sen
- Another file lister with built-in disassembler.
-
- FILE.EXE v2.0 by Felix von Leitner
- File is a file identification program.
-
- Q.COM v3.01 by SemWare
- QEdit is the editor I'm editing the list with.
-
- TF.EXE v0.38 by me
- The program that started it all. A "simple" file identification
- program, though no longer simple, since it has grown too big by now.
- Still unreleased, since it is not really extensible yet.
-
- The file formats list meta list ;)
-
- The file format list uses a certain format to make it readable by programs which
- convert it into the WinHelp format or create program structures out of the
- lists. This format is very similar to the format used by Ralf Brown in his PC
- interrupt list, but was extended by me to accommodate the specific needs of
- this list :
-
- Each topic in the list is delimited by a line of 45 chars, in which the
- first 8 contain the char '-'. After these follows one character which
- gives the type of topic. The different topic types are described in the list
- itself; the char '!' denotes an information topic, like the list of chars and
- their meaning. After the topic identifier follows another '-' char and
- then the topic name, which does not contain any '-' chars. After the topic name
- there may be some other descriptors, for example for Motorola byte ordering or
- guesswork marking; see the main list for further information. The line ends
- with at least one '-' char. Take the following prototype :
-
- --------?-TEST------------------------------
-
- OFFSET Count TYPE Description
- EXTENSION:
- OCCURENCES:
- PROGRAMS:
- REFERENCE:
- SEE ALSO:
- VALIDATION:
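-
- A program reading the list could recognize such a topic line roughly as
- follows. This is a sketch based only on the description above; the names
- Topic and parse_topic_line are invented here, the descriptors are returned as
- one uninterpreted string because their format is defined in the main list, and
- the line length of 45 chars is deliberately not checked.

```python
from typing import NamedTuple, Optional


class Topic(NamedTuple):
    type_char: str    # e.g. '!' for an information topic
    name: str         # topic name, contains no '-' chars
    descriptors: str  # whatever follows the name, dashes stripped


def parse_topic_line(line: str) -> Optional[Topic]:
    """Return a Topic for a topic delimiter line, or None for other lines."""
    line = line.rstrip("\r\n")
    if len(line) < 11:
        return None
    # 8 dashes, the topic type char, one more dash, and a trailing dash.
    if not line.startswith("-" * 8) or line[9] != "-" or not line.endswith("-"):
        return None
    name, _, tail = line[10:].partition("-")
    if not name:
        return None
    return Topic(line[8], name, tail.strip("-"))


prototype = "--------?-TEST------------------------------"
topic = parse_topic_line(prototype)
```

- Lines that do not match, such as the field names above or the three dashes of
- a sub-topic, yield None and can be handled by the caller.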
-
- Sub-topics like different records are mostly delimited by three dashes ('-').
- I suggest folding them up and making them available as a popup window.
-
- Tables have the following format :
- (see table 0000)
- for a table reference and
- (Table 0000)
- for the beginning of a table. The end of a table is undefined (yet).
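-
- The two table markers can be found with regular expressions, for instance to
- turn references into links. This is a sketch under assumptions: the name
- index_tables is invented, table numbers are assumed to have four digits as in
- the examples above, and a table start is assumed to stand alone on its line.

```python
import re

# "(Table 0000)" alone on a line marks the beginning of a table.
TABLE_START = re.compile(r"^\(Table (\d{4})\)$")
# "(see table 0000)" anywhere in a line is a reference to a table.
TABLE_REF = re.compile(r"\(see table (\d{4})\)")


def index_tables(lines):
    """Return ({table number: line number}, [(line number, table number)])."""
    starts = {}
    refs = []
    for lineno, line in enumerate(lines, start=1):
        match = TABLE_START.match(line.strip())
        if match:
            starts[match.group(1)] = lineno
        for number in TABLE_REF.findall(line):
            refs.append((lineno, number))
    return starts, refs


sample = [
    "00h  1 byte  Flags (see table 0001)",
    "(Table 0001)",
    "Bit 0 - set / not set",
]
starts, refs = index_tables(sample)
```

- Since the end of a table is undefined, the sketch records only where each
- table begins.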
-
-
- A primer on file formats
-
- Abbreviations
- Throughout the list, many abbreviations are used, some of them in the
- reference section. Some are explained here :
-
- c't
- The c't is a German computer magazine, which developed the Borland
- Pascal for OS/2 patch. They release source code in files called
- CTmmyy.*. Note that comments in the source code and the language in
- the issues tend to be German :-)
-
- DDJxxyy
- (Doctor Dobb's Journal)
- The DDJ is a monthly publication by M&T/US which is intended for the
- professional programmer. The four digits after the name indicate the
- month/year of the issue referred to. Most of the source code published
- in the issue is available electronically on Compu$erve and other BBSes.
- The files have the name DDJyymm.
-
- PDN
- Programmer's Distribution Net
- A network dedicated to the distribution of source code useful to
- programmers. Often linked with Fido-nodes.
-
- Contributions to this list were made by :
- Ralf Brown (The .EXE file formats from the INTERRUPT List, general layout)
- Daniel Dissett (ddissett@netcom.com)
- Marcus Groeber (marcusg@ph-cip.uni-koeln.de)
- Darrel Hankerson (hankedr@mail.auburn.edu)
- Carl Hauser (chauser.parc@xerox.com)
- Jouni Miettunen (jon@stekt.oulu.fi)
- Jan Nicolai Langfeldt (janl@ifi.uio.no)
- Mark Ouellet (Telix .FON structures)
- Greg Roelofs (roe2@midway.uchicago.edu)
- Jesus Villena (CONVERT.EXE, a digital sample conversion program)
- Christos Zoulas (christos@deshaw.com)
-
- Information gleaned from other programs :
- Formats for Word and WordPerfect (Selke's filetype)
-