The C Users' Group Library 1994 August

home *** CD-ROM | disk | FTP | other *** search

/ The C Users' Group Library 1994 August / wc-cdrom-cusersgrouplibrary-1994-08.iso / vol_200 / 265_01 / format.doc < prev next >

Wrap

Text File | 1990-02-15 | 10KB | 463 lines

- 1 - 1. Introduction This document describes the physical and logical format of the data exchange standard of the C Users Group. This standard allowes it to transfer data files among different systems. 2. Exchange Media This standard does not define only some general data exchange medias. All possible medias may be used, as long as CUG has the necessary hardware and driver software to read and write the physical format. Because of the real-world systems, at least the following medias should be supported: - 8 inch, single sided, single density diskette (about 254KB?) - 5.25 inch, double sided, double density diskette (360KB) as found in the IBM PC family of computers - 3.5 inch, ????, ??? diskette (720KB) as found in the newer IBM PS/2 systems and many smaller machines (like Atari ST) - 9 track industrial standard tape From all medias only the 9-track tape needs explantation: It is for some systems easier to read tapes than diskettes. Other systems (Unisys Series 1100 e. g.) don't support diskette drives but tapes. To support such systems, the 9- track tape is the only alternative and must be included as exchange media. (I'm not sure if it's better to write tapes in the EBCDIC character set????). Other exchange medias may be standardized if requested. The most important fact is that there must be input/output drivers which are able to read the media as a single file, bypassing the operating system directory structure. 3. Archive Format The archive file is a single sequential file that contains a well-defined set of header and data blocks. A single header/data block set describes an entire file. Multiple sets may be included in a single archive file, as long as no - 2 - single set is devided across archive file boundaries. The archive file contains only true ascii characters (that is, characters within the range 0..127)1. The archive file format bases on the X/OPEN cpio standard (with -c option active). One goal of this exchange standard is that X/OPEN cpio disks can be read and written with CUG utilities. 3.1 File Layout As mentioned above, the file contains a continous set of header (HDRB) and data (DATB) blocks. A special end-of-file header block (EOFB) annonces the end of the archive file. Note that this block may be present before the physical end of file. There is no corresponding begin-of-file block. Figure 1.... general archive layout <HDRB><DATB><HDRB><DATB>...<HDRB><DATB><EOFB> 3.2 Header Block (HDRB) Format The header block closely corrosponds to the X/OPEN cpio file header standard. It can be described by the following scanf(): scanf("%6o%6o%6o%6o%6o%6o%6o%6o%11lo%6o%6o%s", >isn't this an error and the following correct: < >scanf("%6o%6o%6o%6o%6o%6o%6o%6o%11lo%6o%11lo%s",< &Hdr.h_magic, &Hdr.h_dev, &Hdr.h_ino, &Hdr.h_mode, &Hdr.h_uid, &Hdr.h_gid, &Hdr.h_nlink, &Hdr.h_rdev, &Longtime, &Hdr.h_namesize, &Longfile, Hdr.h_name); Because most of the fields in the header record only have meaning on Unix systems, they are only included for conformance to the X/OPEN standard and are not used by the exchange standard. Their exact meaning in the CUG exchange format is as follows2: __________ 1. If the target system ueses a different character set, conversion takes place when reading and writing the exchange media not when building or extracting the archive files. 2. Note that the term "not used" means that the field is - 3 - h_magic Identifies the record as header block. Must be the constant 070707 (octal). Every other value indicates an error. h_dev not used h_ino not used h_mode not used, set to 0100666 on output (this setting identifies it as a regular file which can be read and written by everyone to the Unix cpio) h_uid not used h_gid not used h_nlink not used, ignored on input, set to one on output h_rdev not used Longtime not used h_namesize This field contains the length of the null- terminated pathname h_name including the null-byte. Longfile This field contains the exact size of the following data block (DATB). Immedeatly after reading Longfile number bytes from the archive, the next archive record must be a HDRB (or EOFB). h_name File name of the following data block (DATB). When extracting the archive, the following data block is stored under this name. Note that there are restrictions for file- name construction: No system-dependant parts should be included. This is even true for pathnames, which are not common to all systems. So the following rules should be applied to file name construction. They base on the KERMIT file transfer protocol and will ____________________________________________________________ ignored on input and set to zero on output. - 4 - guarantee that the file can be created under virtually every operating system. Violation of this rules may cause useless of the archive file on some OS. - The only allowed characters in filenames are digits and upper-case alphabetics (ascii 0x30-0x39 and 0x41-0x5a). - A filename consits of at most 8 characters, an optional period and at most additional three characters. If you want to specify the last three characters, the period _m_u_s_t be included. This are the same rules as for MS-DOS files (without pathnames). - No system-dependant information (like drive specifiers or pathnames) may be included. 3.3 EOF Block (EOFB) Format The EOF block is a special case of the header block. It's an ordinary header block with Longfile set to zero and h_name set to the constant "TRAILER!!!". 3.4 Data Block (DATB) Format The data block contains the files data. No file is splitted across two or more data blocks. To achive portability between different systems, the following restrictions must be applied to the data block: - The data block contains only printable ascii characters. The one and only exception from this general rule is the newline character, which is used to partition source lines. - The newline sequence consits of a single newline character ('0, ascii 0x10). This representation is not affected by the target system newline sequence. Newline conversion is (if needed) done when reading and writing the archive files. - Tabulation characters are expanded when the archive file is written. They are not automatically recompressed when extracting the archive file. This is neccessary because of the different tab settings under different operating systems (some don't support tabs at all). - 5 - 4. Implementation The exchange utilities are divided in two parts: The first one read or write the exchange media, constructing an ordinary sequential file that can be processed by normal operating system calls. The second part reads or writes these sequential files and extracts the single source files. 5. Usage of the archive/unarchive Utilities - usage of unarchive utility: read file start-sector byte-length file is the name of the sequential output file start-sector is the first sector read form sector length is computed using the byte length - usage of archive utility: write file start-sector file is the name of the sequential input file start-sector is the first sector written to the byte length is determined by the length of the input file. no input file may be larger than a single disk CONTENTS 1. Introduction......................................... 1 2. Exchange Media....................................... 1 3. Archive Format....................................... 1 3.1 File Layout..................................... 2 3.2 Header Block (HDRB) Format...................... 2 3.3 EOF Block (EOFB) Format......................... 4 3.4 Data Block (DATB) Format........................ 4 4. Implementation....................................... 5 5. Usage of the archive/unarchive Utilities............. 5 - i - LIST OF FIGURES Figure 1. general archive layout........................ 2 - ii -