- Included below is a draft proposal for IEEE P1003.1 regarding
- the recently raised issue of Archive/Data Interchange Format.
- I will deliver a proposal resembling it to P1003.1 at their
- next meeting, which is three weeks from today, in Seattle.
-
- Note two things: this is a proposal for P1003.1, not for P1003.2
- or any other group; and if you disagree with my conclusions, you
- can submit your own proposal (the address is below).
-
- If you agree with my approach but think it needs adjusting,
- you can send me mail or submit articles. If you disagree, you
- can also do those things.
-
-
-
- tar vs. cpio IEEE P1003.1 N.___
- 1 June 1987
-
- John S. Quarterman
-
- Institutional Representative from USENIX
- usenix!jsq
-
-
-
- Secretary, IEEE Standards Board
- Attention: P1003 Working Group
- 345 East 47th St.
- New York, NY 10017
-
- In both the Trial Use Standard and Draft 10, POSIX §10.1
- describes a data interchange format based on the tar
- program. That section has appeared in every draft of IEEE
- 1003.1 in some form and has always been based on tar format.
- The P1003.1 Working Group has recently received two related
- proposals regarding that section: one to add cpio format
- (including the old-style, non-ASCII format written without
- the c option); <N.048 Lorraine C. Kevra> <V11N14> <V11N25 Eric S. Raymond>
- the other to replace the existing tar-based format with cpio
- format. <N.043 X/OPEN> <V11N13> Some clarifications to the
- former were also received. <N.064 Dominic Dunlop> <V11N15> At
- the latest Working Group meeting, it was also proposed
- verbally that §10.1 be dropped altogether and the issue be
- left to P1003.2. <V11N08> <V11N11> <V11N09 Guy Harris>
- <V11N12 Doug Gwyn>
-
- The present note is a response to those proposals. Much of
- the detail in it is derived from articles posted in the
- USENET newsgroup comp.std.unix. Those articles are
- referenced in this format: <V11N09 Guy Harris>, which gives
- the volume (11) and number (09) of the article and the name
- of the submitter. If no submitter name is given, the posting
- was by the moderator, John S. Quarterman. Thanks to those
- who submitted articles. However, the content of this note
- is solely the responsibility of the author.
-
- There are a number of problems with both cpio formats.
- First, those related to the non-ASCII format:
-
- 1. Numerous parameters, including inode numbers, mode
- bits, and user and group IDs, are kept in two-byte
- binary integers. This has historically produced
- serious byte-order problems when data is moved among
- systems with different byte orders. <V11N09 Guy
- Harris>
-
- 2. The byte-swapping and word-swapping options to the
- cpio program are inadequate patches; with an ASCII
- format the problem would not be present. The options
- are not consistent across versions of the program: in
- System III, data blocks and file names are byte
- swapped; in System V, only data blocks are byte
- swapped. <V11N09 Guy Harris>
-
- 3. The two-byte integer format limits the range of inode
- numbers to 1..65535. Many current file systems have more
- inodes than that. <V11N37 Paul Eggert> <V11N39 Henry
- Spencer>
-
- Non-ASCII cpio format is clearly not portable and should not
- even be considered for standardization. <V11N12 Doug Gwyn>
-
- There are several problems that occur even with the ASCII
- cpio format:
-
- 1. Many implementations of cpio only look at the lower 16
- (or even 15) bits of the inode number, even in ASCII
- format. <V11N39 Henry Spencer> This is because the
- variable that is used to contain the value is declared
- to be unsigned short, just as in binary format. Thus,
- even though ASCII cpio format does not constrain this
- number, it is still less than portable. <V11N37 Paul
- Eggert>
-
- 2. The proposed cpio ASCII format, as specified in <N.048
- Lorraine C. Kevra> <V11N14>, is not portable, because
- the proposal assumes that sizeof(int) == sizeof(long).
- <N.064 Dominic Dunlop> <V11N15>
-
- 3. The file type is written in a numerical format, making it
- UNIX-specific rather than POSIX-specific, since POSIX
- (like tar) specifies symbolic, rather than numerical,
- values for file types. <V11N09 Guy Harris>
-
- 4. Hard links are not handled well, since cpio format
- does not record that two files are linked. If two
- files that are linked are written in cpio format, two
- copies will be written. There is an option to the
- cpio program to detect duplicate files by matching
- pairs of (h_dev, h_ino) and producing links, but that
- is done after the fact. <V11N09 Guy Harris> (A separate
- program, afio, handles cpio format more efficiently than
- the licensed versions of cpio, in this and other cases.)
- <V11N21 Chuck Forsberg>
-
- 5. Symbolic links are not handled at all, and no type
- value is reserved for them. This makes cpio useless
- on a large class of historical implementations (those
- based on 4.2BSD or its file system) for one of the
- main purposes of POSIX §10.1: archiving files for
- later retrieval and use on the same system.
-
- 6. The cpio format is less common than tar format: there
- are few historical implementations from Version 7 on
- that do not have tar; there are many that do not have
- cpio. <V11N09 Guy Harris> <V11N10 Charles Hedrick>
- <V11N24 Jim Cottrell> It is true that cpio (non-ASCII
- format) was invented before tar, <V11N22 Joseph S. D.
- Yao> apparently in PWB System 1.0. <V11N26 Joseph S.
- D. Yao> However, cpio was not available outside AT&T
- before the release of System III, while tar was in
- wide use with Version 7 and is still much more common.
- Also, it appears that the cpio format of PWB was not
- the same as that of System III. <V11N39 Henry
- Spencer> Although System III and perhaps early
- releases of System V did not include tar, <V11N26
- Joseph S. D. Yao> current releases of System V do.
-
- 7. It is very late in the process to propose that P1003.1
- adopt cpio format now, especially considering that it
- was originally proposed to and rejected by the
- /usr/group committee before P1003.1 was even formed.
- <V11N39 Henry Spencer>
-
- There are several advantages to the current tar-based format
- as specified in §10.1:
-
- 1. There are no byte- or word-swapping issues caused by
- the format, since all the header values are ASCII byte
- streams. <V11N17 John Gilmore>
-
- 2. There are no inode numbers recorded, and file types
- are kept in symbolic form, so the format is less
- implementation-specific than cpio format. <V11N17
- John Gilmore>
-
- 3. Historical tar format is the most widely used, as
- discussed in item 6 of the preceding list, despite apparent
- assertions to the contrary. <N.043 X/OPEN> <V11N13>
-
- 4. The format specified in §10.1 is upward-compatible
- with tar format. Old tar archives can be extracted by
- a program that implements §10.1. Archives using some
- of the extensions of §10.1 can be extracted with old
- (Version 7) tar programs, although symbolic links will
- not be extracted and contiguous files will not be
- handled properly (cpio does not handle these
- capabilities at all). Files with very long names will
- not be handled properly (cpio does no better at this).
- All tar implementations are compatible to this extent.
- <V11N17 John Gilmore>
-
- 5. The /usr/group working group and P1003.1 have already
- done the work <P.061> <M.019 5.1.121 Pg.13> <RFC.003
- #121> <P.038> <P.006> required to add optional
- extensions (such as symbolic links, contiguous files,
- and long file names) that are needed on many
- historical implementations and that cpio format lacks.
-
- 6. The format is extensible for future facilities.
- <V11N39 Henry Spencer>
-
- 7. There is a public domain implementation of the format
- of §10.1. That implementation provided feedback which
- led to improvements in the current specification, and
- has been in use for years in transferring data with
- licensed tar implementations. <V11N17 John Gilmore>
-
- 8. Many people prefer the user interface of the cpio
- program to that of the tar program, because the former
- can accept a list of pathnames to archive on standard
- input while the latter takes them as arguments,
- limiting the length of the list. <V11N34 Andrew
- Tannenbaum> However, the above-mentioned public domain
- implementation of tar accepts pathnames on standard
- input. <V11N17 John Gilmore> <V11N19 Jim Cottrell>
- Diffs to standard tar to add an option to accept
- pathnames on standard input when creating an archive
- have also been posted to USENET. <V11N36 John
- Gilmore> The user interface is, in any case,
- irrelevant to P1003.1. <V11N39 Henry Spencer> <V11N40
- Rahul Dhesi>
-
- There are some problems that neither tar nor cpio handles
- well.
-
- 1. An option to prevent crossing mount points would be
- useful for backups. <V11N19 Jim Cottrell> <V11N22
- Joseph S. D. Yao> However, this appears to be more of
- an implementation issue than a format issue, <V11N28
- Dave Brower> <V11N32 Joseph S. D. Yao> especially
- considering that there are options to the find program in 4.2BSD,
- <V11N24 Jim Cottrell> SunOS 3.2, <V11N36 John Gilmore>
- and System V Release 3.0 <V11N35 Mike Akre> that take
- care of this.
-
- 2. The default block size in many tar implementations is
- too large for some tape controllers to read <V11N27
- Rob Lake> (the 3B20 has this problem). This is not a
- problem with the interchange format, however.
-
- There is nothing that the proposed cpio can handle that the
- tar-based format already in POSIX §10.1 cannot handle; in
- fact, the former is less capable. If cpio format were
- augmented to handle its missing capabilities, it would be
- subject to the same objection now aimed at the format given
- in §10.1: that it is not identical to any existing format.
-
- There is no advantage in replacing the current tar-based
- format of §10.1 with cpio format. There is also no
- advantage in adding cpio format, because two standards are
- not as good as a single standard.
-
- Some have recommended removing §10.1 from POSIX altogether,
- <V11N12 Doug Gwyn> perhaps with a recommendation for P1003.2
- to pick up the idea. <V11N09 Guy Harris> While I believe
- that that would be preferable to adding cpio format, whether
- or not tar format remains, I recommend leaving §10.1 as it
- is, because
-
- o The inclusion of an archive/interchange file format is
- in agreement with the purpose of POSIX to promote
- portability of application programs across interface
- implementations. Some format will be used. It is to
- the advantage of the users of the standard for there to
- be a standard format.
-
- o The de facto standard is tar format. The current §10.1
- standardizes that, and provides upward-compatible
- extensions in areas that were previously lacking.
-
- The Archive/Interchange File Format should be left as it is.
-
- Thank you,
-
-
-
- John S. Quarterman
-
- Volume-Number: Volume 11, Number 41