- Yesterday, 16 June, was the deadline I had set for collecting
- tar and cpio comments. The revised note for P1003.1 below
- incorporates those comments. I will deliver it to P1003.1 in
- Seattle on Monday.
-
-
-
- tar vs. cpio IEEE P1003.1 N.___
- 17 June 1987
-
- John S. Quarterman
-
- Institutional Representative from USENIX
- usenix!jsq
-
-
-
- Secretary, IEEE Standards Board
- Attention: P1003 Working Group
- 345 East 47th St.
- New York, NY 10017
-
- In both the Trial Use Standard and the current Draft 10,
- POSIX §10.1 describes a data interchange format based on the
- tar program. That section has appeared in every draft of
- IEEE 1003.1 in some form and has always been based on tar
- format. The P1003.1 Working Group has recently received two
- related proposals regarding that section: one to add cpio
- format (including the old-style, non-ASCII format written
- without the c option); <N.048 Lorraine C. Kevra> <V11N14>
- <V11N25 Eric S. Raymond> the other to replace the existing
- tar-based format with cpio format. <N.043 X/OPEN> <V11N13>
- Some clarifications to the former were also received. <N.064
- Dominic Dunlop> <V11N15> At the latest Working Group meeting
- it was also proposed verbally to drop §10.1 altogether and
- let P1003.2 handle the issue. <V11N08> <V11N11> <V11N09 Guy
- Harris> <V11N12 Doug Gwyn>
-
- The present note is a response to those proposals. Much of
- the detail in it is derived from articles posted in the
- USENET newsgroup comp.std.unix. Those articles are
- referenced with this format: <V11N09 Guy Harris> which gives
- the volume (always 11) and number of the article, and the
- name of the submitter. If no submitter name is given, the
- posting was by the moderator, John S. Quarterman. Thanks to
- those who submitted articles. However, the content of this
- note is solely the responsibility of the author.
-
- This note is addressed to P1003.1, and is concerned with
- data interchange formats. Although user interface issues
- may be of interest to P1003.2, they are not addressed here.
-
- There are a number of problems with both cpio formats.
- First, those related to the non-ASCII format:
-
- 1. Numerous parameters, including inode numbers, mode
- bits, and user and group IDs, are kept in two-byte
- binary integers. This has historically produced
- serious byte-order problems when data is moved among
- systems with different byte orders. <V11N09 Guy
- Harris>
-
- 2. The byte-swapping and word-swapping options to the
- cpio program are inadequate patches; with an ASCII
- format the problem would not be present. The options
- are not consistent across versions of the program: in
- System III, data blocks and file names are byte
- swapped; in System V, only data blocks are byte
- swapped. <V11N09 Guy Harris> <V11N47 Andrew
- Tannenbaum>
-
- 3. The two-byte integer format limits the range of inode
- numbers to 0..65535. Many current file systems are
- bigger than that. <V11N37 Paul Eggert> <V11N39 Henry
- Spencer>
-
- Non-ASCII cpio format is clearly not portable and should not
- even be considered for standardization. <V11N12 Doug Gwyn>
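-
- The byte-order problem described in item 1 can be sketched as
- follows. This is only an illustration, not code from any cpio
- implementation: the function names are hypothetical, and the
- six-digit octal width is an assumption chosen to match the
- 0..262143 range quoted for the ASCII format.
-
```python
import struct


def write_binary_field(value, byte_order):
    """Pack one header field as a two-byte binary integer."""
    fmt = "<H" if byte_order == "little" else ">H"
    return struct.pack(fmt, value)


def read_binary_field(raw, byte_order):
    """Unpack a two-byte binary integer using the reader's byte order."""
    fmt = "<H" if byte_order == "little" else ">H"
    return struct.unpack(fmt, raw)[0]


def write_ascii_field(value, width=6):
    """Encode a header field as fixed-width ASCII octal digits."""
    return ("%0*o" % (width, value)).encode("ascii")


def read_ascii_field(raw):
    """Decode ASCII octal digits; no byte order is involved."""
    return int(raw.decode("ascii"), 8)


inode = 0x1234

# Written on a little-endian machine, read on a big-endian one:
raw = write_binary_field(inode, "little")
swapped = read_binary_field(raw, "big")      # the two bytes come back reversed

# The ASCII form is a plain byte stream and reads the same everywhere:
ascii_raw = write_ascii_field(inode)
same = read_ascii_field(ascii_raw)
```
-
- In this sketch the binary field reads back as 0x3412 instead of
- 0x1234, while the ASCII field survives unchanged.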
-
- There are several problems that occur even with the ASCII
- cpio format:
-
- 1. Many implementations of cpio only look at the lower 16
- (or even 15) bits of the inode number, even in ASCII
- format. <V11N39 Henry Spencer> This is because the
- variable that is used to contain the value is declared
- to be unsigned short, just as in binary format. Thus,
- even though ASCII cpio format only constrains this
- number to the range 0..262143, the format is still
- less than portable. <V11N37 Paul Eggert>
-
- 2. The proposed cpio ASCII format, as specified, <N.048
- Lorraine C. Kevra> <V11N14> is not portable because
- the proposal assumes that sizeof(int) == sizeof(long).
- <N.064 Dominic Dunlop> <V11N15>
-
- 3. The file type is written in a numerical format, making
- it UNIX specific rather than POSIX specific, since
- POSIX (and tar) specifies symbolic, rather than
- numerical, values for file types. <V11N09 Guy Harris>
-
- 4. Hard links are not handled well, since cpio format
- does not directly record that two files are linked.
- If two files that are linked are written in cpio
- format, two copies will be written. The cpio program
- detects duplicate files by matching pairs of (h_dev,
- h_ino) and producing links, but that is done after the
- fact. <V11N09 Guy Harris> <V11N45 Guy Harris> <V11N54
- Ian Donaldson> (There is a program, afio, that handles
- cpio format more efficiently in this and other cases
- than the licensed versions of the program.) <V11N21
- Chuck Forsberg>
-
- 5. Symbolic links are not handled at all, and no type
- value is reserved for them. This makes cpio useless
- on a large class of historical implementations (those
- based on 4.2BSD or its file system) for one of the
- main purposes of POSIX §10.1: archiving files for
- later retrieval and use on the same system. Although
- it is possible to extend cpio to handle symbolic
- links, and at least one vendor has done this, <V11N45
- Guy Harris> the format proposed to P1003.1 is the
- format in the SVID, and does not handle symbolic
- links.
-
- 6. The cpio format is less common than tar format: there
- are few historical implementations from Version 7 on
- that do not have tar; there are many that do not have
- cpio. <V11N09 Guy Harris> <V11N10 Charles Hedrick>
- <V11N24 Jim Cottrell> It is true that cpio (non-ASCII
- format) was invented before tar, <V11N22 Joseph S. D.
- Yao> apparently in PWB System 1.0. <V11N26 Joseph S.
- D. Yao> The cpio program was first available outside
- AT&T with PWB/UNIX 1.0, <V11N45 Guy Harris> <V11N63
- Joseph S. D. Yao> and later with System III. However,
- in the interim, Version 7, which did not include cpio
- <V11N53 Bill Jones> <V11N62 Guy Harris> but did
- include tar, became the most influential system.
- There was a V7 addendum tape, but it also did not
- include cpio (according to its README file); <V11N65
- Rick Adams> the addendum tape was in tar format.
- Also, it appears that the cpio format of PWB was not
- the same as that of System III. <V11N39 Henry
- Spencer> And System III and all releases of System V
- include tar. <V11N26 Joseph S. D. Yao> <V11N63 Joseph
- S. D. Yao> <V11N45 Guy Harris> <V11N47 Andrew
- Tannenbaum>
-
- 7. It is very late in the process to propose that P1003.1
- adopt cpio format now, especially considering that it
- was originally proposed to and rejected by the
- /usr/group committee before P1003.1 was even formed.
- <V11N39 Henry Spencer>
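-
- Items 1 and 4 interact. The following sketch assumes a reader
- that recognizes links by matching (dev, ino) pairs, as item 4
- describes, and optionally keeps only the low 16 bits of the
- inode number, as item 1 describes. The function name and the
- sample entries are hypothetical; this is not taken from any
- cpio source.
-
```python
def detect_links(entries, inode_mask=None):
    """Return {name: first_name} for entries a reader would treat as links.

    entries    -- list of (name, dev, ino) tuples in archive order
    inode_mask -- if given, only these bits of ino are kept; 0xFFFF
                  models a reader that stores ino in an unsigned short
    """
    first_seen = {}
    links = {}
    for name, dev, ino in entries:
        if inode_mask is not None:
            ino &= inode_mask
        key = (dev, ino)
        if key in first_seen:
            links[name] = first_seen[key]   # matched after the fact
        else:
            first_seen[key] = name
    return links


entries = [
    ("a", 1, 5),
    ("b", 1, 5),        # genuine hard link to "a"
    ("c", 1, 65541),    # a different file; 65541 fits in six octal digits
]

full = detect_links(entries)
truncated = detect_links(entries, inode_mask=0xFFFF)
```
-
- With the full inode number only "b" is matched to "a". With
- the 16-bit mask, 65541 collapses to 5 and the unrelated file
- "c" is also matched to "a", which suggests how truncation
- could turn distinct files into spurious links.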
-
- Advantages of cpio format include:
-
- 1. Both X/OPEN <N.043 X/OPEN> <V11N13> and the SVID
- <N.048 Lorraine C. Kevra> <V11N14> use it, although
- evidently defined somewhat differently. <N.064
- Dominic Dunlop> <V11N15>
-
- 2. Archives made in cpio format are often smaller than
- ones in tar format. <V11N44 Mark Horton> But this is
- only because of the headers, and thus the effect
- diminishes with larger files.
-
- 3. On a local (non-networked) system, cpio is more
- efficient at copying directory trees than tar.
- <V11N46 Steve Blasingame> However, this is really an
- implementation issue.
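-
- The diminishing effect noted in item 2 can be estimated with
- a little arithmetic. The sizes used here are assumptions taken
- from the usual descriptions of the two formats, not from this
- note: a 76-byte ASCII cpio header followed by the
- NUL-terminated name, and a 512-byte tar header with file data
- padded to a multiple of 512 bytes. End-of-archive trailers
- are ignored.
-
```python
def cpio_ascii_member_size(name, data_size):
    """Bytes used by one member in ASCII cpio format (assumed layout)."""
    header = 76                     # assumed fixed header length
    return header + len(name) + 1 + data_size   # +1 for the NUL after the name


def tar_member_size(data_size, block=512):
    """Bytes used by one member in tar format (assumed layout)."""
    padded = -(-data_size // block) * block     # round data up to a full block
    return block + padded                       # one header block plus data


def overhead_ratio(total, data_size):
    """Fraction of the member that is not file data."""
    return (total - data_size) / total


name = "src/main.c"
small, large = 100, 1_000_000

cpio_small = overhead_ratio(cpio_ascii_member_size(name, small), small)
tar_small = overhead_ratio(tar_member_size(small), small)
cpio_large = overhead_ratio(cpio_ascii_member_size(name, large), large)
tar_large = overhead_ratio(tar_member_size(large), large)
```
-
- Under these assumptions the 100-byte file costs 187 bytes in
- cpio and 1024 in tar, while for the million-byte file the
- overhead of either format is well under one percent.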
-
- There are several advantages to the current tar-based format
- as specified in §10.1:
-
- 1. There are no byte- or word-swapping issues caused by
- the format, since all the header values are ASCII byte
- streams. <V11N17 John Gilmore>
-
- 2. There are no inode numbers recorded, and file types
- are kept in symbolic form, so the format is less
- implementation-specific than cpio format. <V11N17
- John Gilmore>
-
- 3. Historical tar format is the most widely used, as
- discussed in item 6 of the ASCII cpio problems above,
- despite apparent assertions to the contrary. <N.043
- X/OPEN> <V11N13>
-
- 4. The format specified in §10.1 is upward-compatible
- with tar format. Old tar archives can be extracted by
- a program that implements §10.1. Archives using some
- of the extensions of §10.1 can be extracted with old
- (Version 7) tar programs, although symbolic links will
- not be extracted and contiguous files will not be
- handled properly (cpio does not handle these
- capabilities at all). Files with very long names will
- not be handled properly (cpio does no better at this).
- All tar implementations are compatible to this extent.
- <V11N17 John Gilmore>
-
- 5. The /usr/group working group and P1003.1 have already
- done the work <P.061> <M.019 5.1.121 Pg.13> <RFC.003
- #121> <P.038> <P.006> required to add optional
- extensions (such as symbolic links, long file names,
- <V11N49 Jerry Schwarz> <V11N50 Michael MacDonald> and
- contiguous files) that are needed on many historical
- implementations and that cpio format lacks.
-
- 6. The format is extensible for future facilities.
- <V11N39 Henry Spencer>
-
- 7. There is a public domain implementation of the format
- of §10.1. That implementation provided feedback which
- led to improvements in the current specification, and
- has been in use for years in transferring data with
- licensed tar implementations. <V11N17 John Gilmore>
-
- 8. Many people prefer the user interface of the cpio
- program to that of the tar program, because the former
- can accept a list of pathnames to archive on standard
- input while the latter takes them as arguments,
- limiting the length of the list. <V11N34 Andrew
- Tannenbaum> However, the above-mentioned public domain
- implementation of tar accepts pathnames on standard
- input, <V11N17 John Gilmore> <V11N19 Jim Cottrell> and
- at least one vendor sells a version of tar that can do
- this. <V11N48 Michael Gersten> Diffs to standard tar
- to add an option to accept pathnames on standard input
- when creating an archive have also been posted to
- USENET. <V11N36 John Gilmore> The user interface is,
- in any case, irrelevant to P1003.1. <V11N39 Henry
- Spencer> <V11N40 Rahul Dhesi>
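-
- Items 1 and 2 can be illustrated by decoding a header by hand.
- The sketch below uses the tarfile module from the Python
- standard library to build one header block, then reads fields
- back at the byte offsets conventionally given for the tar
- header (mode at 100, size at 124, type flag at 156). Those
- offsets and the helper name are assumptions of this example,
- not text from §10.1.
-
```python
import tarfile


def octal_field(header, offset, length):
    """Decode one numeric header field stored as ASCII octal digits."""
    raw = header[offset:offset + length]
    return int(raw.rstrip(b"\0 ").decode("ascii"), 8)


info = tarfile.TarInfo("hello.txt")
info.size = 1234
info.mode = 0o644

# One 512-byte header block; every numeric field is ASCII text.
header = info.tobuf(format=tarfile.USTAR_FORMAT)

mode = octal_field(header, 100, 8)
size = octal_field(header, 124, 12)
typeflag = header[156:157]          # a single character, not a mode bit pattern
```
-
- Because each field is a run of ASCII digits, the same bytes
- decode to the same values regardless of the reader's byte
- order, and the file type is a character code rather than a
- system-specific number.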
-
- Disadvantages of tar format:
-
- 1. If an attempt is made to extract only the second of a
- pair of hard-linked files, the tar program will attempt
- to link the second file to the nonexistent first file,
- and nothing will be extracted. Although a
- sufficiently clever implementation could avoid this,
- the problem can be considered to be in the archive
- format. <V11N66 Kenneth Almquist>
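-
- The reason this can be attributed to the format is visible in
- a small archive. The sketch below, again using the Python
- standard library tarfile module with hypothetical member names
- "a" and "b", writes a file and a hard link to it and then
- lists what the archive actually records for each member.
-
```python
import io
import tarfile


def build_archive():
    """Write a regular file "a" and a hard link "b" -> "a" to memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tf:
        first = tarfile.TarInfo("a")
        first.size = 5
        tf.addfile(first, io.BytesIO(b"hello"))

        second = tarfile.TarInfo("b")
        second.type = tarfile.LNKTYPE   # hard link entry
        second.linkname = "a"           # refers to the first member by name
        tf.addfile(second)              # header only, no data blocks
    buf.seek(0)
    return buf


def describe(buf):
    """Return (name, is_hard_link, linkname, size) for every member."""
    with tarfile.open(fileobj=buf, mode="r") as tf:
        return [(m.name, m.islnk(), m.linkname, m.size)
                for m in tf.getmembers()]


members = describe(build_archive())
```
-
- The entry for "b" carries only the name "a" and no data of
- its own, so a program that extracts "b" alone has nothing to
- link to unless it goes back and reads the data stored under
- "a".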
-
- There are some problems that neither tar nor cpio handles
- well.
-
- 1. Support for file names longer than PATH_MAX (at
- least 255), <V11N50 Michael MacDonald> which is what the
- POSIX format allows, would be preferable; cpio permits
- only 128 and historical tar only 100. The POSIX limit
- is nevertheless adequate for most cases. <V11N54 Ian
- Donaldson>
-
- 2. An option to prevent crossing mount points would be
- useful for backups. <V11N19 Jim Cottrell> <V11N22
- Joseph S. D. Yao> However, this appears to be more of
- an implementation issue than a format issue, <V11N28
- Dave Brower> <V11N32 Joseph S. D. Yao> especially
- considering that the find program has options in 4.2BSD,
- <V11N24 Jim Cottrell> SunOS 3.2, <V11N36 John Gilmore>
- and System V Release 3.0 <V11N35 Mike Akre> that take
- care of this.
-
- 3. The default block size in many tar implementations is
- too large for some tape controllers to read <V11N27
- Rob Lake> (the 3B20 has this problem). This is not a
- problem with the interchange format, however.
-
- There is nothing that the proposed cpio format can handle that
- the tar-based format already in POSIX §10.1 cannot handle; in
- fact, the former is less capable. If cpio format were
- augmented to handle the missing capabilities, it would be
- subject to the same objection now aimed at the format given
- in §10.1: that it is not identical to an existing format.
-
- There is no advantage in replacing the current tar-based
- format of §10.1 with cpio format. There is also no
- advantage in adding cpio format, because two standards are
- not as good as a single standard.
-
- Some have recommended removing §10.1 from POSIX altogether,
- <V11N12 Doug Gwyn> perhaps with a recommendation for P1003.2
- to pick up the idea. <V11N09 Guy Harris> While I believe
- that doing so would be preferable to adding cpio format,
- whether or not tar format remains, I recommend leaving §10.1
- as it is, because:
-
- o The inclusion of an archive/interchange file format is
- in agreement with the purpose of POSIX to promote
- portability of application programs across interface
- implementations. Some format will be used. It is to
- the advantage of the users of the standard for there to
- be a standard format.
-
- o The de facto standard is tar format. The current §10.1
- standardizes that, and provides upward-compatible
- extensions in areas that were previously lacking.
-
- The Archive/Interchange File Format should be left as it is.
-
- Thank you,
-
-
-
- John S. Quarterman
-
-
- Volume-Number: Volume 11, Number 67
-
-