- Date: Wed, 11 Dec 85 00:32:33 PST
- From: l5!gnu@LLL-CRG.ARPA (John Gilmore)
-
- Section 10.1.1 introduces the terms "interpret" and "translate" for
- "load" and "dump". Can we just use the familiar terms? I have trouble
- remembering which is which.
-
- [ I think using either of the words "dump" or "restore" (the latter
- actually used in the section) is a mistake, since they also connote
- a completely different set of programs than those usually associated
- with the format in question. -mod ]
-
- This section also says:
- > "The format-interpreting utility is defined such that if it is not a
- > privileged program, when data is read into the system from the
- > transportable media, all protection information is ignored. Instead the
- > user ownership and group ownership are set to that of the process context
- > which is running the utility. All access protection information can be
- > set to be no more liberal than that of the process that is running the
- > utility. A privileged version of the utility must have as a minimum, an
- > option that obeys the protection information stored on the transportable
- > media, such that this format and the corresponding utility can be used as
- > a save/restore mechanism."
-
- First, this is self-contradictory; it says all protection information
- is ignored, then says it can be set "no more liberal" than the
- process.
-
- [ The utility is not prevented from reading anything in the data
- because of any protections associated with it. However, once the
- utility converts the data into files in the file system, there *is*
- protection associated with it, as with any file: the utility must
- set the appropriate protection bits. -mod ]
-
- (I would assume the OS takes care of not letting your process
- set protection more liberal than its own, else there is no security.)
- I think what it means is that it's not legal for System V "tar" to always
- chown away the files, which you can't get back.
-
- [ Cpio actually does that under System V. Such chowning is a major
- security problem, much like your phrase: "there is no security",
- since the numeric user ids on the "tape" may have completely different
- meanings on the system where it is being read than on the one where
- it was written. This problem has been addressed in several other places
- in the standard as well as here. -mod ]
-
- Was there some other reason for this paragraph? If not, can we replace
- the text with something like:
-
- "The format-loading utility must not set access protections that cannot
- be revoked by the user running the utility (whether the user is
- privileged or not). If it can be run as a privileged utility, an
- option (or default behaviour) must exist which obeys all the loaded
- protection information, so it can be used for system backups."
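-
- A minimal sketch of the rule suggested above, under stated assumptions.
- The function name and its parameters are hypothetical, not part of the
- draft. Dropping the set-uid, set-gid and sticky bits on an ordinary load
- is my assumption; the suggested wording only requires that the result be
- revocable by the user. Ownership is not touched here at all, so an
- unprivileged load leaves files owned by the user running the utility.
-
```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: choose the permission bits for a loaded file.
 *   archived_mode : mode bits recorded in the archive header
 *   creation_mask : file creation mask of the loading process (as from umask())
 *   obey_archive  : nonzero when a privileged user asked for a full restore
 */
mode_t loaded_file_mode(mode_t archived_mode, mode_t creation_mask,
                        int obey_archive)
{
    mode_t perms = archived_mode & 07777;   /* keep the mode bits only */

    if (obey_archive)
        return perms;                       /* backup/restore: trust the archive */

    /* Ordinary load (assumption): drop set-uid, set-gid and sticky bits,
     * then make the result no more liberal than the process allows. */
    perms &= 0777;
    return perms & ~creation_mask;
}
```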
-
- ---
-
- Also, section 10.1.2 uses confusing terminology with regard to blocks
- and records. In the data processing world, a block is a big thing and
- one or more records fit in it (roughly speaking). Like you write 100
- records 80 chars long in an 8000 byte block on tape. Has anybody
- checked the ANSI standard for tape format to see what they call 'em?
- The Unix standard uses "block" for the small records, "group" for the
- large things, and also mentions that a "group" might turn into a single
- tape "record".
-
- I also don't see the need for two records of zeros on the end. One
- should be fine, and it won't break compatibility with the Unix tar
- program, which quits as soon as it sees the first one. Tar should
- really use EOF rather than this funny end of tape record; this would
- solve two or three minor problems with it, but would break
- compatibility with existing Unix "tar". (The problems: the tape is
- positioned wrong after reading a tar archive from a multi-file tape,
- since the tape mark has not yet been read; you can't just concatenate
- tar archives to combine their contents (which would make multi-volume
- tar handling somewhat easier too); extra data is written, which
- makes it uneconomical to use a large, tape-efficient block size (like a
- megabyte on streaming cartridge tapes), since this will waste up to a
- megabyte of space on the tape.)
-
- What I suggest is that ANSI standard tar's should be required to work
- OK when reading an archive terminated by EOF (short last block, then
- zero length result from read()). Suggested wording:
-
- An archive tape or file contains a series of records. Each record is of
- size TRECORDSIZE (see below). Although this format may be thought of as
- being on magnetic tape, this does not exclude the use of other
- media. Each file archived is represented by a header record
- which describes the file, followed by zero or more records which give the
- contents of the file. At the end of the archive file there may be a record
- filled with binary zeros as an end-of-file indicator. A conforming
- system must write a record of zeros at the end, but must not assume that
- an end-of-file record exists when reading an archive.
-
- The records may be blocked for physical I/O operations. Each block of
- n records (where n is set by the application program creating the
- archive file) may be written with a single write() operation. On
- magnetic tapes, the result of such a write is a single tape record.
- When writing an archive, the last block of records shall be written
- at the full size, with records after the zero record containing
- undefined data. When reading an archive, a conforming system shall
- properly handle an archive whose last block is shorter than the rest.
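-
- The reading rule in the wording above could be sketched roughly as
- follows. The value 512 for TRECORDSIZE is an assumption (it is what
- existing Unix tar uses; the draft defines the constant elsewhere), and
- the function names are hypothetical. The check is meant to be applied
- only where a header record is expected, never to file contents. Treating
- a trailing fragment shorter than one record as the end is also my
- assumption.
-
```c
#include <stddef.h>

#define TRECORDSIZE 512   /* assumed value, see the note above */

/* Return 1 if the record consists entirely of binary zeros, else 0. */
static int is_zero_record(const unsigned char *rec)
{
    size_t i;

    for (i = 0; i < TRECORDSIZE; i++)
        if (rec[i] != 0)
            return 0;
    return 1;
}

/*
 * Decide whether the reader has reached the end of the archive.
 *   rec   : buffer holding the bytes obtained for the next header record
 *   nread : how many bytes were actually obtained (0 means EOF)
 * The end is signalled either by EOF (or a fragment too short to be a
 * header) or by a record of zeros, so both old and new archives are read.
 */
int at_end_of_archive(const unsigned char *rec, size_t nread)
{
    if (nread < TRECORDSIZE)
        return 1;
    return is_zero_record(rec);
}
```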
-
- This allows a system to provide an option to write more modern
- archives, which will be readable by all P1003 conforming systems, but
- requires that the default be compatible (readable with V7 Unix 'tar').
-
- ---
-
- > /* Values used in typeflag field */
- > #define REGTYPE   '0'   /* Regular file */
- > #define AREGTYPE  '\0'  /* Regular file */
- > #define LNKTYPE   '1'   /* Link */
- > #define SYMTYPE   '2'   /* Reserved */
- > #define CHRTYPE   '3'   /* Char. special */
- > #define BLKTYPE   '4'   /* Block special */
- > #define DIRTYPE   '5'   /* Directory */
- > #define FIFOTYPE  '6'   /* FIFO special */
- > #define CONTTYPE  '7'   /* Reserved */
-
- In the header file, less generic names than e.g. "REGTYPE" should be used.
- How about "TF_REGULAR" (typeflag = regular file). This avoids the well
- known problem that a #define is a joy (or a pain) forever, especially
- when some other header file wants to use the same name:
-
- /* The typeflag defines the type of file */
- #define TF_OLDNORMAL  '\0'  /* Normal disk file, compat */
- #define TF_NORMAL     '0'   /* Normal disk file */
- #define TF_LINK       '1'   /* Link to dumped file */
- #define TF_SYMLINK    '2'   /* Symbolic link */
- #define TF_CHR        '3'   /* Character special file */
- #define TF_BLK        '4'   /* Block special file */
- #define TF_DIR        '5'   /* Directory */
- #define TF_FIFO       '6'   /* FIFO special file */
- #define TF_CONTIG     '7'   /* Contiguous file */
- /*
-  * All other type values except A-Z are reserved for future standardization
-  * and may not be used. A-Z may be used for implementation-dependent
-  * record types.
-  */
-
- The mode fields should use a prefix like "TM_" rather than just "T".
- Also, TSVTX (the sticky bit) cannot be "reserved"; otherwise implementations
- cannot write archives that have it turned on. Call it implementation-defined,
- if you must.
-
- > All characters are represented in ASCII, using 8-bit characters without
- > parity. Each field within the structure is contiguous; that is, there is
- > no padding used within the structure. Each character on the archive media
- > is stored contiguously.
-
- You'd better be more specific. USASCII, with the 7-bit character in the
- low-order 7 bits and the high-order bit cleared? What about foreign
- sites with funny characters in their file names?
-
- > The fields name, linkname, magic, uname and gname are null-terminated
- > character strings.
-
- Does this mean that when writing an archive, you MUST put in the null,
- or if the value exactly fills the field, is it OK to not have a null
- there? In other words, caveat writer or caveat reader? Here again, a
- prudent course would be to require the writer to do it right, and
- require the reader to accept it either way.
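-
- On the reading side, "accept it either way" might look like the sketch
- below. The helper name is hypothetical; the only assumption is that the
- caller knows the field's fixed length and supplies an output buffer one
- byte longer than the field.
-
```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical reader helper: copy a fixed-length header field into a
 * C string. The field may or may not contain a terminating null.
 *   field, fieldlen : the raw field as it appears in the header
 *   out             : buffer of at least fieldlen + 1 bytes
 * Returns the length of the resulting string.
 */
size_t field_to_string(const char *field, size_t fieldlen, char *out)
{
    size_t n = 0;

    /* Stop at the first null, or at the end of the field if there is none. */
    while (n < fieldlen && field[n] != '\0')
        n++;
    memcpy(out, field, n);
    out[n] = '\0';
    return n;
}
```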
-
- > The mtime field is the modification time of the file at the time it was
- > archived. It is the ASCII representation of the octal value of the
- > modification time obtained from the stat() call.
-
- This should be spelled out in detail, so the definition of the archive
- format can stand alone.
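-
- To illustrate what "ASCII representation of the octal value" means in
- practice, here is a rough decoding sketch. The helper name is
- hypothetical. Skipping leading spaces and stopping at the first
- character that is not an octal digit are my assumptions about how
- lenient a reader should be, and the code assumes a long is wide enough
- to hold the value.
-
```c
#include <stddef.h>

/*
 * Hypothetical helper: decode an octal ASCII numeric field such as mtime.
 * Leading spaces are skipped; decoding stops at the first character that
 * is not an octal digit (typically a space or a null).
 * Returns the decoded value, or -1 if the field contains no digits.
 */
long octal_field_value(const char *field, size_t fieldlen)
{
    size_t i = 0;
    long value = 0;
    int digits = 0;

    while (i < fieldlen && field[i] == ' ')
        i++;
    while (i < fieldlen && field[i] >= '0' && field[i] <= '7') {
        value = value * 8 + (field[i] - '0');
        digits = 1;
        i++;
    }
    return digits ? value : -1;
}
```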
-
- > ASCII digit `2' is reserved.
- > ASCII digit `7' is reserved.
- > ASCII letters `A' through `Z' are reserved for custom implementations.
- > All other values are reserved for specification in future revisions of the
- > standard.
-
- As I understand standards, something that is reserved canNOT be used by
- an implementation to extend the standard. This is not the intention
- here, since I presume compatibility with BSD systems (which use 2 for
- symlinks) is desired. I'm not sure why we don't just standardize
- symlinks here; after all, not all systems have fifos or contiguous
- files either...
-
- [ They were in there at one point. I wonder what happened to them. -mod ]
-
- > The encoding of the header is designed to be portable across machines.
-
- This sentence can go...
-
- > 10.1.3 Notes
- > ...
- > Implementors should be aware that the previous file format did not include
- > a mechanism to archive directory type files. For this reason, the
- > convention of using a file name which ended with a slash (/) was adopted
- > to specify the archiving of a directory.
-
- But ANSI standard systems are not required to read such a tape? I think
- they should be required to read it but not write it.
-
-
- An additional point. The standard does not specify what fields are defined
- in what record types. For example, is it OK to have garbage in the linkname
- in record type 0 (normal files)? Is it OK to put zeros in the uid/gid fields
- if you have filled in the uname/gname/magic fields (say your system does not
- have numeric uids)? What about the bytes in the header records that
- are not defined by the structure? Or the bytes beyond the end of a file,
- in its last record? I'd suggest that we require these fields to be nulls
- on writing, and require them to be ignored on reading, again for prudence.
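-
- For the bytes beyond the end of a file in its last record, the writing
- half of that suggestion could be sketched like this. The function name
- is hypothetical and 512 for TRECORDSIZE is again an assumption taken
- from existing Unix tar.
-
```c
#include <stddef.h>
#include <string.h>

#define TRECORDSIZE 512   /* assumed value */

/*
 * Fill one archive record from up to TRECORDSIZE bytes of file data.
 * Every byte past the end of the data is set to null, so the writer
 * never puts undefined bytes on the media.
 * Returns the number of padding bytes in the record.
 */
size_t fill_record(unsigned char *rec, const unsigned char *data,
                   size_t datalen)
{
    if (datalen > TRECORDSIZE)
        datalen = TRECORDSIZE;
    memset(rec, 0, TRECORDSIZE);
    if (datalen > 0)
        memcpy(rec, data, datalen);
    return TRECORDSIZE - datalen;
}
```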
-
- Volume-Number: Volume 4, Number 9
-
-