- Xref: sparky comp.arch:11989 comp.sys.dec:6642 comp.sys.sgi:18373 comp.sys.hp:14415
- Path: sparky!uunet!spool.mu.edu!agate!dog.ee.lbl.gov!horse.ee.lbl.gov!torek
- From: torek@horse.ee.lbl.gov (Chris Torek)
- Newsgroups: comp.arch,comp.sys.dec,comp.sys.sgi,comp.sys.hp
- Subject: Re: Comparison of Alpha, MIPS and PA-RISC-II wanted
- Date: 30 Dec 1992 00:01:31 GMT
- Organization: Lawrence Berkeley Laboratory, Berkeley CA
- Lines: 136
- Message-ID: <28164@dog.ee.lbl.gov>
- References: <1992Dec29.044012.1@cc.curtin.edu.au> <3623363@zl2tnm.gen.nz>
- NNTP-Posting-Host: 128.3.112.15
-
- (Note, I have no idea what any of this is doing in any of these
- newsgroups, but....)
-
- [someone---the names have been deleted to protect the guilty :-) ---writes]
- >>But U*** still loses because it has no way of telling the application what
- >>TYPE of file structure it is...
-
- [someone else writes]
- >Exactly. Well, you can pull kludgery with magic numbers....
-
- OK, I can see it is time to go back to basics here. (Not BASIC, basics :-) )
-
- The first principle is that there must be *some* shared ideas in order
- to make communication possible. If I were to write this article in
- Swahili, you could read it if and only if: a) you knew it was in
- Swahili; and b) you knew Swahili, or had a translation system. Without
- both of those, no communication would occur. This article, like too
- many others, would all be noise.
-
- This sort of thing---links via shared ideas that allow one to
- understand part of a message, which then allows still further
- understanding, and so on---is just what happened with the Rosetta
- Stone. Pre-Rosetta, trying to read Egyptian hieroglyphs was something like
- trying to read a structured file without any idea what the structure
- was. (I may have some details confused; I am not into archaeology.
- But the idea is valid: you need to have some key to get started.)
-
- Now let us consider computer files.
-
- A computer data store simply holds a sequence of bits. Those bits,
- those 1s and 0s, have no inherent *meaning*---all the meaning arises
- through interpretation. We need some way to interpret each bit.
- This interpretation occurs at many levels: some of the bits on a
- disk, for instance, are for error correction. Others mark sector
- headers; still others (`gap' bits) are left simply for timing. The
- remainder are what we usually think of as `user and system data'.
-
- (Incidentally, note that I have started several levels up. The disk
- does not actually contain *bits* but rather merely records magnetic
- domain changes. When moved across a conductor, these generate a
- sequence of electrical signals which must be carefully interpreted in
- order to derive the individual bits.)
-
- In our case, these `data bits' are grouped in sets of 8. This was not
- always true, and there are probably a number of systems that still
- group them differently, but most modern machines use `bytes' as a basic
- building block. Each set of 8 again has an arbitrary interpretation:
- 01101001 may mean `the number 105' (a `big-endian' integral
- interpretation) or `the number 150' (a `little-endian' integral
- interpretation) or perhaps `the letter lowercase i' (an ASCII
- interpretation of code 105). They could be part of a larger integer,
- or maybe they are even part of a floating point number or something
- equally bizarre.
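-
- A minimal sketch of those three readings, in Python. Here `big-endian'
- is taken to mean that the leftmost bit is the most significant one and
- `little-endian' that it is the least significant; the variable names
- are illustrative only.
-
```python
bits = "01101001"

# Leftmost bit is the most significant: 64 + 32 + 8 + 1.
msb_first = int(bits, 2)

# Leftmost bit is the least significant: read the same bits backwards.
lsb_first = int(bits[::-1], 2)

# Treat the most-significant-first value as an ASCII code.
as_ascii = chr(msb_first)

print(msb_first, lsb_first, as_ascii)
```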
-
- Fortunately, most of us use computers with a few standard
- interpretations. If we agree, for purposes of netnews, to write the
- bits in a big-endian fashion, we can talk about 8, 16, 32, and 64 bit
- integers---these are available on practically every modern system---and
- 32, 64, 80, and often 128-bit floating point numbers in either IEEE or
- some proprietary notation. Textual strings can be represented in
- ASCII, either with a leading count, or more typically just by enclosing
- in special `quote' characters. More complicated interpretations, such
- as Lisp-like S-expression structures or relational database tables, we
- generally represent using some ad-hoc notation and hope that our
- readers apply the proper interpretation from that.
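-
- As a rough illustration of how much depends on the agreed
- interpretation, the sketch below uses Python's standard `struct'
- module to read one four-byte sequence three ways, and to build a
- string with a leading count. The helper name `counted' and the
- one-byte count are assumptions made for the example.
-
```python
import struct

# The integer 105 as a 32-bit big-endian value: bytes 00 00 00 69.
raw = struct.pack(">I", 105)

as_big_endian = struct.unpack(">I", raw)[0]     # the intended reading
as_little_endian = struct.unpack("<I", raw)[0]  # same bytes, other byte order
as_ieee_float = struct.unpack(">f", raw)[0]     # same bytes as IEEE single


def counted(text: str) -> bytes:
    """Encode ASCII text with a one-byte leading count (at most 255 bytes)."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("text too long for a one-byte count")
    return bytes([len(data)]) + data


print(as_big_endian, as_little_endian, as_ieee_float, counted("hello"))
```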
-
- Of course, our computer systems are generally not as flexible as our
- people, and our people have far more experience with various forms of
- communication than do our machines. So when we use a computer we
- generally have to instruct it, in gory detail, as to just how we want
- those 1s and 0s interpreted. And here we reach the point where the
- flamage occurs. Some OSes provide an enormous collection of `standard
- interpretations', and some provide only a few. The VMS people insist
- that it is better to have 16 built-in record formats; the UNIX people
- insist that it is better to have none.
-
- Both are right, both are wrong; neither way is `better'.
-
- (By the way, the `16' above was selected at random. I have no idea
- how many formats RMS officially supports, and if you count each record
- length as a different format you will get a different number than if
- you count all `fixed block' variants as one format. It does little
- good to set an exact number. The fact is that RMS provides a number
- of standard formats---fixed and variable block, stream-LF, etc.---and
- the usual UNIX interface is just `a stream of bytes'.)
-
- Let us suppose you want to, say, keep an inventory. Under VMS, you can
- choose a fixed block format with a particular record size, and store a
- sequence of records: manufacturer, model, serial number, quantity on
- hand, stock level, prices, date last ordered, date last sold, etc. On
- a UNIX system you have to package up the records yourself. But this is
- the easy part! Managing a sequence of records is trivial. The hard
- work goes into deciding what data to keep, how to cross-index, what
- operations to support, what kind of security to provide, and so forth,
- and into developing tools for answering questions about the data
- (`should we drop the Baby Spits Up doll?'). As far as I know, no OS
- solves these problems directly (yet?). One can buy canned solutions
- for both VMS and UNIX systems, of course, but in that case the
- underlying OS implementation is irrelevant; one can hardly argue that
- having or lacking RMS is inherently superior just because FooBase sells
- a system for VMS or UNIX (or both or neither).
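-
- To show how little is involved in `packaging up the records
- yourself', here is a hedged sketch of fixed-size records laid over a
- plain stream of bytes. The record layout (two 16-byte text fields and
- two 32-bit integers), the function names, and the sample data are all
- invented for illustration; an in-memory `io.BytesIO' stands in for
- a file.
-
```python
import io
import struct

# Hypothetical layout: manufacturer, model, serial number, quantity on hand.
# ">" means big-endian with no padding, so every record is exactly 40 bytes.
RECORD = struct.Struct(">16s16sII")


def write_record(stream, manufacturer, model, serial, quantity):
    """Append one fixed-size record at the stream's current position."""
    stream.write(RECORD.pack(manufacturer.encode("ascii"),
                             model.encode("ascii"), serial, quantity))


def read_record(stream, index):
    """Seek straight to record number `index` and decode it."""
    stream.seek(index * RECORD.size)
    data = stream.read(RECORD.size)
    if len(data) != RECORD.size:
        raise IndexError("no such record")
    manufacturer, model, serial, quantity = RECORD.unpack(data)
    # Text fields are padded with NUL bytes; strip the padding.
    return (manufacturer.rstrip(b"\0").decode("ascii"),
            model.rstrip(b"\0").decode("ascii"), serial, quantity)


store = io.BytesIO()  # stands in for an ordinary byte-stream file
write_record(store, "Acme", "Rocket Skates", 1001, 0)
write_record(store, "Acme", "Baby Spits Up", 1002, 7)
print(read_record(store, 1))
```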
-
- Now, one can say that RMS is provided but need not be used. This is a
- valid point. But the argument cuts both ways: features provided may be
- used inappropriately. Indeed, for some purposes a largely-unstructured
- text file may prove better than a highly-structured collection of
- records. Grep's regular expressions are quite simple and yet act as
- a powerful query tool (this reminds me of Kernighan's story of using
- egrep to find words one can spell on a calculator, but never mind that).
- Fortunately or not, grep does not work on records. If one had records,
- grep would need some way of `flattening' them for display anyway.
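-
- A small sketch of both halves of that point: a grep-like regular
- expression search over flat text lines, and the `flattening' step a
- record would need before such a search could see it. The `|' field
- separator, the sample inventory, and the function names are
- assumptions made for the example, and this is not how grep itself
- is implemented.
-
```python
import re

# A largely unstructured text file: one item per line, fields split by "|".
inventory_text = """\
Acme|Baby Spits Up|1002|7
Acme|Rocket Skates|1003|0
Bolt|Anvil|2001|12
"""


def grep(pattern, text):
    """Return the lines of `text` that match the regular expression."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


def flatten(record):
    """Turn a structured record (a tuple here) into one searchable line."""
    return "|".join(str(field) for field in record)


out_of_stock = grep(r"\|0$", inventory_text)   # quantity field is exactly 0
acme_items = grep(r"^Acme\|", inventory_text)  # manufacturer is Acme
print(out_of_stock, acme_items, flatten(("Bolt", "Anvil", 2001, 12)))
```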
-
- And then there are `magic numbers'. Are they a kludge? Well, no and
- yes. They are an ad-hoc method to insert a `representation key' in a
- file. Given a simple `stream of bytes' file, if one takes the first
- two or four bytes of a file and puts an `unlikely' value there, one can
- write programs that find these `magic numbers' and change their
- behavior based upon them. One might further wonder how exactly this
- differs from reading a few bytes from a specially-defined location in
- every file, and deciding the file's type based on that information.
- Indeed, there *is* *no* difference except in convention: RMS stores
- file types in a system-decreed manner, and RMS is included in
- everything used in the system, so that, by convention, everything makes
- the same interpretation of the magic number. UNIX systems are, in this
- sense, looser: each individual program can attempt to interpret file
- types or not, in an individual manner. A rigid system avoids chaos,
- but also stifles novel solutions.
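-
- The magic-number convention can be sketched in a few lines. The
- three signatures below are widely used file signatures chosen for
- illustration (they are not taken from this discussion), and the
- table and the function name `sniff' are hypothetical. Any program is
- free to consult such a table, or to ignore it.
-
```python
# Leading byte sequences and the interpretation each one suggests.
MAGIC = {
    b"\x7fELF": "ELF object file",
    b"\x1f\x8b": "gzip compressed data",
    b"%PDF": "PDF document",
}


def sniff(data: bytes) -> str:
    """Guess a file's type from its first few bytes, if they look 'magic'."""
    for magic, kind in MAGIC.items():
        if data.startswith(magic):
            return kind
    return "unknown (just a stream of bytes)"


print(sniff(b"\x1f\x8b\x08\x00rest of file"))
print(sniff(b"plain old text\n"))
```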
-
- This is why, when we speak in generalities, it does no good to say that
- either VMS or UNIX is `better'. I know which one *I* prefer, but then
- I know the kinds of tasks I do and how I go about them. I also know
- whether I prefer raspberry ice cream over peach, but I would not claim
- one is `better' and the other `loses'. :-)
- --
- In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 510 486 5427)
- Berkeley, CA Domain: torek@ee.lbl.gov
-