The History and Philosophy of Project Gutenberg (c)August 1992
Second edition prepared for August, 1992. Updated regularly.
(margins are 62, about 10 pages, send only the complete file.)
(Includes answers to many Frequently Asked Questions (FAQ))
There is a lot of information in this little file. . .and your
requested information may be contained in a short portion. It
is therefore recommended that you search for subjects. It was
not feasible to break this file into smaller ones, but we have
been told that our audience responds best to quick, short, and
concise responses. These are marked by subject headers and by
paragraphing. Read fast, it is all quite simple. If you find
something of great interest, you might want to read it again.
The purpose of this file is to answer questions. . .not create
flames. We have long ago learned that flamers must be allowed
to burn themselves out. However, we feel obliged to answer in
the forums in which the flames were posted. . .not to satisfy,
can't be done, the flamers, but to explain to the rest of that
audience what Project Gutenberg is and is not, however flamers
may have misstated the obvious. Etext is certainly one of the
most obvious uses of computers, and the flamers can hardly put
a dent in that fact. Plain Vanilla ASCII is also obviously an
important etext medium, but no one at Project Gutenberg states
that it is or should be the only etext medium.
"When you get something for free, you get what you pay for!!!"
That means if you don't use what you get for free, it won't do
you any good. But sometimes it is nice to have a library your
friends and family can use, even if they don't always use it.
The Beginning
Project Gutenberg began in 1971 when Michael Hart was given an
operator's account with $100,000,000 of computer time in it by
the operators of the Xerox Sigma V mainframe at the Materials
Research Lab at the University of Illinois.
This was totally serendipitous, as it turned out that two of a
four operator crew happened to be the best friend of Michael's
and the best friend of his brother. Michael just happened "to
be at the right place at the right time" at the time there was
more computer time than people knew what to do with, and those
operators were encouraged to do whatever they wanted with that
fortune in "spare time" in the hopes they would learn more for
their job proficiency.
At any rate, Michael decided there was nothing he could do, in
the way of "normal computing," that would repay the huge value
of the computer time he had been given. . .so he had to create
$100,000,000 worth of value in some other manner. An hour and
47 minutes later, he announced that the greatest value created
by computers would not be computing, but would be the storage,
retrieval, and searching of what was stored in our libraries.
He then proceeded to type in the "Declaration of Independence"
and tried to send it to everyone on the networks. . .which can
only be described today as a not so narrow miss at creating an
early version of what was later called the "Internet Virus."
A friendly dissuasion from this yielded the first posting of a
document in electronic text, and Project Gutenberg was born as
Michael stated that he had "earned" the $100,000,000 because a
copy of the Declaration of Independence would eventually be an
electronic fixture in the computer libraries of 100,000,000 of
the computer users of the future.
The Beginning of the Project Gutenberg Philosophy
The premise on which Michael Hart based Project Gutenberg was:
anything that can be entered into a computer can be reproduced
indefinitely. . .what Michael termed "Replicator Technology."
The concept of Replicator Technology is simple; once a book or
any other item (including pictures, sounds, and even 3-D items)
can be stored in a computer, then any number of copies can and
will be available. Everyone in the world, or even not in this
world (given satellite transmission) can have a copy of a book
that has been entered into a computer.
This philosophical premise has created several offshoots:
1. Electronic Texts (Etexts) created by Project Gutenberg are
to be made available in the simplest, easiest to use forms
available.
2. Suggestions to make them less readily available are not to
be treated lightly.
Therefore, Project Gutenberg Etexts are made available in what
has become known as "Plain Vanilla ASCII," meaning the low set
of the American Standard Code for Information Interchange: i.e.
the same kind of character you read on a normal printed page--
italics, underlines, and bolds have been capitalized.
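As a rough illustration of the capitalization convention just
described, the Python sketch below turns emphasized spans into
capitals and checks that the result stays within low ASCII. The
`_underscore_` and `*asterisk*` delimiters and both function
names are assumptions made up for this example; this file does
not say what markup, if any, the source texts carried.

```python
import re

# Hypothetical delimiters: _italic_ or *bold*. These are assumptions for
# illustration only, not a Project Gutenberg convention.
_EMPHASIS = re.compile(r"(?<!\w)([_*])(?=\S)(.+?)(?<=\S)\1(?!\w)")


def flatten_emphasis(text: str) -> str:
    """Replace each delimited span with its capitalized, unmarked form."""
    return _EMPHASIS.sub(lambda m: m.group(2).upper(), text)


def is_plain_ascii(text: str) -> bool:
    """Return True if every character is in the low (7-bit) ASCII set."""
    return all(ord(ch) < 128 for ch in text)


if __name__ == "__main__":
    sample = "She was _very_ tired and *not at all* pleased."
    print(flatten_emphasis(sample))
```

Delimiters inside a word (as in `snake_case_name`) are left alone,
so ordinary underscores in a text are not mistaken for emphasis.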
*** (Parenthetical discussion on bold, italics and underlines)
This next paragraph may be skipped if you wish; it was created
in response to severe flaming on several occasions. (In many
conversations with authors, and those who research the authors
whom we publish, we have determined that most selections of an
assortment of possible emphases were made by the editors, with
little or no consultation to the authors. Thus we have little
motivation to continue our previous efforts to determine a way
to present italics, bolds and underlines in any other way than
by capitalizing them. In our estimation, the authors are the
final authority, and they say they merely intend to emphasize,
not that they have a particular affinity for one form over the
others. Please remember, we only talked to many authors, most
of whom said they had no affinity for any particular method of
emphasis (i.e. they didn't really care how most emphases were
made. . .via italics, bold, or underline). This
does NOT mean we talked to ALL authors, or that ALL said this.
This disclaimer is to mollify the flames we constantly get for
this. One quite famous author and editor has said that we may
as well get rid of all the capitals and punctuation, if we are
not going to do italics, bold and underline. Actually when we
started Project Gutenberg, there was no case distinction, very
few punctuation marks, and it was not terribly easy to read an
original etext of the Declaration of Independence. We try for
readability by HUMANS in the first place, and by programs as a
secondary feature. We LIKE the idea that programs should read
our files easily, BUT NOT TO THE EXCLUSION OF HUMANS. Thus we
do not use intrusive forms of markup, either those that would
make it difficult for many humans to read, or those that would
make it impossible for programs to read and search. Please no
more flames or requests for markup. This is for others to do,
and they are welcome to use our etexts in the doing. Repeat:
Project Gutenberg Etexts are meant for the general population,
NOT for the top 1% of the population who argue about whether a
word was meant to be italicized or bolded or underlined. This
is especially true of older books, written and published under
the customs and practices of different times and places. This
must be considered. So must the fact that many or most of the
books we are going to do were not written in English, or were
written in an English from a different place and time than the
20th Century American English most networkers tend to use.
English of that type is the language hardly any of our etexts
were written in. The arguments about American versus English are a
non-sequitur (irrelevant) to most of our audience, and we must
not spend as much time working on those aspects of a book as a
whole new book would take us to do. The same is true of 99.9%
accuracy. We expect to have errors in our etexts. . .etext is
so easy to correct that people just send us notes with errors;
we save them, and when we have a dozen we put out a new etext.
This takes very little time: we are now on our 30th edition of
Alice in Wonderland. Where else are you going to get editions
improved on such a rapid basis? In fact, one of the arguments
we hear frequently is that the errors of various editions must
be preserved in the etext editions, or the etext editions are
not "authoritative editions." Ladies and Gentlemen. . .I have
just fallen off the head of the pin I have been balancing on--
(philosophers used to argue [seriously] about how many angels,
presuming such things as angels, could stand on the head of an
ordinary pin [some said it was how many could DANCE on it]). At
any rate this is more than enough for 1992, and I don't intend
to address the questions in this section for another year.)
(End of parenthetical discussion on emphasis. Back to. . . .)
(When you read the next line you will wish you had skipped)***
The reason for this is that 99% of the hardware and software a
person is likely to run into can read and search these files.
Any other system of etext storage is going to fall short of an
audience of 99%.
This does not mean there are not other valid means of doing the
etext business. . .after all, over half the computers are DOS,
so one could address a wide audience by just doing DOS. Plain
Vanilla ASCII, however, addresses the audience with Apples and
Ataris all the way to the old homebrew Z80 computers, while an
audience of Mac, UNIX and mainframers is still included.
In this same vein, Project Gutenberg selects etexts targeted a
bit on the "bang for the buck" philosophy. . .we choose etexts
we hope extremely large portions of the audience will want and
use frequently. We are constantly asked to prepare etext from
out of print editions of esoteric materials, but this does not
provide for usage by the audience we have targeted, 99% of the
general public.
Also in the same vein, Project Gutenberg has avoided requests,
demands, and pressures to create "authoritative editions." We
do not write for the reader who cares whether a certain phrase
in Shakespeare has a ":" or a ";" between its clauses. We put
our sights on a goal to release etexts that are 99.9% accurate
in the eyes of the general reader. Given the preferences our
proofreaders have, and the general lack of reading ability the
public is currently reported to have, we probably exceed those
requirements by a significant amount. However, for the person
who wants an "authoritative edition" we will have to wait some
time until this becomes more feasible. We do, however, intend
to release many editions of Shakespeare and the other classics
for the comparative study on a scholarly level, before the end
of the year 2001, when we are scheduled to complete our 10,000
book Project Gutenberg Electronic Public Library.
Project Gutenberg hopes to be a part of massive celebrations a
100th Anniversary of Public Libraries deserves in 1995, and in
1997 hopes to found "The Public Domain Register," on the 100th
Anniversary of The U.S. Copyright Register.
We hope you will be part of it, too. You are all invited.
Footnote:
Our eventual goal is to provide Public Domain Etext editions a
short time after they enter the Public Domain. Of course, the
period before a copyrighted work entered the Public Domain was
extended from 28 years (with a 28 year extension available) to
50 years more than the life of the author, so this put a kink,
to put it mildly, into our plans. (The original copyright was
for 14 years, in the U.S.) Thus, a person could originally make
a reasonable prediction that anything under copyright would be
in the Public Domain while it could be used. Under the new law
it is impossible to predict the length of a copyright, and the
likelihood of a new book entering the Public Domain during the
lifetime of the average reader is minimal. Suppose you might
be 25 when you read a new book and the author is 50: wait the
average 25 years for the author to die (what a thought!*). Now
you have to wait another 50 years to have access to that book;
it doesn't matter when it was written (unless it is an old one
. . .before the period the law retroacted to). . .so you would
have to wait (on the average) until you were 100 years old. A
25-year-old under the original law would only have to wait for
14 years. . .until the age of 39. Quite a difference: between
the ages of 39 and 100. Not only that, but the copyright laws
would have to stay the same for all that time. . .something in
serious doubt, seeing how much they have changed in the recent
century.
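The arithmetic in this footnote can be checked with a short
Python sketch. The function names are made up for illustration,
and the default figures (a 14 year original term, life plus 50
years, an author with an average of 25 years left to live) are
simply the ones used in the example above, not a statement of
actual copyright law.

```python
def age_under_fixed_term(reader_age: int, term_years: int = 14) -> int:
    """Reader's age when a new book leaves copyright under a fixed term."""
    return reader_age + term_years


def age_under_life_plus(reader_age: int,
                        author_years_remaining: int = 25,
                        years_after_death: int = 50) -> int:
    """Reader's age when a new book leaves copyright under life-plus terms."""
    return reader_age + author_years_remaining + years_after_death


if __name__ == "__main__":
    print("original 14-year term:", age_under_fixed_term(25))
    print("life plus 50 years:   ", age_under_life_plus(25))
```

With the footnote's figures, the two functions give the ages of
39 and 100 quoted in the text.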
This goal of presenting Public Domain Editions immediately has
a Public Domain Register as its predecessor. Before you expect
the availability of all Public Domain materials, we have to at
least come up with a way of listing what those titles are. If
you are interested, please let us know before 1997 so we might
be able to include your efforts in the Public Domain Register.
The Project Gutenberg Philosophy
The Project Gutenberg Philosophy is to make information, books
and other materials available to the general public in forms a
vast majority of the computers, programs and people can easily
read, use, quote, and search.
This has several ramifications:
1. The Project Gutenberg Etexts should cost so little that no
one will really care how much they cost. They should be a
general size that fits on the standard media of the time.
i.e. when we started, the files had to be very small as a
normal 300 page book took one meg of space, which no one in
1971 could be expected to have (in general). So doing the
U.S. Declaration of Independence (only 5K) seemed the best
place to start. This was followed by the Bill of Rights--
then the whole US Constitution, as space was getting large
(at least by the standards of 1973). Then came the Bible,
as individual books of the Bible were not that large, then
Shakespeare (a play at a time), and then into general work
in the areas of light and heavy literature and references.
The rate at which we have chosen to release etexts is that
rate which will allow the general public (and us!) to grow
without undue effort into the Electronic Public Libraries.
We can't rely on CD's, as only a small fraction of persons
interested in etexts have CD's. We think CDs are great but
we can't have that as our primary means of measurement and
distribution. Our goal is for the average user to be able
to store our library inexpensively on standard media. The
current standards are magnetic, with 1.44 floppies and the
200 and some meg hard drives being sold on the average for
the average two or three thousand dollar computer. A 1.44
floppy costs about fifty cents these days, in quantity (50
or so is enough to get this price), so $25 is enough for a
person to get into very inexpensive storage. That is just
about $1 to store an uncompressed one thousand page book, and
the average book can be stored on one floppy.
We like to think we have planned well enough that the user
would always be able to keep our library at an inexpensive
price. 1.44 floppies are currently the most feasible, for
the wallet, at least, and hard drive prices are falling to
nearly the same price per meg level. Right now our etexts
will fit quite nicely into one partition on the systems in
the two to three thousand dollar range. By the end of the
year 2001, we predict that this will still be the case, in
terms of a much larger library, and much larger computers,
which should also be much faster. The 786 should be out a
year or two before that time. The default computer of ten
years ago had maybe one meg, a few years later it was five
and then ten, until now it is a couple hundred meg ($1798
at most mail order and discount houses). Our default was
the "Best Buy" discount house, which currently sells:
(And we do NOT recommend Best Buy or their brands)
486SX/25, 170M drive, 4MRAM, 8K cache, 2400 modem, two
floppies, SVGA, 24 pin printer, mouse, Windows 3.1 etc
These systems are not the best hardware in the world but a
system can be returned. Everything is already on the hard
drive, and all you have to do is turn it on. Floppies for
both drives are included.
Again, we do not recommend any of these in particular, but
merely use them as a default measurement. The entire text
library of Project Gutenberg should fit nicely into these,
and should be relatively easy to search.
If these trends continue as they have for the past decade,
then you should see something with gigabytes by 2001, in a
similar price range.
We try to keep pace with the technology available to users
in the average ranges. We would like to grow at the rates
they are growing, so our goal is to double our output each
year. We are doing two books a month in 1992. We did one
a month in 1991. We plan on four per month in 1993. This
should be a relatively easy load for people to acquire.
The total output of Project Gutenberg in 1991 was about 9M
or maybe 10 if you kept all our notes. For the first half
of 1992, it was about 10M of files (this includes a Bible,
so this is a little larger). However, the main point is a
computer such as the one described above would use only 10
percent of its space to hold the last 24 books released by
Project Gutenberg. We estimate each 24 books will take 10
meg, so the entire year's output is expected to double any
year (1991=10, 1992=20, 1993=40, 1994=80, 1995=160, etc.)
Of course this will require a drive of over a gigabyte for
1995, if our library is to remain in one corner of it. It
seems highly likely however, that most computers costing 2
or 3 thousand dollars at that time will have one gigabyte,
if not more. Our personal calculations have always been based
on $1500 drives, as that was the cost of our first drives,
which were 5M (ST-506). Today that $1500 will buy a gig.
By the time Project Gutenberg got famous, the standard was
360K disks, so we did books such as Alice in Wonderland or
Peter Pan because they could fit on one disk. Now 1.44 is
the standard disk and ZIP is the standard compression; the
practical filesize is about three million characters, more
than long enough for the average book. However, we prefer
not to require users to use compression, at least until it
becomes a standard. That is why all our etexts are posted,
when we have control, in both ASCII and .zip files.
However, pictures are still so bulky to store on disk that
it will still be a while before we include even the lowres
Tenniel illustrations in Alice and Looking-Glass. However
we ARE very interested in doing them, and are only waiting
for advances in technology to release a test edition. The
market will have to establish SOME standards for graphics,
however, before we can attempt to reach general audiences,
at least on the graphics level.
To illustrate our faith in graphics, and in the future, we
have gone one step further in our pursuit of what we named
"Replicator Technology" TM a few years ago. We would like
to mark the end of this phase of Project Gutenberg (at year's
end, 2001) with a first 3D application of Replicator Technology,
by doing CAT, MRI and XRAY Fluoroscopy scans of something,
perhaps a painting, and printing 3D copies. If anyone can
get us access to a hundred year old masterpiece. . . .
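The storage figures in item 1 can be worked through in a short
Python sketch: the cost of a box of floppies, the yearly output
doubling from about 10 meg in 1991, and the drive size needed if
each year's output is to occupy only a small share of the disk.
The function names are illustrative, and reading "one corner" of
a drive as 10 percent is an assumption drawn from the 10 percent
figure quoted above.

```python
# Figures quoted in the text; the names below are illustrative only.
FLOPPY_PRICE_CENTS = 50       # one 1.44 floppy, bought in quantity
FIRST_YEAR = 1991
FIRST_YEAR_OUTPUT_MEG = 10    # approximate 1991 output


def floppy_cost_dollars(count: int) -> float:
    """Cost in dollars of `count` floppies at the quantity price."""
    return count * FLOPPY_PRICE_CENTS / 100


def yearly_output_meg(year: int) -> int:
    """Projected output for a year, doubling annually from 1991."""
    if year < FIRST_YEAR:
        raise ValueError("the projection starts in 1991")
    return FIRST_YEAR_OUTPUT_MEG * 2 ** (year - FIRST_YEAR)


def drive_needed_meg(library_meg: int, share_percent: int = 10) -> int:
    """Drive size if the files are to fill only `share_percent` of it.

    Assumption: "one corner" of the drive is taken to mean 10 percent.
    """
    return library_meg * 100 // share_percent


if __name__ == "__main__":
    print(f"50 floppies cost ${floppy_cost_dollars(50):.2f}")
    for year in range(1991, 1996):
        output = yearly_output_meg(year)
        print(year, f"{output}M output,", f"{drive_needed_meg(output)}M drive")
```

Under these assumptions the 1995 output of 160 meg calls for a
drive of about 1600 meg, which agrees with the "over a gigabyte"
estimate in the text.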
2. The Project Gutenberg Etexts should be so easily used that no
one should ever have to care about how to use, read, quote
and search them.
This has created a need to present these Project Gutenberg
Etexts in "Plain Vanilla ASCII" as we have come to call it
over the years.
The reason for this is simple. . .it is the only text mode
that is easy on both the eyes and the computer.
However, this encourages others to improve our etexts in a
variety of ways and to distribute them in a variety of the
available media, as follows:
Once an etext is created in Plain Vanilla ASCII, it is the
foundation for as many editions as anyone could hope to do
in the future. Anyone desiring an etext edition matching,
or not matching, a particular paper edition can readily do
the changes they like without having to prepare that whole
book again. They can use the Project Gutenberg Etext as a
foundation, and then build in any direction they like.
Thus any complaints about how we do italics, bold, and the
underscoring, or whether we should use this or that markup
formula are sent back with encouragement to do it any way
any person wants it, and with the basic work already done,
with our compliments.
The same goes for media. We have had a long-standing work
ethic of providing our etexts in any medium people wanted:
Amiga, Apple, Atari. . .to IBM, to Mac, to TRS-80. . . .
However, now that our etexts are carried in so many BBS's,
networks and other locations, it is easier to download the
files in a manner that puts them in your format than for us
to make and mail a disk, so we don't really do that too much.
The major point of all this is that years from now Project
Gutenberg Etexts are still going to be viable, but program
after program, and operating system after operating system
are going to go the way of the dinosaur, as will all those
pieces of hardware running them. Of course, this is valid
for all Plain Vanilla ASCII etexts. . .not just those your
access has allowed you to get from Project Gutenberg. The
point is that a decade from now we probably won't have the
same operating systems, or the same programs and therefore
all the various kinds of etexts that are not Plain Vanilla
ASCII will be obsolete. We need to have etexts in files a
Plain Vanilla search/reader program can deal with; this is
not to say there should never be any markup. . .just that those
forms of markup should be easily convertible into regular,
Plain Vanilla ASCII files so their utility does not expire
when programs to use them are no longer with us. Remember
all the trouble with CONVERT programs to get files changed
from old word processor programs into Plain Vanilla ASCII?
Do you want to go through all that again with every book a
whole world ever puts into etext?
The value of Plain Vanilla ASCII is obvious. . .so is very
much of the value of most of the various markup systems we
have in the world. But until some real standards arrive--
we would be limiting our options a great deal if we do not
keep copies of all etexts in Plain Vanilla ASCII as well.
We don't have anything against markup. Not vice versa.
Alice in Wonderland, the Bible, Shakespeare, the Koran and
many others will be with us as long as civilization. . .an
operating system, a program, a markup system. . .will not.
This includes the many requests we have for compression in
particular formats. There are only two formats we know of
that are suitable for transfer to a wide general audience:
Plain Vanilla ASCII (.txt files) and ZIPped files of them,
(.zip files). Requests for other compression formats must
be ignored as they are appropriate only for small portions
of our target audience. However, (programmers take note:
we will need help) we are planning to put some compression
links on our files so they can be transmitted in any of an
assortment of compression formats on the fly. i.e. we should
be able to generate any kind of file asked for, but we can
keep only one copy of each etext on our servers. . .as the
.Z compression format does in a similar manner today.
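The idea of keeping one copy of each etext and generating other
formats on request might look something like the Python sketch
below. This is only an outline under stated assumptions: the
`serve` function and the `ENCODERS` table are hypothetical names,
and gzip and bzip2 stand in for "other compression formats"
because Python's standard library has no encoder for the .Z
format mentioned above.

```python
import bz2
import gzip
import io
import zipfile


def to_zip(name: str, data: bytes) -> bytes:
    """Wrap one file's bytes in an in-memory .zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    return buf.getvalue()


# Hypothetical format table: only the plain text is stored; everything
# else is produced from it when asked for.
ENCODERS = {
    "txt": lambda name, data: data,
    "zip": to_zip,
    "gz": lambda name, data: gzip.compress(data),
    "bz2": lambda name, data: bz2.compress(data),
}


def serve(name: str, data: bytes, fmt: str) -> bytes:
    """Return the stored plain text encoded in the requested format."""
    try:
        encode = ENCODERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return encode(name, data)


if __name__ == "__main__":
    text = b"When in the Course of human events...\n" * 50
    for fmt in ENCODERS:
        print(fmt, len(serve("declaration.txt", text, fmt)), "bytes")
```

A new format then means one more entry in the table, while the
single Plain Vanilla ASCII copy stays the only file kept on disk.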
3. The selection of Project Gutenberg Etexts
There are three portions of the Project Gutenberg Library,
which can basically be described as:
A. Light Literature; such as Alice in Wonderland, Through
the Looking-Glass, Peter Pan, Aesop's Fables, etc.
B. Heavy Literature; such as the Bible or other religious
documents, Shakespeare, Moby Dick, Paradise Lost, etc.
C. References; such as Roget's Thesaurus, almanacs, and a
set of encyclopedias, dictionaries, etc.
The Light Literature Collection is designed to get persons
to the computer in the first place, whether the person may
be a pre-schooler or a great-grandparent. We love it when
we hear about kids or grandparents taking each other to an
etext of Peter Pan when they come back from watching HOOK
at the movies, or when they read Alice in Wonderland after
seeing it on TV. We have also been told that nearly every
Star Trek movie has quoted current Project Gutenberg etext
releases (from Moby Dick in The Wrath of Khan; a Peter Pan
quote finishing up the most recent, etc.) not to mention a
reference to Through the Looking-Glass in JFK. This was a
primary concern when we chose the books for our libraries.
We want people to be able to look up quotations they heard
in conversation, movies, music, other books, easily with a
library containing all these quotations in an easy to find
etext format. With Plain Vanilla ASCII you will be easily
able to search an entire library, without any program more
sophisticated than a plain search program. In fact, these
Project Gutenberg Etext files are so plain that you can do
a search on them without even using an intermediate search
program (i.e. a program between you and the disk). Norton's
and other direct disk access programs can search every one
of your files without you even naming them, pointing to an
etext directory, or whatever. You can simply search a raw
output from the disk. . .I do this on a half gigabyte disk
partition, containing all our editions.
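To show how little is needed to search a Plain Vanilla ASCII
library, here is a minimal Python sketch of a "plain search
program" that looks for a quotation in every .txt file under a
directory. The function name `search_library` and the directory
layout are assumptions for illustration; this is not a Project
Gutenberg tool, and it does not do the raw disk search described
above.

```python
from pathlib import Path
from typing import Iterator, Tuple


def search_library(root: str, phrase: str) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, line) for each line containing the phrase.

    The match ignores case. Files are read as ASCII, and any stray
    non-ASCII bytes are replaced rather than treated as errors.
    """
    needle = phrase.lower()
    for path in sorted(Path(root).rglob("*.txt")):
        with path.open("r", encoding="ascii", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if needle in line.lower():
                    yield path, lineno, line.rstrip("\n")


if __name__ == "__main__":
    # "etexts" is a hypothetical directory holding the library.
    for path, lineno, line in search_library("etexts", "white whale"):
        print(f"{path}:{lineno}: {line}")
```

Because the sketch matches one line at a time, a quotation that
wraps across two lines of an etext will be missed; searching for
a shorter, distinctive part of the phrase works around that.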