- Linux WWW-HOWTO
- by Peter Dreuw, pdreuw@wing.gun.de
- v0.7.6, 6 October 1996
-
- This document contains information about setting up WWW services under
- Linux (both server and client) and how to maintain them. It is not
- meant to be a detailed manual, but an overview and a good pointer to
- further information.
-
- 1. Introduction
-
- Many people are stepping into Linux because they are looking for a
- really good Internet-capable platform. Others use Linux for the fun of
- installing a free OS on their system, and some of them want to get in
- touch with the Internet, too. Furthermore, there are institutes,
- universities and other, mostly not-for-profit, organisations which
- want or need to set up Internet sites at small expense. This is where
- the WWW-HOWTO comes in. This document tries to explain how to set up
- clients and servers for what is, in my opinion, the largest online
- part of the net: the World Wide Web.
-
- 1.1. Copyright
-
- This document is Copyright (c) 1996 by Peter Dreuw. Please copy and
- distribute it widely, but do not modify the text or omit my name.
-
- If you sell this HOWTO on a CD, in a book or in another medium, I
- would really like to have a copy for reference.
-
- Trademarks are owned by their respective owners.
-
- 1.2. Disclaimer
-
- This document is meant as an introduction to WWW techniques used or
- usable on Linux. I am neither a WWW nor a security expert! I AM NOT
- RESPONSIBLE FOR ANY DAMAGES INCURRED DUE TO ACTIONS TAKEN BASED ON THE
- INFORMATION INCLUDED IN THIS DOCUMENT.
-
- 1.3. Feedback
-
- Any feedback is really welcome. Just mail to pdreuw@wing.gun.de.
-
- 1.4. New versions of this Document
-
- New versions of this document can be retrieved via anonymous FTP from
- sunsite.unc.edu under /pub/Linux/docs/HOWTO and almost any friendly
- Linux ftp mirror site.
-
- Furthermore, you can download it from
- <http://ourworld.compuserve.com/homepages/dreuw/lxwwwh2.tgz> as a
- gzipped tar archive containing SGML, text, LaTeX and PostScript
- versions. The HTML version is directly available at
- <http://ourworld.compuserve.com/homepages/dreuw/lxwwwh2.htm>.
-
- 2. Setting up WWW client software
-
- This chapter is dedicated to web users. It contains some hacks and
- tricks for setting up current versions of common web browsers. Please
- feel free to contact me if your favorite web browser is not mentioned
- here. (As this is a really early version of the WWW-HOWTO, most of
- them are likely not to be listed...)
-
- Personally, I prefer the Emacs W3 browser and Lynx, as they are faster
- and there is no need to retrieve all the graphics through my slow
- dial-up line ;)
-
- 2.1. Overview
-
- Lynx is the smallest Web browser I know and use, but it has many
- special features, so don't skip this chapter.
-
- Emacs: well, there is nothing to say about the Emacs W3 browser; it's
- just Emacs, like the Emacs news reader, the Emacs mail reader and so on.
-
- Netscape Navigator is the only browser mentioned here which is
- capable of these new funny things like JavaScript and the nice
- <APPLET> tag needed to run Java. Please report if there is any other
- web browser which can do one or the other. I'd really like to know.
-
- There are rumors that Microsoft is going to port Internet Explorer to
- various Unix platforms, maybe including Linux. If you DO know
- something more reliable, please drop me a mail.
-
- 2.2. Lynx
-
- The smallest (?, hm, an executable of around 650 KB) and maybe the
- fastest Web browser available. It eats up neither much bandwidth nor
- many system resources, as it only deals with text displays like any
- console, terminal or xterm. You need neither the X Window System nor
- additional megabytes of system memory to run this little browser.
-
- Furthermore, the source code is available.
-
- 2.2.1. Where to get
-
- The latest version is 2.5 and can be retrieved from
- <http://www.wfbr.edu/dir/lynx> or from almost any friendly Linux FTP
- server, such as ftp://sunsite.unc.edu under
- /pub/Linux/system/Network/info-systems/www/, or one of its mirror
- sites.
-
- Or, take a look at the Lynx enhanced pages
- <http://www.nyu.edu/pages/wsn/subir/lynx.html> for information on
- using Lynx.
-
- 2.2.2. How to install
-
- Just retrieve the archive, unpack it, read the README and follow the
- steps described in the INSTALLATION file.
-
- If you don't want to compile the sources yourself, you may retrieve a
- binary distribution for Linux on Intel-based systems from sunsite.
-
- Lynx compiles and runs on my system without any problems on both Linux
- 1.2.13 and 2.0.x.
-
- 2.2.3. Special features
-
- Well, there are several. For a complete description, just read the
- manuals and doc files that come with Lynx.
-
- To get a nice glimpse, just type in
-
- lynx --help
-
- and be impressed.
-
- In my humble opinion, the most special feature of Lynx compared to
- all other web browsers is its capability for batch mode retrieval. One
- can write a shell script which retrieves a document, file or anything
- like that via http, ftp, gopher, WAIS, NNTP or file:// URLs and saves
- it to disk. Furthermore, one can fill in HTML forms in batch mode by
- simply redirecting the standard input and using the -post_data
- option.
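-
- A minimal sketch of such a batch script follows. It assumes a Lynx
- that supports the -dump, -source and -post_data options (check lynx
- --help on your version); the function names and example URLs are
- hypothetical. With -post_data, Lynx reads URL-encoded form data from
- standard input up to a line starting with "---".
-
```sh
#!/bin/sh
# Hypothetical helpers around Lynx batch mode.

# Save the formatted text of a page (-dump) to a file.
fetch_text() {      # $1 = URL, $2 = output file
    lynx -dump "$1" > "$2"
}

# Save the raw HTML source of a page (-source) to a file.
fetch_source() {    # $1 = URL, $2 = output file
    lynx -source "$1" > "$2"
}

# Submit a form without user interaction: -post_data reads the encoded
# name=value data from stdin, terminated by a line starting with "---".
post_form() {       # $1 = form action URL, $2 = encoded data, $3 = output file
    printf '%s\n---\n' "$2" | lynx -dump -post_data "$1" > "$3"
}
```
-
- For example, post_form http://www.example.com/cgi-bin/search
- 'query=linux' result.txt would submit one form and save the resulting
- page as plain text, ready to be processed by other scripts.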
-
- 2.3. Emacs-W3
-
- There is one sad thing about the Emacs W3 browser ;) If you have GNU
- Emacs or XEmacs running, you probably have the W3 browser running,
- too. Not much work for this HOWTO. If you feel that there should be
- more information about this, please let me know.
-
- The Emacs W3 mode is a nearly fully featured web browser written in
- Emacs Lisp. It mostly deals with text, but it can display graphics,
- too, at least if you run Emacs under the X Window System.
-
- The most recent GNU Emacs package is available from
- <ftp://prep.ai.mit.edu>, the most recent XEmacs can be retrieved from
- <ftp://ftp.xemacs.org>.
-
- 2.4. Netscape Navigator Gold 3.0
-
- Yeah, you made it: the Queen of WWW browsers, almost what Emacs is in
- the world of text editors. Netscape Navigator can do nearly
- everything (except making coffee... but maybe Java will...). On the
- other hand, it is the most memory-hungry and resource-eating piece of
- web browser, news reader, mail reader (POP3) and mail & news editor
- I've ever seen.
-
- My latest version of the Netscape Navigator Gold (export version) is
- from 28-Aug-1996 and (c) 1995, 1996 Netscape Communications Corp.
-
- (As I live in Europe, I can only get the export version...)
-
- 2.4.1. Where to get
-
- The first place to get the Netscape Navigator for Linux as a binary
- distribution is <ftp://ftp.netscape.com>. The second, as these
- servers are heavily loaded, is any friendly Netscape mirror site. You
- might as well ask archie about this. Maybe you'll be lucky and find it
- on a CD-ROM; this will save some bandwidth, as the archive is quite
- large (2.5 MB).
-
- 2.4.2. Unpacking & Installing
-
- Unpack the archive and read the README file! There is really nothing
- strange about this, you know.
-
- 2.4.3. Java applets with the navigator
-
- There are some reports of problems running Java applets with Netscape
- Navigator Gold 3.0 even if Java is activated in the options dialog.
- The archive known to me contained a file java_30 which must be renamed
- to java_30.zip. After this, any Java applet should work fine within
- the Netscape environment.
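-
- If you prefer to do the renaming from a script, here is a minimal
- sketch. It assumes you pass it the directory where you unpacked the
- Netscape archive (no particular install path is implied), and
- fix_java_archive is a hypothetical name. It leaves things alone if
- java_30.zip already exists.
-
```sh
#!/bin/sh
# Rename Netscape's java_30 class archive to java_30.zip.
# $1 = directory where the Netscape archive was unpacked (default: current dir).
fix_java_archive() {
    dir=${1:-.}
    if [ -f "$dir/java_30" ] && [ ! -e "$dir/java_30.zip" ]; then
        mv "$dir/java_30" "$dir/java_30.zip" && echo "renamed $dir/java_30 to java_30.zip"
    else
        echo "nothing to do in $dir"
    fi
}
```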
-
- If you continue to have problems using Java applets, e.g. Netscape
- Navigator hangs or just terminates after downloading a Java applet,
- take a look at your libc version. Just do a
-
- ldconfig -v | less
-
- (maybe you have to be root to do so...) and look out for an entry
-
- libc.so.5 => libc.so.5.xx.yy
-
- where your libc version is 5.xx.yy. If your libc isn't 5.2.18, this
- may be the problem. There are many reports that Linux 1.2.13 systems
- should be upgraded to libc 5.2.18 if they need to run Netscape
- Navigator at all. Additionally, it may be a good idea to downgrade
- your libc from 5.3.xx to 5.2.18 if you run Netscape Navigator on a
- Linux 2.0.x kernel. (In fact, the libc 5.3.xx series is meant for beta
- testing, so you should know what you're doing.) Some of the 5.3.xx
- releases break Netscape Navigator and the Java classes code.
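-
- The check can be scripted, too. The sketch below rests on a few
- assumptions: it expects the "libc.so.5 => libc.so.5.xx.yy" line
- format shown above, it treats 5.2.18 as the recommended version, and
- its function names are hypothetical. Like ldconfig -v itself, it may
- have to be run as root.
-
```sh
#!/bin/sh
# Extract the libc.so.5 version from `ldconfig -v` style output on stdin.
libc5_version() {
    sed -n 's/.*libc\.so\.5 => libc\.so\.\(5\.[0-9.]*\).*/\1/p' | head -n 1
}

# Compare the installed libc.so.5 with 5.2.18, the version reported to
# work best with Netscape Navigator 3.0.
check_libc_for_netscape() {
    ver=$(ldconfig -v 2>/dev/null | libc5_version)
    if [ -z "$ver" ]; then
        echo "no libc.so.5 found"
    elif [ "$ver" = "5.2.18" ]; then
        echo "libc $ver: OK for Netscape"
    else
        echo "libc $ver: may cause trouble, 5.2.18 is recommended"
    fi
}
```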
-
- For more information on Java on Linux or Java programming, please read
- the JAVA-HOWTO or visit <http://www.sun.com>.
-
- 3. Setting up WWW server systems
-
- This section contains information on different HTTP server software
- packages and additional server-side tools like scripting languages
- for CGI programs etc.
-
- For a technical description of the HTTP mechanism, take a look at
- the RFC documents mentioned in the chapter "For further reading" of
- this HOWTO.
-
- 3.1. cern httpd
-
- As the original CERN httpd is reported to have some ugly bugs and to
- be quite slow and resource hungry, it is not described in this HOWTO
- for now. If you volunteer to contribute some facts or chapters, please
- send them to me and I'll add them to this document.
-
- 3.2. apache
-
- - to be written - sorry - features, overview, advantages
-
- 3.2.1. Where to get
-
- 3.2.2. Installing
-
- 3.2.3. Configuring
-
- 3.2.4. Special Features
-
- Apache httpd has some special features in the current version.
-
- 3.2.4.1. Host multicasting
-
- - to be written - sorry - how to set up ...
-
- 3.2.4.2. Module system
-
- - to be written - sorry - how to include other modules, where to get
- information about module programming ...
-
- 3.3. CGI scripts systems
-
- - to be written - sorry - CGI (common gateway interface)
-
- 3.3.1. How does CGI work in principle ?
-
- - to be written - sorry - calling structure, http structure, program
- parameter format (slightly touched), things to keep in mind
-
- 3.3.2. Perl
-
- - to be written - sorry - something easy in perl (sample script)
-
- 3.3.3. PHP/FI
-
- - to be written - sorry - something easy in PHP/FI (sample script)
-
- 3.3.4. W3-mSQL
-
- - to be written - sorry - something even more easy (sample script)
- hint about setting up !!!
-
- 3.3.5. some useful scripts
-
- - to be written - sorry - FaxInbound to a nice table, including a
- PHP/FI script and a shell script
-
- 4. Maintaining a WWW site or some Web Pages
-
- If you have to maintain a web site, or at least a single web page,
- you have to think about what you offer to the network and spend some
- thought on how to approach the readers/users of your web pages.
-
- 4.1. The mainstream: HTML technical
-
- Well, I'm not gonna tell you how HTML is encoded and how you have to
- design your pages. I'll just give you some pointers to where you can
- find more advanced information.
-
- You should take a look at <http://www.w3.org/> for the latest HTML
- language specification.
-
- Take a look at the list at the end of this article; you'll find more
- hints on where to read on.
-
- 4.2. Some thoughts about bandwidth
-
- Many users connect to the Internet via slow modem lines. Speeds from
- 14,400 bps to 28,800 bps are state of the art for "private sites". In
- Europe, ISDN is spreading, but 64,000 bps still isn't fast compared
- to, let's keep it simple, a 10,000,000 bps Ethernet. And 10 Mbps
- Ethernet isn't really a high-speed LAN connection nowadays.
-
- As many users don't have such fast access to the net, you should keep
- the ratio between information and bytes in mind and optimize it
- towards 1:1 if you can. You may use graphics in your web pages,
- following the multimedia trend, but always remember the goals of your
- page and of the graphic you're going to put in. If most of your users
- are connected via a slow modem line and the graphic serves only
- aesthetic purposes or as an eye-catcher, you'd better ban it from your
- pages or, at least, re-render it to the smallest possible file size
- with the best compression. Your users will like it.
-
- Always remember: nobody really likes an eye-catcher that shows up 3-5
- minutes after the text message.
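-
- To put numbers on this: transfer time is simply the size in bits
- divided by the line speed. The sketch below ignores protocol overhead
- and modem compression, so real times will be somewhat longer; the
- 300 KB graphic is just an example size and transfer_seconds is a
- hypothetical name.
-
```sh
#!/bin/sh
# Seconds needed to move $1 bytes over a line of $2 bits per second,
# rounded up. Protocol overhead and modem compression are ignored.
transfer_seconds() {
    echo $(( ($1 * 8 + $2 - 1) / $2 ))
}

# A 300 KB (307,200 byte) graphic at the line speeds mentioned above.
for bps in 14400 28800 64000 10000000; do
    printf '%8s bps: %3s s\n' "$bps" "$(transfer_seconds 307200 "$bps")"
done
```
-
- Run as is, it reports 171, 86, 39 and 1 seconds: nearly three minutes
- on a 14,400 bps modem for a graphic that crosses a 10 Mbps Ethernet in
- a fraction of a second (rounded up to 1 here).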
-
- 4.3. Some thoughts about server load
-
- On a web server, there is normally at least one server task running.
- When this task reads a request from an HTTP client, it duplicates
- itself (on Unix systems like Linux this is called forking) and the new
- copy serves the request, while the original keeps listening for new
- requests. After finishing the request, the copy terminates. (In fact,
- some servers, like Apache, keep a default of five idle server copies
- waiting for requests alongside the master process, for speed reasons.)
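-
- The fork-per-request model can be imitated in a few lines of shell.
- This is only a toy under obvious assumptions: instead of a network
- socket, the "master" reads one request per line from standard input,
- and each ( ... ) & subshell is a forked child that serves one request
- and exits. serve_request and forking_server are hypothetical names.
-
```sh
#!/bin/sh
# Stand-in for the real work a server child would do.
serve_request() {
    echo "served $1"
}

# Toy forking server: the parent "listens" by reading requests from
# stdin; for each one it forks a child (a background subshell) that
# serves it and exits, while the parent goes straight back to reading.
forking_server() {
    while read -r request; do
        ( serve_request "$request" ) &   # fork: the child handles this request
    done
    wait                                 # parent reaps all children before exiting
}
```
-
- Because the children run in parallel, their output may appear in any
- order. Keeping some children forked in advance, as described above for
- Apache, saves the cost of a fork on every request.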
-
- Some web browsers, like the Netscape Navigator series, issue many
- parallel requests to the same server, which increases the server load
- caused by a single user. These browsers retrieve the HTML page, parse
- it while it is still arriving and issue new requests for other
- information like embedded graphics, applet files, sound files or any
- other additional MIME-encoded data. In contrast, 'simple' browsers
- request and retrieve one file after another, which keeps the server
- load per user as low as possible.
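-
- You can get a rough idea of how many requests a page causes by
- counting its embedded objects. The sketch below is a crude,
- hypothetical helper: it only looks for IMG, APPLET and EMBED tags whose
- name is followed by whitespace, and it misses background images,
- frames and anything else a real HTML parser would catch.
-
```sh
#!/bin/sh
# Rough count of the HTTP requests a graphical browser makes for one
# page: the page itself plus one per <IMG>, <APPLET> or <EMBED> tag.
page_requests() {
    # Put every tag at the start of its own line, then count the matches
    # (grep -c prints 0 but exits 1 when nothing matches, hence || true).
    n=$(tr '<' '\n' < "$1" | grep -ciE '^(img|applet|embed)[[:space:]]' || true)
    echo $(( n + 1 ))
}
```
-
- A page with two inline images and one applet yields 4: a
- multi-request browser may fire the last three of these at your server
- in parallel.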
-
- Many users prefer browsers that use the multi-request technique, like
- Netscape Navigator, because they present a more complete overview of
- the requested page sooner than a single-request browser does.
-
- In my opinion, this is because many page designers insist on
- embedding the information in graphics, shutting out text-only
- browsers.
-
- So we, as server maintainers, have the problem that most users cast
- multiple requests on our server while retrieving a single page. We
- can limit this by configuring the server software not to serve more
- than "x" requests from the same requesting system at the same time.
- But how do we get this "x"? It's not easy to calculate, and a lot of
- personal experience with your site is necessary to determine it.
-
- But I'll give you some hints. We have to take into account our
- connection bandwidth, our server's memory size, some feeling for our
- server's CPU/disk performance and... well, that's enough for a first
- glimpse. Look at the memory usage of a single server task. Then think
- about how many of them could be kept in memory at all. Think about
- what percentage of your web pages could remain in your server's disk
- cache. Balance the number of web server tasks against the disk cache
- size and you're really close to your personal "x". Furthermore, you
- can take the other jobs of the server into account. E.g. if your
- system also serves FTP, you might limit the maximum number of
- connections to keep some minimum room for the FTP server task. If
- your web server also provides database services, you'd better reserve
- some CPU cycles and shrink your "x" as well. Play around with these
- values and test them. And (!) read the following chapter about CGI
- scripting, which also costs server performance and, depending on the
- CGI jobs, memory.
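-
- The memory part of this estimate is simple arithmetic. The sketch
- below is a rule of thumb under stated assumptions, not a formula from
- any server documentation: every input is your own measurement or guess
- (e.g. the resident size of one server task as shown by ps), all sizes
- are in KB, and max_tasks is a hypothetical name.
-
```sh
#!/bin/sh
# Rough upper bound on the number of web server tasks that fit in RAM.
# All sizes in KB; every input is your own measurement or estimate.
#   $1 total RAM
#   $2 RAM reserved for the kernel and other jobs (ftp, database, ...)
#   $3 RAM you want to keep as disk cache for your pages
#   $4 resident size of one server task (see ps)
max_tasks() {
    free=$(( $1 - $2 - $3 ))
    if [ "$free" -le 0 ]; then
        echo 0
    else
        echo $(( free / $4 ))
    fi
}

# Example: 32 MB RAM, 8 MB for other jobs, 12 MB disk cache, 500 KB per task.
max_tasks 32768 8192 12288 500
```
-
- With these example numbers it prints 24, so the per-client "x" should
- stay well below that. With Apache, the overall number of simultaneous
- server tasks is capped with the MaxClients directive; CPU time and
- bandwidth still have to be checked by testing.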
-
- 4.4. CGI vs. Applet / Client side script
-
- - to be written - sorry - overview and advantages/disadvantages and
- hints on when to use which.
-
- 4.5. Style ideas
-
- Uh, a really difficult topic to cover in a few sentences. I don't want
- to mess with your genius design ideas, nor am I going to force my
- personal design strategies on you. I'd just like to add one or two
- statements to the above ideas on server load and bandwidth.
-
- Numerous studies of human behavior with user interfaces and on-screen
- presentation have brought out interesting results. There are some
- simple facts one should keep in mind when designing WWW pages.
-
- · Keep text in short blocks. This HOWTO is ugly to read on screen,
- but nice to read on paper. (Try it yourself!) Human beings often have
- difficulties reading lengthy text on screen. They lose their place in
- the sentence; their concentration suffers.
-
- · Don't mix graphics and text blocks. This is a good-looking but
- hard-to-read feature. You can spread headlines and eye-catchers but,
- please, don't mix block text with graphics. Behaviorists found out
- that humans are much more attracted by graphics on screen than by
- text. People find it easier to perceive a graphic on screen than on
- paper, whereas text is easier to "see and decode" on paper than on
- screen.
-
- Did you know this? If you'd like more information on that, search for
- GUI style guides and ergonomics research results published by many
- universities and software companies (including MS).
-
- 4.6. HTML editors under Linux
-
- Hm, there are some. In fact, there are reported to be many. But as I
- had already found my favorites, I didn't test them all. I am really
- curious to read the reports you're gonna mail.
-
- 4.6.1. vi, vim
-
- vi and vim are perfectly usable for writing HTML code... (don't flame
- me on that) because HTML code only uses ASCII text characters. I don't
- want to provide fuel for another editor war. Those who know vi/vim and
- use it daily can use it for HTML code, too. You can make vi/vim help
- you develop HTML code by writing some macros. But as this is no
- VI-HOWTO, I'll leave it at that. Just take it that it is possible to
- use vi/vim for HTML editing (at least for some short changes). If you
- already know how to program vi/vim, you'll certainly know how to adapt
- it to HTML as well. If you don't, well, never mind.
-
- 4.6.2. emacs & XEmacs
-
- - to be written - sorry -
-
- 4.6.3. asWedit
-
- - to be written - sorry -
-
- 4.6.4. other pointers
-
- Ah, there was a reference to a package named phoenix, based on
- tkWWW, but I was not able to get either of them running on my system.
- I think it was a problem with my Tcl/Tk versions, but you never know.
- I didn't spend much time on them, so maybe both will run on your
- system. Just go and ask archie. Maybe you can drop me a mail if you
- are successful.
-
- If you miss your favorite HTML editor here, just write me a mail.
- Maybe I'll add some pointers to web pages about HTML editors for
- Linux, too. Just send me some nice URLs.
-
- 4.7. Graphics
-
- Thoughts, ideas, hints? Well, you may read the comp.graphics
- newsgroup. And you can visit <http://www.w3.org/pub/WWW/Graphics/>.
-
- 4.7.1. Format gif
-
- GIF (Graphics Interchange Format) was introduced in 1987 by
- CompuServe, Inc. and revised in 1989. It uses the LZW compression
- algorithm, which is covered by a U.S. patent. So there might be some
- legal problems using this graphics format on the Internet, despite
- the fact that nearly everybody does.
-
- GIF is a good format for small pictures with simply structured
- graphics like computer graphics or banners.
-
- GIF has some advantages, as it is one of the (if not the) most
- widespread graphic formats in online systems:
-
- · offers good compression
-
- · compresses without information loss
-
- · has an interlace capability, i.e. pictures can be viewed in full
- size (with lower resolution) before they're retrieved completely
-
- · can hold more than one picture within one file
-
- · can hold a small animation in one file
-
- · nearly every graphical web browser supports GIF
-
- · can hold a transparent color
-
- · fast decompression
-
- The disadvantages are:
-
- · only 256 colors per picture
-
- · license and patent problems (?)
-
- · file size not ideal
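-
- The GIF interlace capability works by storing the rows in four passes:
- every 8th row starting with row 0, every 8th row starting with row 4,
- every 4th row starting with row 2, and finally every 2nd row starting
- with row 1. The small sketch below (gif_interlace_order is a
- hypothetical name) prints that storage order for an image of a given
- height.
-
```sh
#!/bin/sh
# Print the order in which an interlaced GIF of height $1 stores its rows.
gif_interlace_order() {
    h=$1
    out=
    for pass in "0 8" "4 8" "2 4" "1 2"; do
        set -- $pass          # $1 = first row of this pass, $2 = row step
        row=$1
        while [ "$row" -lt "$h" ]; do
            out="$out${out:+ }$row"
            row=$(( row + $2 ))
        done
    done
    echo "$out"
}
```
-
- For an 8-row image it prints 0 4 2 6 1 3 5 7: after the first pass a
- browser can already show a coarse full-size version of the picture.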
-
- 4.7.2. Format jpeg
-
- The Joint Photographic Experts Group (JPEG) designed the jpeg/jpg/JFIF
- graphic format. This format is based on a discrete cosine transform
- (DCT) and Huffman encoding. JPEG works with a significant information
- loss, which can make your pictures somewhat less colorful or less
- sharp. Typical compression factors range from 1:5 to 1:50. (Above
- 1:10, anybody is able to see the artifacts caused by the
- compression/decompression cycle.)
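-
- To illustrate the DCT step, here is a sketch of the one-dimensional,
- 8-point DCT (in its orthonormal DCT-II form) that JPEG applies to the
- rows and columns of each 8x8 pixel block. It is an awk program wrapped
- in a shell function, dct8 is a hypothetical name, and quantization and
- Huffman coding are left out entirely.
-
```sh
#!/bin/sh
# 8-point DCT-II (orthonormal): reads 8 sample values per input line and
# prints the 8 frequency coefficients, lowest frequency first.
dct8() {
    awk '{
        pi = atan2(0, -1)
        line = ""
        for (k = 0; k < 8; k++) {
            s = 0
            for (n = 0; n < 8; n++)
                s += $(n + 1) * cos(pi * (2 * n + 1) * k / 16)
            c = (k == 0) ? sqrt(1 / 8) : sqrt(2 / 8)
            v = c * s
            if (v > -0.05 && v < 0.05) v = 0   # avoid printing "-0.0"
            line = line (k ? " " : "") sprintf("%.1f", v)
        }
        print line
    }'
}
```
-
- For a flat row such as 10 10 10 10 10 10 10 10, all the energy ends up
- in the first (DC) coefficient, 28.3, and the other seven are 0.0.
- Smooth image areas mostly produce small high-frequency coefficients;
- JPEG quantizes these coarsely or drops them, which is where both the
- strong compression and the information loss come from.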
-
- JPEG is a good format for photographs, large graphics and really
- complex pictures.
-
- The advantages are:
-
- · strong compression, small files and therefore fast downloads...
-
- · every graphical browser knows about JPEG
-
- The disadvantages are:
-
- · slow compression/decompression
-
- · possible information loss
-
- 4.7.3. Format png
-
- Portable Network Graphics (PNG): the new format on the net. PNG is
- favored by the W3 Consortium. For more specific information, visit
- <http://www.w3.org/pub/WWW/TR/WD-png.html> and
- <http://www.w3.org/pub/WWW/Graphics/PNG/Overview.html>. There you'll
- find a technical specification, some information for programmers etc.
- PNG is an ideal replacement for GIF. The PNG homepage is at
- <http://quest.jpl.nasa.gov/PNG/>. For users, PNG will have some
- advantages and some disadvantages. Here they are:
-
- For the advantages:
-
- · can replace the license-encumbered GIF; PNG has no license problems
-
- · 256-color palette as well as grayscale and true color capability,
- including a transparency element
-
- · complex interlace mode where not only individual lines are
- sequenced, but a two-dimensional scheme transmits the picture, so the
- user can recognize the picture content earlier
-
- · a fast decompression algorithm is possible
-
- · publicly available description, license free
-
- · publicly available sample code, license free
-
- · extensible design
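-
- The two-dimensional interlace scheme of PNG is called Adam7: each 8x8
- tile of the picture is transmitted in seven passes. The sketch below
- prints, for every pixel of one tile, the pass in which it arrives,
- using the pass start offsets and step sizes from the PNG
- specification; adam7_tile is a hypothetical name.
-
```sh
#!/bin/sh
# Print an 8x8 map showing in which of PNG's seven Adam7 passes each
# pixel of a tile is transmitted. Each pass is given as
# "first_row first_column row_step column_step".
adam7_tile() {
    r=0
    while [ "$r" -lt 8 ]; do
        line=
        c=0
        while [ "$c" -lt 8 ]; do
            p=1
            for spec in "0 0 8 8" "0 4 8 8" "4 0 8 4" "0 2 4 4" \
                        "2 0 4 2" "0 1 2 2" "1 0 2 1"; do
                set -- $spec
                if [ "$r" -ge "$1" ] && [ $(( (r - $1) % $3 )) -eq 0 ] &&
                   [ "$c" -ge "$2" ] && [ $(( (c - $2) % $4 )) -eq 0 ]; then
                    break           # pixel (r, c) belongs to pass p
                fi
                p=$(( p + 1 ))
            done
            line="$line${line:+ }$p"
            c=$(( c + 1 ))
        done
        echo "$line"
        r=$(( r + 1 ))
    done
}
```
-
- The first line of output is 1 6 4 6 2 6 4 6. Pass 1 delivers only one
- pixel out of every 64, but spread evenly in both directions, so a
- browser can show a blurry preview of the whole picture very early,
- whereas the GIF scheme only spreads rows.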
-
- For the disadvantages:
-
- · not widespread yet (Netscape does not support it by now; some
- plug-ins do)
-
- · does not compress pictures as strongly as JPEG
-
- · no final specification yet; still in working draft state
-
- PNG is currently supported on Linux by the following programs:
- ImageMagick (version >= 3.7), GhostScript 4.0, GIMP, POV-Ray 3.0 and
- the netpbm package. For xv 3.10a there is an unofficial patch.
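-
- All three formats can be told apart by the first few bytes of a file:
- GIF files start with GIF87a or GIF89a (the 1987 and 1989 versions),
- JPEG files with the bytes FF D8 FF, and PNG files with the 8-byte
- signature 89 50 4E 47 0D 0A 1A 0A. The sketch below (image_type is a
- hypothetical name) uses only od, tr and case; the file(1) command does
- the same job more thoroughly.
-
```sh
#!/bin/sh
# Guess the graphics format of file $1 from its first 8 bytes.
image_type() {
    sig=$(od -A n -t x1 -N 8 "$1" | tr -d ' \n')
    case $sig in
        474946383761*)    echo "GIF87a" ;;
        474946383961*)    echo "GIF89a" ;;
        ffd8ff*)          echo "JPEG" ;;
        89504e470d0a1a0a) echo "PNG" ;;
        *)                echo "unknown" ;;
    esac
}
```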
-
- 4.7.4. Converters
-
- - to be written - sorry - netpbm, xv, ghostscript, gimp, ImageMagick,
- CorelDraw on Wine :-)))
-
- 4.8. Specials
-
- There are now many specials beyond plain HTML and images: applets
- written in Java, pages with JavaScript and much more.
-
- 4.8.1. Java
-
- There is nothing to add about Java in general; just read the Java
- section in the Netscape Navigator chapter of this HOWTO and the
- overview on Java applets vs. CGI scripts in this HOWTO. Then you can
- also read the really good and compact Linux JAVA-HOWTO. For
- programming Java, please refer to good books on the subject.
-
- 4.8.2. ActiveX
-
- ActiveX is, at the time of writing, still a Microsoft child. Microsoft
- claimed that they would release it to the public domain, or at least
- to an ActiveX consortium.
-
- ActiveX has nothing to do with the X Window System or with XFree86.
-
- It is derived from the Microsoft and IBM OLE system. Once the specs
- are released, there should be a Unix port, but we have to wait until
- then. Nothing for Linux yet.
-
- 5. FAQ
-
- There aren't any frequently asked questions - yet...
-
- 6. For further reading
-
- · RFC 1866, T. Berners-Lee and D. Connolly, "Hypertext Markup
- Language - 2.0", 11/03/1995
-
- · RFC 1867, E. Nebel and L. Masinter, "Form-based File Upload in
- HTML", 11/07/1995
-
- · RFC 1942, D. Raggett, "HTML Tables", 05/15/1996
-
- · RFC 1945, T. Berners-Lee, R. Fielding and H. Nielsen, "Hypertext
- Transfer Protocol -- HTTP/1.0", 05/17/1996
-
- · RFC 1630, T. Berners-Lee, "Universal Resource Identifiers in WWW:
- A Unifying Syntax for the Expression of Names and Addresses of
- Objects on the Network as used in the World-Wide Web", 06/09/1994
-
- · RFC 1959, T. Howes and M. Smith, "An LDAP URL Format", 06/19/1996
-
- 7. Thanks
-
- Special thanks to Greg Hankins gregh@cc.gatech.edu for encouraging me
- to write this work, and for the fun I had doing it.
-
- I'd also like to thank Chris Hendricks, Fido: 2:2433/443@fidonet.org,
- for his commitment to Linux and for our personal race, in which I try
- to stay at least a nose ahead :-)
-
-