Volume Number: 21 (2005)
Issue Number: 1
Column Tag: Programming
The Web Server in OS X
by Edward Marczak
We've all had occasion to serve up some web pages, right? Well, then you know how great Apache is and all of the flexibility and configuration options available to you, right? What? You've always used the default install? While that may be plenty powerful, we can do plenty more. Apache is so vast that we're really only going to scratch the surface here. The great thing is that Apple has seen fit to install Apache for us, so we can skip any talk about retrieving the source, compiling and installing it, preparing TCP/IP, and other topics that come along with most Apache material. As always, get Terminal fired up, and let's go.
History Lesson
I don't like handing out history lessons, especially when people are eager to jump into a topic. However, as a poster-child for the open source movement (even though there's no formal association with GNU or the FSF), Apache really warrants one. Editor's Note: For more on the history of Open Source, look for Dean Shavit's new Column "The Source Hound", beginning next month in MacTech!
The roots of Apache stretch back to the NCSA's http daemon, written by Rob McCool. In 1995, it was the most popular web server on the Internet, despite being an un-maintained project for about a year. To keep NCSA's httpd going, several webmasters who had been using it contacted each other and shared patches that they had coded themselves. Two of these members, Brian Behlendorf and Cliff Skolnick, took this a step further by providing a mailing list and shared file storage space for the core developers. Shortly thereafter, eight people formed the core, and the Apache Group was born.
While the Apache Group still exists with its core developers, the source code continues to be freely available. With OS X, Apple has included and integrated many projects that are open source, and tend to run on many platforms. While this naturally makes certain things easier for Apple, they really have chosen best-of-breed applications: Apache, Postfix, PostgreSQL and others. In turn, Apple has opened up the foundation of OS X, Darwin.
You'll find many reasons why open source developers do what they do, but the fact that Apache is open is very important to the Internet itself. As the most widely used http server, it serves as a reference platform. Thanks to the Apache Group, and others like them, the tools of Internet publishing are available to everyone. The playing field is leveled, and the protocols of the World Wide Web remain 'unowned' by any one entity. This allows big business, governments and individuals to run a web server, understand the means of delivery and speak to the world.
By the way: for historical purposes, you can find the NCSA httpd page at http://hoohoo.ncsa.uiuc.edu
The Power
To paraphrase the infamous zombo-com, "You can do anything with Apache. Anything at all. The only limit is yourself." Welcome to Apache. Thanks to many factors, such as the ability to write custom modules, Apache is incredibly flexible. And with that flexibility comes the power to massage your web server into doing just about anything you see fit.
If you're into compiling Apache yourself, you're probably way ahead of this article, so feel free to skip ahead to, well, the conclusion. However, since Apple has already done this for us, and included some of the more popular modules (including PHP integration), I'm not going to discuss doing so. Also, I'll be concentrating on OS X client, as OS X server has a relatively decent GUI to control Apache.
Up and Running
Before we start, let's make sure everything is in order. Open System Preferences, click on 'Sharing', and make sure 'Personal Web Sharing' is started. If it isn't, check the box. Your panel should look similar to figure 1.
Figure 1 - OS X telling us that personal web sharing is enabled.
If, for some reason, you check the box, the machine thinks for a bit, but then un-checks the box for you, see the troubleshooting section next.
Once you've made sure that's running, you can connect to the web server that's now running on your own machine. Launch your web browser of choice (Firefox, Opera, Safari, the old IE-Mac, or other), and type this in the address bar, without the quotes: 'http://127.0.0.1'. If you've never touched your Apache installation, you should be looking at a screen like the one in figure 2.
Figure 2 - Welcome to Apache
Again, if something goes horribly awry, we'll try to help in the troubleshooting section below.
If, like me, you're a Terminal person, you can also start and stop Apache from the command line using the 'apachectl' command. 'apachectl start' starts the server, 'apachectl stop' stops the server. Something you can't do from the GUI: 'apachectl restart' will stop and start all in one fell swoop. This is great when you make configuration changes and need Apache to start using them.
Be Careful Out There
I need to preface the rest of this article by stressing the need to be careful. We're going to need to work as root to edit any of the Apache configuration files. This means two things: a) you have a good chance of mucking up your Apache installation and b) you can muck up your entire system. Until you're comfortable working as root, and with the changes that we make to Apache, do not work on a system currently in production. Please perform all of these changes on a test system and, well, test them. Sure, the world won't come crashing down if you muck up a web server. But if that server represents someone else's work, or livelihood, someone's world will come crashing down.
All the Files
When you're setting up a web server, there are basically two sets of files that you're concerned with: files that tell the web server how to do its job (config files) and files that you're trying to serve to the public (html, mainly). It's important that these files have appropriate permissions (remember those things?), as random people on the system should not be able to alter the server config or the files being served.
Nicely enough, Apple pre-configures our systems with a user named 'www' and a group named 'www'. The server starts as 'root', but then creates child processes that run as 'www'. The root process does not service requests for files at all. It simply manages all of the children. This is good from a security perspective.
Once running, the 'www' user needs to be able to read the files it is to serve to the world. So, once again, permissions must be right. So, I'll mention file permissions at the end of each section that talks about files.
Configuration Files
It's fine to simply make your server run. But how can we really make it do what we want? We need to alter the configuration file that Apache reads at its startup. Apple has chosen to remain with the de facto 'standard' of keeping the config files in a sub-directory of /etc, called httpd. Get into Terminal (10 points for everyone already in Terminal), and become root. Do this either through the command "su -", which will ask for your root password, or the command "sudo bash", which will ask for your password. Once you have the root prompt (should end in a number sign '#'), you're ready. Change directory to /etc/httpd and list the contents ('ls -l'). You'll see something like this:
$ ls -l
total 600
-rw-r--r--   1 root  wheel  39884 16 Nov 23:29 httpd.conf
-rw-r--r--   1 root  wheel  37306 18 Nov  2003 httpd.conf.applesaved
-rw-r--r--   1 root  wheel  37047  4 Feb  2004 httpd.conf.bak
-rw-r--r--   1 root  wheel  38008  4 Feb  2004 httpd.conf.default
-rw-r--r--   1 root  wheel  33725 15 Dec  2003 httpd.conf.defaultserver
-rw-r--r--   1 root  wheel  37306 18 Nov  2003 httpd.conf.erm
-rw-r--r--   1 root  wheel  12965 15 Dec  2003 magic
-rw-r--r--   1 root  wheel  12965 15 Dec  2003 magic.default
-rw-r--r--   1 root  wheel  15150 15 Dec  2003 mime.types
-rw-r--r--   1 root  wheel  15150 15 Dec  2003 mime.types.default
drwxr-xr-x  10 root  wheel    340 18 Nov  2003 old
There will be some files in this listing that you do not have. Don't worry, you will after this article. The file we're after is 'httpd.conf'. In the early days of Apache, the configuration files were broken into three files: httpd.conf, srm.conf and access.conf - a holdover from its NCSA roots. Apache can still work this way, and I still maintain one or two servers like this (aside from upgrades to Apache itself, they've been running with virtually no changes to the structure since 1997).
Apache reads httpd.conf first, then srm.conf and finally access.conf. After a while, most people would forget which directives were supposed to go in which config file. Use of srm.conf and access.conf is now deprecated, and it is recommended that all directives be put into httpd.conf. So now, not only can you simply ignore the two extra files, that behavior can be completely overridden. Sometimes, though, it may be nice to break up a large configuration file into more manageable chunks (perhaps you really only want to give certain people the ability to change certain parts of the config, but not others...remember permissions?). While it's truly wonderful to have everything that affects your server in one place, it also makes for one big file to trudge through when you're new to it. Apple maintains the current recommendations and simply gives us one httpd.conf file.
Use your favorite editor (vi) and open up httpd.conf. You'll be greeted with a fair number of comments at the top of the file. Hey, look, "Based upon the NCSA server configuration files originally by Rob McCool." History! "This is the main Apache server configuration file." Yup, that's what we're after. "Do NOT simply read the instructions in here without understanding what they do. They're here only as hints or reminders. If you are unsure consult the online docs. You have been warned." Gulp. That doesn't sound too friendly.
Well, in all actuality, the default httpd.conf is extremely friendly. In fact, the default values have been chosen very wisely. Between the core team, and input from real, everyday Apache users, the httpd.conf file contains good, real world defaults. Now, the real world according to a web server is very different if you are "Mike's home page" or if you are amazon.com. But for people downloading the source, one can unpack, build and go in a short amount of time. Apple has basically kept all of the defaults, with some Apple-specific changes that I'll point out further on.
Scroll down a bit in the file and you'll come to "Section 1: Global Configuration". This 'section' (it's really only delimited by comments) and its settings apply to the way the overall server runs. While I can't touch on every single parameter, I will touch on the ones important to our discussion. Anything that doesn't get mentioned should be left untouched. Let's see what these entries do:
ServerType: Can be either 'inetd' or 'standalone'. 'inetd' would apply if you're having the inetd super-server launch Apache for each incoming request, typically behind TCP wrappers (to be addressed in a future column). Short story is this: while many, many applications do run this way, I've never personally seen an Apache installation that does so. Leaving this set at 'standalone' lets Apache handle all of its own requests by itself. Leave this set at 'standalone'.
ServerRoot: Here's one where Apple confounds me. 'ServerRoot' is typically where you put all of your stuff: html files, includes, and more. Apple chose '/usr' for this. Odd. In the httpd.conf that accompanies OS X Server, there's actually a note preceding this choice: "For Mac OS X Server: Changing this is OK." Now, we're safe because this gets over-ridden everywhere else by specifying absolute paths (ones that start with '/'). But '/usr' really is an odd choice, as relative paths are relative to the directory specified here.
A little further down, you'll see that the directives that would normally load srm.conf and access.conf (ResourceConfig and AccessConfig, respectively) have been commented out. Apple wants everything in one big file.
Next up is 'Server-pool Management'. Apache can be pretty intelligent about using resources. It's important that you feed it good information, though, to base its decisions on. It can dynamically adapt to the incoming request load, and then back off when the load lightens up. There are three directives that are important here: MinSpareServers, MaxSpareServers and StartServers. 'StartServers' tells the master Apache process how many child servers to start up immediately. If you have a heavily hit site, you should crank this up a bit. 'MinSpareServers' tells Apache how many spare httpd processes it should keep hanging around for that big burst of traffic. If you're a major site, you'd load this up. On the other hand, if you're setting up a server for a small intranet, you can leave this alone and let Apache dynamically allocate new servers as needed. 'MaxSpareServers' gives Apache the ceiling on how many child processes will be left hanging around, unused.
Anyone who has set up Apache on their own will notice that Apple has made the defaults a little lower than usual. In the httpd.conf we receive, we start 1 server, have 1 minimum spare, and 5 maximum spare. The config file from the Apache Group sets these values to 5, 5 and 10, respectively. I'll venture a guess as to why Apple does this: each server process that runs sucks up resources. These resources come in the form of RAM and open files (which have hard limits in the system). So, how many people out there installed Panther, got to the Sharing Pref Pane, saw 'Personal Web Sharing' and said, "Cool!" They then proceeded to start up Apache, only to never use it again. Apple is trying to help this person not have resources spirited away to some unknown place. I'd like to say that they also did this to keep the config that runs on OS X client nice and small, but OS X Server has the same defaults. A little low in my opinion, but at least these values can be altered through the Server Admin GUI.
MaxClients: Says what it does, does what it says. Basically, this is how many clients can access your server at once. Anyone who's been surfing the 'net for a while probably remembers trying to access a moderately popular web page, only to be greeted with, "Service Unavailable." Not having this parameter set high enough is one of the reasons you see this.
The last 4 parameters discussed are really what make a web server individual - from the configuration side, at least. You need to monitor your system, tweak, monitor some more and tweak again. At the very least, if people are getting shut out of your site, you now know one place to look.
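To make the interplay of those four parameters a little more concrete, here is a deliberately simplified Python model of the decision the master process makes on each pass. It is a sketch, not Apache's actual algorithm (the real server ramps up gradually rather than all at once), and the function name and the MaxClients value of 150 used in the examples are my own illustrative choices.

```python
def spare_server_adjustment(idle, busy, min_spare, max_spare, max_clients):
    """Return how many children to start (positive) or retire (negative).

    Simplified model: top up to MinSpareServers, trim down to
    MaxSpareServers, and never let the total exceed MaxClients.
    """
    total = idle + busy
    if idle < min_spare:
        wanted = min_spare - idle
        room = max(0, max_clients - total)  # MaxClients caps the pool
        return min(wanted, room)
    if idle > max_spare:
        return max_spare - idle  # negative: retire the surplus
    return 0


# Apple's defaults from the article: 1 minimum spare, 5 maximum spare.
print(spare_server_adjustment(idle=0, busy=3, min_spare=1, max_spare=5, max_clients=150))
# The Apache Group's defaults: 5 minimum spare, 10 maximum spare.
print(spare_server_adjustment(idle=2, busy=10, min_spare=5, max_spare=10, max_clients=150))
# A saturated server: no room left, so nobody new gets served.
print(spare_server_adjustment(idle=0, busy=150, min_spare=1, max_spare=5, max_clients=150))
```

The last call is the "Service Unavailable" situation in miniature: every child is busy, the ceiling has been reached, and the only fix is a higher MaxClients or a lighter load.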
The 'Listen' directive: Which IP address and port should the server be listening on? Apple has commented this out, which simply has Apache listen to the default of all IP addresses on the system, and port 80. You can issue several 'Listen' directives, and Apache will add the address or port to its list. A similar directive, also commented out by Apple, is 'BindAddress'. One main difference is that only one BindAddress directive is permitted. Honestly, I never use BindAddress, as you get the same functionality out of 'Listen', 'Port' and '
Next, you should see a grouping of 'LoadModule' and 'AddModule' directives. It would be way too much to go into each one of these individually. Of course, an overview is in order. When Apache is compiled, you have the option of including support for modules that are dynamically loaded at runtime, rather than compiled in (statically). The module responsible for this is mod_so (shared object). 'LoadModule' links an object file into an Apache process at launch. 'AddModule' enables a module's use, as a module may be compiled in, but inactive. Simple rule: you need the module, you load it and you add it. Interestingly, the order that modules are added is important. Modules added later on can override the behavior of ones earlier in the list.
One last comment about the modules: As of Panther, Apple enables PHP by default. Three cheers for Apple! Prior to Panther, you had to uncomment the appropriate lines from the httpd.conf file to make PHP load. Thank you Apple. I want my PHP! They even made the appropriate change to allow an index file to be 'index.php'. Nice.
On to section 2! This is often called the 'Main Server Configuration'. If you plan to serve up a single site, this is all you will need. Important stuff here:
'Port'. Which port number your server listens to. Easy.
'User' and 'Group'. Important ones, for certain! As mentioned earlier, we give Apache a non-root user to run as. Many Unix systems use the user and group 'nobody', but I like 'www' even better. Many services run as 'nobody', and I want permissions as granular as possible. When we set up permissions, the user and/or group 'www' will need to have access to the files we're going to serve up.
'ServerAdmin'. Is this a critical value? Well, sure, the server will run without you setting it. But the default value is 'you@your.address'. The ServerAdmin value sometimes shows up in server-generated error messages. So don't look like you haven't done your homework. This is an easy one. Set it to an appropriate e-mail address that people can send mail to if they're having problems.
'ServerName'. Either change this to your machine's Fully Qualified Domain Name (FQDN), or, if this is a development machine, laptop or otherwise, leave Apple's default of '127.0.0.1'. If you're serving pages out to the Internet, you pretty much must have working DNS pointing to your box, with the appropriate FQDN in the ServerName directive, otherwise, relative links are going to fail. Heed the warning in the comments: "You cannot just invent host names and hope they work." Yes, you could do everything by IP address, but, do you really want to?
'DocumentRoot'. This is where your html files live! Apple sets this to "/Library/WebServer/Documents". Now, sure, this can really be anywhere, but as a long time Apache person, this has always felt a little odd. I actually comment this out altogether. More on that in the 'Virtual Hosts' section below.
Now, you should be up to a line that says "
Options All
AllowOverride All
This lets us do all sorts of nasty things at http://127.0.0.1/experiment, but would still keep http://127.0.0.1/testlab with the restrictive permissions we set on "/".
Immediately following the short Directory directive, we find a larger one that applies to where our web documents live. This block creates restrictions that are a little looser than the previous block.
What has been covered so far is all you really need to get the server running, modify where it puts its files and serve some custom content like a main site (as opposed to one that lives in ~user/Sites - bah!). However, you'll notice that the httpd.conf file continues on for quite a bit! For now, I'm only going to cover three more directives. For some of them, I'll save the detailed explanation for a later section.
Move down through the file a page or two until you see 'AccessFileName'. The default, and de-facto standard, is ".htaccess". This is an important directive that needs further explanation, and I'll get to such an explanation in its own section. Following this, you'll see 'HostNameLookups'. This directive has Apache resolve the incoming request's IP address to a name. This defaults to 'off' and should stay that way. Of course, while only you can determine what is right for your site, turning this on can cause a huge strain on both your web and DNS servers. Then again, if you're, say, a bank, and have the infrastructure, you may really have good reason to have this on. Next up are the logging directives. Toward the end of these, you'll see a line that says "CustomLog "/private/var/log/httpd/access_log" combined". Apache actually keeps two logs, a standard log and an error log. The 'CustomLog' directive tells the standard log where to store itself and in what format. For now, just be aware that it exists, and read further on to understand logging.
To edit the httpd.conf file, you've needed to be root, or some root-equivalent. If you harken back to last month's discussion of permissions, you'll see that Apple has marked the Apache configuration directory, /etc/httpd, as owned by root and wheel, with rights of 755. All of the files inside httpd are marked 644, also owned by root and wheel. This is exactly where you should leave these permissions set. It should be difficult to edit these files. You should have to be really conscious of what's going on in this directory. No changes should be possible without being authorized - especially by a rogue program. Plus, you may have no choice: running Disk Utility's 'Repair Permissions' will reset these permissions as just described.
Additionally, make sure you back up your httpd.conf file! Two big reasons for this. Once you have a working version, back it up before you make any major changes. This way, you can roll back to your copy when things don't work the way you expect (or perhaps, at all). Also, Apple likes to step on httpd.conf when they update Apache through Software Update, either because of a security update or other bug fix. While they have started creating an 'httpd.conf.applesaved' file when they muck with it, I personally would trust my own backup much more. So, save early, save often.
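If the octal shorthand is still a little fuzzy, a few lines of Python can translate it into the familiar 'ls -l' notation. This is only a small sketch using the standard library's stat module; the helper name mode_string is my own invention.

```python
import stat


def mode_string(octal_mode, is_dir=False):
    """Render an octal permission value the way 'ls -l' would show it."""
    kind = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(kind | octal_mode)


# The configuration directory: owner may change it, everyone may look.
print(mode_string(0o755, is_dir=True))   # drwxr-xr-x
# The files inside it: owner may write, everyone else may only read.
print(mode_string(0o644))                # -rw-r--r--
```

Read left to right, the three groups of three are owner, group and others, which is why only root gets a 'w' in either line.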
Files to Serve
If you're a visual person, and want to see some content, now is the time! The files that are in your DocumentRoot are meant to be served to the public. What you place here is up to you. Straight html, PHP files that access databases, javascript, text files, have fun. Of course, there are a few things you should know.
If you haven't touched the default web server directory yet, take a look in there ("/Library/WebServer/Documents"). There's a whole load of files. Instead of the familiar ".html" extension, we see files that have extensions such as ".cz", ".fr" and ".po.iso-pl". You've probably guessed that this allows Apache to serve files based on one's language preference. But how does it know? Apache calls this 'content negotiation', and the mapping from extension to language is handled by the 'AddLanguage' statement in httpd.conf. You don't have to understand every facet of how this takes place to make it work nicely on your server. To find more about this feature than I can present here, see the Apache documentation on the subject at http://httpd.apache.org/docs/content-negotiation.html.
If you'd like, you can back up, move or just delete the contents of this directory (provided that you're working on your own machine and are sure you're not meddling with files someone is relying on). If you have some html docs, great. Otherwise, we'll make a simple one. Launch BBEdit, SubEthaEdit, vi, Pico, emacs - your choice of text editor. Despite the name, do not use Apple's "TextEdit". By default, it does not save files in plain text format. C'mon, Apple! In your editor, simply type, "This is a test." If you're in the GUI, you need to be using an admin-level account to save it in the proper directory. If you're in Terminal, I'll assume you're still running as root. Save your file as "/Library/WebServer/Documents/test.html". In your web browser, type the URL http://127.0.0.1/test.html. You should see the equivalent of figure 3.
Figure 3 - the working server and web page
Not the prettiest page we've ever seen, but it shows that everything is working as expected. From there, navigate to "/Library/WebServer/Documents" and create a directory called "test". Shift back to your editor, and alter your html file to say, "This is a subdirectory test" and save it as "/Library/WebServer/Documents/test/index.html". Now, in your web browser, go to the URL http://127.0.0.1/test. You'll see the equivalent of figure 4.
Figure 4 - working with subdirectories
What exactly happened here? You've probably noticed that at most websites you visit, you haven't had to explicitly mention the file you're looking for. When you visit http://www.apple.com, you're not asking for any file in particular, are you? You're just saying, "Hey, web server! Gimme what you've got!" As mentioned earlier, Apple has set both "index.html" and "index.php" to be 'index files': if present in a directory, that file will be served up if the directory itself is asked for. So, by naming our file 'index.html' and putting it in its own directory, it will be served to the browser when that directory is asked for without asking for a specific file.
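The index file lookup is simple enough to sketch in a few lines of Python. This is an illustration of the idea rather than Apache's own code: the function name resolve_index is hypothetical, and the order in which the two index names are tried is an assumption on my part.

```python
import os
import tempfile


def resolve_index(directory, index_names=("index.html", "index.php")):
    """Return the first index file found in 'directory', or None."""
    for name in index_names:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Try it against a throwaway directory instead of the real DocumentRoot.
with tempfile.TemporaryDirectory() as docroot:
    open(os.path.join(docroot, "index.php"), "w").close()
    only_php = os.path.basename(resolve_index(docroot))

    open(os.path.join(docroot, "index.html"), "w").close()
    both = os.path.basename(resolve_index(docroot))

    missing = resolve_index(os.path.join(docroot, "no-such-dir"))

print(only_php, both, missing)
```

When nothing matches, the real server falls back to whatever the rest of its configuration dictates for that directory; the sketch simply returns None.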
What kind of permissions do we want to have on these files? We know that the configuration files need to be locked down. However, the files we're putting in this directory are going to get served to the world via the web server. By their very definition, they're public. You'll see that Apple has them owned, again, by root and wheel, and restricted to 654. However, this is a directory that repair permissions does not touch. There is really only one absolute here, and that is that the web server must be able to read these files to serve them! That means that the user, or group, 'www' must have access. Since these files are already owned by root and wheel, you'll see that the web server is accessing them through the read attribute given to 'others'. If you had a large web development team, you could create a new group called 'webdev' and make them the group owner of the files in your DocumentRoot. This way, the people in that group could alter the contents of the web server without having a root account. What I'm saying here is that there is no hard and fast 'right-way' to set up the permissions of this directory. There is a wrong way, though, and that's to mark everything 777. I know you're tempted, but don't do it. Practice safe computing. The right way restricts things down as far as possible, while allowing everyone to do their job, including Apache itself! Do note this, though: you rarely, if ever, want anything actually owned and writable by 'www'! This way, programs that the web server executes, like cgi scripts, can't damage files they wouldn't normally be able to damage or alter otherwise.
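One way to keep yourself honest here is to audit the directory from time to time. The following Python sketch walks a tree and flags the two extremes discussed above: files that 'others' cannot read (and so, with root and wheel ownership, the web server cannot serve) and files that 'others' can write. The function name audit_tree is my own, and the demonstration runs against a temporary directory rather than your real DocumentRoot.

```python
import os
import stat
import tempfile


def audit_tree(root):
    """List (path, problem) pairs for files with risky 'others' permissions."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            if not mode & stat.S_IROTH:
                problems.append((path, "not readable by others"))
            if mode & stat.S_IWOTH:
                problems.append((path, "writable by others"))
    return sorted(problems)


# Build a small fake DocumentRoot with one good file and two bad ones.
with tempfile.TemporaryDirectory() as docroot:
    for name, mode in (("ok.html", 0o644), ("open.html", 0o777), ("locked.html", 0o600)):
        path = os.path.join(docroot, name)
        with open(path, "w") as handle:
            handle.write("This is a test.\n")
        os.chmod(path, mode)
    report = [(os.path.basename(p), why) for p, why in audit_tree(docroot)]

print(report)
```

If you go the 'webdev' group route, a file that is unreadable by others may still be perfectly fine, so treat the first warning as a prompt to look, not as a verdict.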
Logging
How do you know if anyone is using your web server? How can you tell if there are any problems with content that is being served (or, not served)? How can we tell how often people are visiting, and how much bandwidth they are using? The answers to all of these questions lie in logging. Apache records each request it handles in the access log, and writes problems and general notices to the error log.
Earlier, I pointed out the 'CustomLog' directive. In Apple's httpd.conf file, it looks like this:
CustomLog "/private/var/log/httpd/access_log" combined
We tell Apache where we want to store our log file, and in what format. Just above this, you'll see some lines that begin with 'LogFormat'. These directives describe the layout of what gets logged, plus a nickname for that format. I recommend you stick with the 'combined' format, which looks like this:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
Whenever a transaction takes place, this line directs Apache to log the:
- %h - Host. The host's IP address, or, if HostNameLookups is on, the resolved DNS name.
- %l - Remote identification. Supplied from the remote identd. You'll rarely get a name with this, and more often just see "-".
- %u - User. The name the visitor authenticated with, if the page required a login. Otherwise, you'll see "-".
- %t - Time. The time and date that the transfer completed.
- %r - What the browser asked for: the first line of the request. Will be a string like "GET /webpage.php HTTP/1.1".
- %>s - Status code. The HTTP status of the response, such as 200 for success or 404 for a file that wasn't found.
- %b - Size of the transfer in bytes.
- %{Referer}i - How was this user referred? This will either be "-" (for no referrer), or a URL, like "http://www.example.com/somepage.html". You may also have noticed this is mis-spelt. The mistake comes from the HTTP specification itself...oops. Yes, you have to type it incorrectly to have it recognized.
- %{User-Agent}i - The User Agent - What the browser tells us it is. Safari will show "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/125.5.5 (KHTML, like Gecko) Safari/125.11"
Why use the combined format? The default on most distributions is "common", which logs everything that "combined" does, minus the referrer and user agent. Frankly, those are nice statistics to have. You are free to come up with whatever log format you'd like. See the Apache documentation on the subject at http://httpd.apache.org/docs/logs.html, where you'll also find some other cool logging tips and tricks. However, "combined" has become a recognized logging format and many off-the-shelf log analyzers recognize it. Basically, I'm telling you to stick with combined and not change a thing.
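Part of the appeal of a well-known format is that it's easy to pick apart yourself. Here is a minimal Python sketch that splits a 'combined' line into its fields with a regular expression. The sample line is one I made up for illustration (only the Safari user agent string comes from above), and the pattern does not attempt to handle quotes escaped inside a field.

```python
import re

# One named group per field of the 'combined' LogFormat.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # Apache writes "-" for the size when no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


# A hypothetical entry for the test page created earlier.
sample = (
    '127.0.0.1 - - [03/Dec/2004:18:40:12 -0500] "GET /test.html HTTP/1.1" '
    '200 15 "-" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) '
    'AppleWebKit/125.5.5 (KHTML, like Gecko) Safari/125.11"'
)

entry = parse_combined(sample)
print(entry["host"], entry["request"], entry["status"], entry["size"])
```

From here, counting hits per page or adding up bytes per day is a matter of looping over the file, which is essentially what the off-the-shelf analyzers do for you.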
Once Apache is running, you'll find this log in the place specified by the 'CustomLog' directive. According to Apple, that place is "/private/var/log/httpd", with a name of "access_log". Looks like a good place to me. You can view this log with Console.app (as found in /Applications/Utilities), or with the command line utility 'tail', or 'less'. I prefer 'tail' with the '-f' switch over Console.app, as it spits out new lines in close to real-time.
The next log to be concerned with is 'ErrorLog'. The name of this log is a bit of a misnomer. While the access log is just that, a clean list of what was accessed, good for log analyzers, the 'error' log consists of errors and general notices. For example, next is an entry from my error_log that shows Apache starting up.
[Fri Dec 3 18:38:09 2004] [notice] Apache/1.3.29 (Darwin) PHP/4.3.2 DAV/1.0.3 configured -- resuming normal operations
That's not an error! In fact, it's even marked at a level of 'notice', a.k.a. 'No worries.' The lines that are marked 'error', however, need to be paid attention to. Let's sample an error_log from a production site (aspects changed to shield the innocent):
[Thu Oct 28 05:03:05 2004] [error] [client 231.50.143.44] script not found or unable to stat: /www/httpd/cgi-bin/formmail.pl
[Thu Oct 28 14:27:17 2004] [error] mod_ssl: SSL handshake interrupted by system [Hint: Stop button pressed in browser?!] (System error follows)
[Thu Oct 28 14:27:17 2004] [error] System: Connection reset by peer (errno: 104)
[Fri Oct 29 12:17:38 2004] [error] PHP Warning: mysql_connect(): Can't connect to local MySQL server through socket '/www/tmp/mysql-client.sock' (2) in /www/httpd/includes/functions.php on line 180
The first error is a simple file-not-found message. In fact, it's someone searching for the old formmail perl script, which was easy to exploit. The next two errors turned out to be just what the error writer guessed - a browser quit before the SSL handshake completed. The final line is what happens when PHP tries to reach a MySQL server that isn't answering - it had crashed.
Watch your logs closely. Time spent doing so will pay off in buckets.
Virtual Hosts
Apache's virtual hosts, along with the http 1.1 specification, may be the single most important change to web serving, allowing consolidation, easier provisioning and the conservation of IP address space. Unfortunately, I talk to a lot of people who are confused by virtual hosts. If you don't think about it too much, it comes pretty easy.
The hyper-text transfer protocol version 1.0 specified the request and reply messages that travel between a browser and server to exchange a web page. However, once the client resolves the server name (such as www.example.com) and turns it into an IP address, the browser would simply send a GET request to that IP address. The server would then serve up the web page that was requested on that IP address. If your server had multiple IP addresses, you could run multiple copies of Apache, each bound to one of the IP addresses, that would serve up a completely separate web site (they'd have different 'DocumentRoot's).
The http 1.1 spec came along and added the fact that the browser must now pass the name of the site it's looking for in the request. Now, you could actually run one single copy of Apache with just a single IP address. Apache could serve up the correct site based on the site name in the request passed in by the browser. Let's see how this works.
Back in the httpd.conf file, toward the bottom of the file, you'll find "Section 3: Virtual Hosts". All of it is commented out. Let's change that.
Uncomment the line that says "NameVirtualHost *" (just get rid of the "#"). This turns on virtual serving for all IP addresses that Apache listens to. If you have multiple IP addresses, you could specify only one of them here if you'd like.
Next, we need to come up with a web site, separate from our main site. Create a new directory to hold this site. I like "/www", and that's what my examples will use. After you create "/www", create a subdirectory inside it called "virt1". Inside of that directory, we're going to create two more: "htdocs" and "logs". These last two directories should sit at the same level in the hierarchy. Once that's done, create an html index file to sit in the 'htdocs' directory. One line will suffice ("This is a virtual site"). Now, back to the httpd.conf file.
We tell Apache about a virtual site by using the "VirtualHost" opening and closing directives. Here's one for our test site, that I'll comment on after we add it to httpd.conf, right after the "NameVirtualHost" line that we just uncommented:
<VirtualHost *>
    ServerAdmin webmaster@virt1
    DocumentRoot /www/virt1/htdocs
    ServerName virt1
    ErrorLog /www/virt1/logs/errorlog
    CustomLog /www/virt1/logs/access_log combined
</VirtualHost>
You should recognize everything that went in between the opening and closing VirtualHost tags. The "ServerName" line is what Apache uses to match the request. If you were setting up a real host for use on the Internet, you'd place the server's FQDN here (such as "www.example.com"). The cool thing about a virtual host is that almost anything you can put in the main server config, you can put in a virtual host. The values you supply in the virtual host block override the main site config and apply only to that virtual host. It truly becomes its own, separately functioning site.
There are two more things we need to do to get this to work for us. First, since we're set up for name based virtual hosting (and not IP based hosting), we need to access the server by name. This can be achieved through DNS, or more easily on our local workstation, through altering NetInfo (which OS X consults when trying to resolve a name). DNS and name lookups will be covered in a future column. If you're not familiar with editing NetInfo or how name lookups work, just follow along.
Open up NetInfo Manager, which is found in /Applications/Utilities. Click on the lock to authenticate, and be able to make changes. Navigate down to /machines/localhost as shown in figure 5.
Figure 5 - NetInfo Manager with /machines/localhost selected, and its properties in the lower pane.
With localhost selected, click the duplicate icon in the toolbar, confirm your choice, and then click on the 'localhost copy' that you just created. In the lower pane, double-click on the 'name' property, and change the value to 'virt1'. This way, when we ask OS X to resolve the name 'virt1', it'll find the name first in NetInfo, and hand us back the IP address of '127.0.0.1', which is our own machine. Our browser, conforming to the http 1.1 specs, passes off the name of the site we're looking for to the web server.
Second, we need to restart Apache to have it read our new additions to httpd.conf. In the terminal, issue an 'apachectl restart' command, and you're ready to go. Of course, you can always restart Apache via the Sharing pref pane also.
OK - ready to test. In your web browser, enter the URL "http://virt1" and go! You'll see something like the display in figure 6.
Figure 6 - The Completed Virtual Site.
To 'prove' that this is a virtual site, enter the URL "http://127.0.0.1" in your browser. You should be looking at the index file from /Library/WebServer/Documents again. Also, if you look in /www/virt1/logs, you'll see that two log files were created for this site. Again, they're completely separate from the logs stored for the main site. Bliss.
That's really it. That's not complicated, right? You can continue to add VirtualHost blocks to the httpd.conf file, each serving up a separate site. Brilliant, huh? Just be aware, before you do this, that virtual hosts are another area where Apache can use up more resources. Each virtual host with its own logs keeps its own log files open, so on a site that has many, many virtual hosts, a webmaster may choose to keep one master log file, and separate out the individual entries later on. This saves on file handles. There are more tweaks like this that can make Apache behave in a certain way. Or, you can tune your OS to raise the limits that are allowed. Either way, it's something you must be aware of.
There's a big benefit to anyone doing web development here: you can set up a virtual host for each site that you develop for that exactly matches the environment of the server you develop for. I use this on a daily basis. I develop for both Internet facing sites on ISP hosted servers, and Intranet sites that are hosted in-house and are only reached via a local LAN. But I do all of the prototyping and testing on my PowerBook. Each site that I work for has a VirtualHost block in my Apache config. This way, I can code and test before I upload the file to the real server. More often than not, the target server is running Apache on Solaris or Linux, but I can have the equivalent server on my Mac.
For more info on virtual hosts, see the Apache documentation at http://httpd.apache.org/docs/vhosts/.
htaccess
The .htaccess mechanism is very interesting. It has both its pros and cons, but lets you do things that couldn't be done without it. The filename ".htaccess" can be customized, as set by the "AccessFileName" directive. In all honesty, most of the time there is very little reason to change this. If you do change it, you need to change some other areas of your httpd.conf file to make sure you don't unwittingly serve the access file up to a client. For this discussion, I'm simply going to refer to this whole mechanism as 'htaccess'.
What is htaccess? The htaccess file is simply a text file, in which you can place Apache directives, just like a mini httpd.conf file. You can place this htaccess file in any directory or subdirectory of your site, and those directives will apply to just that directory. Now, not every directive that works in httpd.conf will be available in an htaccess file. The difference is that, unlike httpd.conf, which is read only at startup time, the file named in the AccessFileName directive is consulted each time that directory is accessed. In actuality, it's a little more complex than that.
Apache, by default, searches all directories above the one being accessed to see if an htaccess file applies. If, for example, someone requests http://virt1/testfiles/file.html, which is located on the file system in /www/virt1/htdocs/testfiles, Apache will search:
/.htaccess
/www/.htaccess
/www/virt1/.htaccess
/www/virt1/htdocs/.htaccess
/www/virt1/htdocs/testfiles/.htaccess
This searching takes Apache some time. Of course, you can change this behavior with an AllowOverride directive. Try this in httpd.conf:
<Directory />
    AllowOverride None
</Directory>
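To make the search order listed above concrete, here is a small Python sketch that builds the same list for any directory. It only illustrates the walk from the root down to the target directory. The function name htaccess_candidates is made up for this example, and the sketch ignores any AllowOverride settings that would cut the search short.

```python
from pathlib import PurePosixPath

def htaccess_candidates(directory, access_file_name=".htaccess"):
    """List the access files consulted for a directory, from / down to the directory itself."""
    target = PurePosixPath(directory)
    # parents runs from the nearest ancestor up to "/", so reverse it
    ancestors = list(target.parents)[::-1] + [target]
    return [str(folder / access_file_name) for folder in ancestors]

for candidate in htaccess_candidates("/www/virt1/htdocs/testfiles"):
    print(candidate)
```

Five directory levels means five lookups per request, which is why turning the search off where you don't need it is worth the trouble.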
What, then, would anyone use htaccess for in reality? It comes in exceptionally handy when combined with virtual hosts, or anytime you have multiple people responsible for different content served by a single web server (like individual user pages). Speaking from experience, I once needed to set up two sites for a single company. Each site showcased a different side of the company. One of the sites was more conservative than the other. While the two sites were different, they shared some common elements (like large QuickTime files). The goal became having two separate sites that could share this content. However, we didn't want users of one site to 'discover' the other site. The solution? Virtual hosts with the same DocumentRoot directives. Each virtual host used a different index file. Additionally, we used .htaccess files to limit what visitors to each site could reach of the other site's content.
Also, many people simply associate htaccess with the ability to password protect directories and files. Sure, it has the ability to do that, but you can do that from httpd.conf also. Just be aware that it's not a unique property of htaccess to do this.
If you inherited some content that needed to be integrated with your site, but it perhaps came from a Windows environment where the extension '.htm' is popular, you could copy the site as is and drop it in a subdirectory of your site. We'll further pretend that other files have 'index.htm' hardcoded into them, and it would be too time-consuming to change them all before your deadline hits. Add an .htaccess file with the single line DirectoryIndex index.htm into that directory. This way, for this directory only, your web server will find index.htm as a valid index file, and hand it to a browser when the directory itself is requested.
There are also Apache directives that rewrite or redirect the requested URL. If you move a subdirectory, but still would like a reference to it (perhaps pages out of your control point there), you could drop a line like this in an htaccess file:
Redirect permanent /originaldirectory http://virt1/newdirectory
Use of the 'permanent' keyword also returns an http 301 permanently moved code. You can also rewrite incoming URLs to add or subtract all or part of the URL. If you have a subdirectory that should always be accessed over https, you can rewrite the URL. If the directory is on our virt1 site as http://virt1/protected, drop this in the directory as an htaccess file:
RewriteEngine on
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^(.*)$ https://virt1/protected/$1 [R=301,L]
If a browser makes a request on a port other than 443, we're going to catch that and rewrite the URL as an https:// URL.
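If the rewrite syntax looks cryptic, the logic behind it is short. The Python sketch below mimics the decision just described: test the port, capture the requested path, and build the https URL. It is an illustration only, not how mod_rewrite is implemented, and the function name https_redirect is hypothetical.

```python
import re

def https_redirect(server_port, relative_path, base="https://virt1/protected/"):
    """Mimic the rule: on any port but 443, answer with a 301 to the https URL."""
    if re.search(r"^443$", str(server_port)):
        return None  # the condition fails, so the request is served normally
    captured = re.search(r"^(.*)$", relative_path).group(1)  # plays the role of $1
    return 301, base + captured

print(https_redirect(80, "report.html"))
print(https_redirect(443, "report.html"))
```

The path is relative because, inside an htaccess file, the pattern is matched against the part of the URL below the directory that holds the file.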
There are many, many other possibilities. htaccess just adds to the immense flexibility of Apache.
PHP
PHP is not Apache, nor part of the distribution, but I mention it here for two reasons: one, with OS X Panther, it's included and turned on by default and, two, almost everything I touch web-wise has some PHP component so for me, the two have become inextricably linked. Of course, there are other ways to serve up dynamic content, and OS X has all the goodies you want, including perl and mod_perl (but unlike PHP, this one still has to be activated by you).
If you never plan to do anything with PHP, but want to run several virtual hosts and expect a fair amount of traffic, unload it. Just comment out the "LoadModule php4_module libexec/httpd/libphp4.so" and "AddModule mod_php4.c" lines (these are two separate lines in two different places). This will save a fair amount of memory per httpd instance.
The ability to run PHP opens up incredible new possibilities to run many of the free and open source programs that are available. Just be aware of this, though: Using the guidelines in this article and from other official sources like the Apache web site, understand how your security is impacted by these applications. Oh, I know how it starts. You set up a test web server on the network. You're testing a new open source app you've found, and it meets 90% of your needs. However, when you installed it, it asked you to make some changes to your httpd.conf and php.ini files. Perhaps even some changes to the permissions of files on disk. You think, "this is a test machine, I can make these changes without repercussion." Then it happens. You show someone at work the web app. They say it's great, and tell someone else. Before you know it, you're being asked to open up the application for use in a small department. Or the entire company. And you have a deadline. Do you go back and investigate the security of the site, or do you just get it into production?
New to PHP? See Dave Mark's Getting Started column. His current focus just happens to be PHP.
Troubleshooting
What do you do when Apache won't run? Or isn't giving you the results you're expecting? Never panic. There are a few tools we can use to investigate the problem.
First and foremost are the logs. In most cases, they are the best source to figure out what's happening. If you're receiving a message that Apache "can't bind to port...", make sure you're not trying to run two separate copies of Apache that bind to the same port. Failing that, make sure nothing else is running on that port. Use the netstat command in Terminal to find out (netstat -an | grep LISTEN).
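The same question, "is anything already listening on that port?", can also be asked from a script. Here is a minimal Python sketch that simply tries to connect to the port on the local machine. The function name port_in_use is made up for this example, and the check only tells you that something answered, not what it was.

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return probe.connect_ex((host, port)) == 0

if __name__ == "__main__":
    print("port 80 in use:", port_in_use(80))
```

If this reports that port 80 is busy while Apache is stopped, something else has claimed the port, and netstat will help you find out what.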
Is it plugged in? I've dealt with issues where people thought Apache was running, but it wasn't. The complaint was usually, "Apache isn't listening on the right port!" Or, "something is blocking me from getting to the web server." In many cases, people didn't realize that a syntax or other small error stopped Apache from running in the first place. Make sure it's running: use 'ps ax | grep httpd' in Terminal, or fire up Activity Monitor (make sure you select 'All Processes' from the drop down, though).
Any time you make a change to your httpd.conf file, you can syntax check it before restarting Apache to honor the change. In Terminal, try 'httpd -t'. It's a nice way to catch silly errors.
Virtual hosts not doing what you expect? Try 'httpd -S' (S must be capitalized). This shows the configuration as seen by Apache. You get a listing like this:
VirtualHost configuration:
wildcard NameVirtualHosts and _default_ servers:
*:80 is a NameVirtualHost
    default server virt1 (/etc/httpd/httpd.conf:1056)
    port 80 namevhost virt1 (/etc/httpd/httpd.conf:1056)
    port 80 namevhost 127.0.0.1 (/etc/httpd/httpd.conf:1066)
    port 80 namevhost radiotope (/etc/httpd/httpd.conf:1098)
    port 80 namevhost p2 (/etc/httpd/httpd.conf:1166)
    port 80 namevhost mw (/etc/httpd/httpd.conf:1177)
We're given the default server plus each of the virtual servers that Apache is parsing. You're also told the file and line number where Apache found each one.
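With a long list of virtual hosts, it can help to boil that listing down to a table. The Python sketch below is one way to do it, assuming the 'namevhost' lines look like the ones in the listing above. The pattern and the function name parse_vhosts are my own for this illustration, and the output format of other Apache versions may differ.

```python
import re

# Assumed line shape: port 80 namevhost NAME (FILE:LINE)
VHOST_RE = re.compile(r"port (\d+) namevhost (\S+) \((.+?):\s*(\d+)\)")

def parse_vhosts(listing):
    """Extract (port, name, config file, line number) for each namevhost entry."""
    return [
        (int(port), name, path, int(line_no))
        for port, name, path, line_no in VHOST_RE.findall(listing)
    ]

sample = """port 80 namevhost virt1 (/etc/httpd/httpd.conf:1056)
port 80 namevhost radiotope (/etc/httpd/httpd.conf:1098)"""

for port, name, path, line_no in parse_vhosts(sample):
    print(name, "is defined at line", line_no, "of", path)
```

Feed it the saved output of 'httpd -S' and you have a quick index of where each site is defined.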
In the rare instance you're experiencing a hard crash, strip your httpd.conf file back down to the basics, and add your modifications in one line (or at least a small chunk) at a time. The only time I've ever seen Apache die a hard death was due to a 'third party' module being compiled or linked in. You might get some indication in the log as to what's happening before Apache dies.
Conclusion
I hope that this article has made you want to dig into Apache a little deeper. No installation is necessary, and you already own the tools to modify it to your liking. There's also a lot more to explore, as space only permits us to cover the basics here.
If you're serious about maintaining a web server that talks to the world, this article is a good starting point. Past this, you owe it to yourself to do three things: 1) Read the Apache documentation at http://httpd.apache.org/docs/ (yes, there's a lot there), 2) Buy the O'Reilly 'horse' book (Apache), now up to its 3rd edition and, most importantly, 3) set up a server and fiddle with it. Nothing is more important than hands-on experience. Even if you have to use a cast-off G3, just get your feet wet. You'll soon be swimming.
There's a reason Apache is the number one web server on the planet: it's stable, secure, fast and flexible. Nicest of all? It's built into your Mac. Go press it into service.
Ed Marczak has been involved with technology since his Atari 2600 broke and he decided to make the repairs himself. He finds the 'about the author' box the hardest part of the article to write. His technology time is often spent at http://www.radiotope.com