Basic network access: servers
Virtual hosts
Running and maintaining a web server is enough work that you might want to use the same server to host several sets of web pages, for example for a number of different organizations. apache calls this feature virtual hosts, and it offers a lot of support for them. Theoretically, all your hosts can be virtual, but the configuration file still contains additional information for a "main" server, also called a "default" server. The default configuration does not have any virtual servers at all, though it does contain configuration information.
There's a good reason to keep the "main" server information: it serves as defaults for all virtual hosts, which can make the job of adding a virtual host a lot easier.
Consider your setup at http://example.org: you may run your own web pages and also a set of pages for http://biguser.com (see page 310). To do this, you add the following section to /usr/local/etc/apache/httpd.conf:
    <VirtualHost *>
        ServerAdmin grog@example.org
        # Where we put the web pages
        DocumentRoot /usr/local/www/biguser
        # The name that the server will claim to be
        ServerName www.biguser.com
        # Alternative server name
        ServerAlias biguser.com
        ErrorLog /var/log/biguser/error_log
        TransferLog /var/log/biguser/access_log
        Options +FollowSymLinks
        Options +SymLinksIfOwnerMatch
    </VirtualHost>
If you look at the default configuration file, you'll find most of these parameters, but not in the context of a VirtualHost definition. They are the corresponding parameters for the "main" web server. They have the same meaning, so we'll look at them here.
- ServerAdmin is the mail ID of the system administrator. For the main server, it's set to you@your.address, which obviously needs to be changed. You don't necessarily need a ServerAdmin for each virtual domain; that depends on how you run the system.
- DocumentRoot is the name of the directory that will become the root of the web page hierarchy that the server provides. By default, for the main server it's /usr/local/www/data, which is not a very good place for data that changes frequently. You might prefer to change this to /var/www, as some Linux distributions do. This is one parameter that you must supply for each virtual domain: otherwise the domain would have the same content as the main server. In the example above, it's the location of the files for http://www.biguser.com/.
- Next you can put information about individual data directories. The default server first supplies defaults for all directories:
    <Directory />
        Options FollowSymLinks
        AllowOverride None
    </Directory>
The / in the first line indicates the local directory to which these settings should apply. For once, this is really the root directory and not DocumentRoot: they're system-wide defaults, and though you don't have to worry about apache playing around in your root file system, that's the only directory that is guaranteed to contain all other directories. The Options FollowSymLinks directive allows the server to follow symbolic links; without it, symbolic links would not work. We'll look at the AllowOverride directive in the discussion of the .htaccess file below.
There's a separate entry for the data hierarchy:
    <Directory "/usr/local/www/data">
        Options Indexes FollowSymLinks MultiViews
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>
In this case, we have two additional options:
- Indexes allows httpd to display the contents of a directory if no index file, with a name defined in DirectoryIndex, is present. Without this option, httpd refuses a request for a directory that has no index file.
- MultiViews allows content-based multiviews, which we don't discuss here.
Note that if you change the name of the default data directory, you should also change the name on the Directory invocation.
We'll look at the remaining entries in more detail when we see them again in the discussion of the .htaccess file.
- Normally you should set ServerName. For example, www.example.org is a CNAME for freebie.example.org (see page 370), and if you don't set this value, clients will access www.example.org, but the server will return the name freebie.example.org.
- httpd can maintain two log files, an access log and an error log. We'll look at them in the next section. It's a good idea to keep separate log files for each domain.
- You should have a default VirtualHost entry. People can get quite confused if they select an invalid name (for example, http://www.big-user.com) and get the (default) web page for http://www.example.org. The default page should not match any other host. Instead, it should indicate that the specified domain name is invalid.
- For the same reason, it's a good idea to have a ServerAlias entry for the same domain name without initial www. The entry in the example above serves the same pages for http://www.biguser.com and http://biguser.com.
- The directive Options +SymLinksIfOwnerMatch limits following symbolic links to those links whose target file or directory belongs to the same owner as the link. Normally the Options directive specifies all the options: it doesn't merge the default options. The + sign indicates that the option specified should be added to the defaults.
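The difference between replacing and merging options in the last item above is easy to get wrong, so here is a minimal sketch of it. The two subdirectories, plain and merged, are hypothetical and exist only for illustration; the behaviour in the comments follows the documented rule that options are merged only when every option on the line carries a + or - sign.

```apache
# Hypothetical directories below the biguser DocumentRoot.
<Directory "/usr/local/www/biguser">
    Options Indexes FollowSymLinks
</Directory>

# No + or - signs: this list replaces the inherited one, so only
# Includes is in effect here.
<Directory "/usr/local/www/biguser/plain">
    Options Includes
</Directory>

# With + and - signs: the inherited list is adjusted, which leaves
# FollowSymLinks and Includes in effect here.
<Directory "/usr/local/www/biguser/merged">
    Options +Includes -Indexes
</Directory>
```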
After you restart apache, it handles any requests to http://www.biguser.com with these parameters. If you don't define a virtual host, the server will access the main web pages (defined by the main DocumentRoot entry in /usr/local/etc/apache/access.conf).
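The advice above about a default VirtualHost entry might be sketched as follows. This is only an illustration under some assumptions: the name invalid.example.org and the directories /usr/local/www/invalid and /var/log/invalid are hypothetical, and the sketch relies on the documented behaviour that, for name-based virtual hosts, the first VirtualHost listed for an address answers requests whose name matches no ServerName or ServerAlias.

```apache
# Name-based virtual hosts on all addresses. Older versions of apache
# need this directive; from version 2.4 on it has no effect.
NameVirtualHost *

# Listed first, so it catches requests for names that match nothing
# below. The name and directories are hypothetical; the pages here
# should say that the requested domain name is invalid.
<VirtualHost *>
    ServerName invalid.example.org
    DocumentRoot /usr/local/www/invalid
    ErrorLog /var/log/invalid/error_log
    TransferLog /var/log/invalid/access_log
</VirtualHost>

# The real virtual host from the example above.
<VirtualHost *>
    ServerName www.biguser.com
    ServerAlias biguser.com
    DocumentRoot /usr/local/www/biguser
    ErrorLog /var/log/biguser/error_log
    TransferLog /var/log/biguser/access_log
</VirtualHost>
```

With a setup like this, the pages for http://www.example.org would typically need a VirtualHost block of their own as well, since a name-based configuration covering all addresses takes over requests that the main server would otherwise answer.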
Log file format
httpd logs accesses and errors to the files you specify. It's worth understanding what's inside them. The following example shows six log entries. Normally each entry is all on a very long line.
    p50859b17.dip.t-dialin.net - -                    name of system, more
    [01/Nov/2002:07:06:12 +1030]                      date of access
    "GET /Images/yaoipower.jpeg HTTP/1.1"             HTTP command
    200                                               status (OK)
    19365                                             length of data transfer
    aceproxy3.acenet.net.au - - [01/Nov/2002:07:35:34 +1030]
    "GET /Images/randomgal.big.jpeg HTTP/1.0" 304 -   status (cached)
    218.24.24.27 - -                                  system without reverse DNS
    [01/Nov/2002:07:39:55 +1030]
    "GET /scripts/root.exe?/c+dir HTTP/1.0"           looking for an invalid file
    404 284                                           status (not found)
    218.24.24.27 - - [01/Nov/2002:07:39:56 +1030]
    "GET /MSADC/root.exe?/c+dir HTTP/1.0" 404 282
    218.24.24.27 - - [01/Nov/2002:07:39:56 +1030]
    "GET /c/winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 292
    218.24.24.27 - - [01/Nov/2002:07:40:00 +1030]
    "GET /_vti_bin/..%255c../..%255c../..%255c../winnt/system32/cmd.exe?/c+dir HTTP/1.0"
    404 323
The fields in the log file are separated by blanks, so empty entries are replaced by a - character. In this example, the second and third fields are always empty. They're used for identity checks and authorization.
To get the names of the clients, you need to specify the HostnameLookups on directive. This requires a DNS lookup for every access, which can be relatively slow.
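The entries shown above have the layout that apache calls the Common Log Format. As a rough sketch, the directives below would produce that layout with client names resolved. The use of CustomLog with the nickname common is an assumption for illustration; the example virtual host earlier uses TransferLog instead.

```apache
# Resolve client addresses to names for the %h field. This costs a
# DNS lookup for every access.
HostnameLookups On

# Common Log Format: host, identity, user, time, request line,
# status, and bytes sent. %b logs "-" when no data is sent, as in the
# entry with status 304 above.
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog /var/log/biguser/access_log common
```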
Although we specified hostname lookups, the last four entries don't have any name: the system doesn't have reverse DNS. They come from a Microsoft machine infected with the Nimda virus and show an attempt to break into the web server. There's not much you can do about this virus; it will probably be years before it goes away. Apart from nuisance value, it has never posed any threat to apache servers.