The daemontools package by Daniel J. Bernstein contains the multilog program that does not lose logging information and does not accidently eat up all your disk space. I use multilog for the Apache- and Zope-logging in a virtual server environment.
I have many virtual servers mapped to a Zope instance.
Logging accesses to many virtual Apache servers eats many file descriptors,
memory and CPU power if you run a logger for every server. That's why I use only
one log which includes the virtual hostname and the port. The hostname is
used to map the log entries to the virtual server later.
Multilog is used to
get an exact timestamp in external TAI64N
format at the beginning of each line. The major advantage of multilog is
that it rotates logfiles based on size. It keeps only a fixed number of logfiles so
that you never will see your disk get filled by growing logfiles.
Here is an example (the long line is wrapped, continuation lines are
marked with the plus sign):
@40000000393975c730047524 fte.tegtmeyer.net 80 195.145.137.253 - "GET /zope/logging +HTTP/1.0" 200 6794 "-" "Mozilla/4.72 [de] (X11; I; Linux 2.2.15 i586)" |
Multilog has it's own built in filter capabilities but I decided to split the logging itself from analyzing. So the filtering for virtual hosts is done during periods of low activity by an additional script.
This article assumes Linux. If you are running a BSD variant or another system you probably have problems with the scripts. In this case install and use the GNU tools instead of the native ones (for example gsed instead of sed on FreeBSD, thanks to Andreas Brenk!).
The multilog program is part of the daemontools package made by Daniel J. Bernstein which you can download from his webserver.
The installation is straightforward:
The multilog program expects the data at standard input. The Apache webserver is able to specify a logger which uses a pipe to a logger program. It is necessary to log two streams: the error log and the access log. To log to the two directories /web/logs/access and /web/logs/error, you would need the following entries in the common area of the Apache httpd.conf file. It assumes that logging is done under the httplog user:
ErrorLog "| /usr/local/bin/setuidgid httplog /usr/local/bin/multilog
+s4194304 n10 /web/logs/error"
LogFormat "%v %p %h %u \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""
+combined
CustomLog "| /usr/local/bin/setuidgid httplog /usr/local/bin/multilog t
+s4194304 n50 /web/logs/access" combined
|
Note the changed logging format: the timestamp of the Apache webserver is removed because of it's horrible format and insufficient accuracy. The line is prepended with an external TAI64N timestamp (provided by the t switch of multilog), the name of the virtual host and the port the client is accessing. Also removed is the column for results of ident-lookups. This column was not used on any webserver I have ever seen. What counts for a webserver is speed, so nobody turns on ident-lookups for incoming requests. If you have a need for ident information, change the format of the logfile and the scripts mentioned later in this article.
A mail from Jamie Heilman made me aware of a discussion on the damontools
mailinglist about problems using
multilog from Apache on FreeBSD. You may have a look at the
archive. The discussion was
in June and July of 2001. You have to look for the subjects patch to have
apache run supervised and apache, mulitlog, & fifo
trick redux.
To solve the problem you may use the same technique as described here for
Zope: let Apache log to a named pipe and let multilog read from it. Be sure that
the loggers are up when Apache starts.
In a virtual server environment with Apache the logging provided by Zope is mostly important for debugging because all requests are also logged by Apache. So I wanted to have a Zope access log, but only for a limited time and of course limited size.
Zope includes hooks for different logging mechanisms. A logger that rotates logfiles is also available but it is necessary to edit the Zope sources to activate another logging mechanism. Because multilog does all what we need, we looked for a solution to pipe the logged data to a program. Luckily we use Unix instead of Windows NT, so there is a simple solution without fiddling with the sources of zope.
The standard setup of Zope simply appends all log data
to the File Z2.log in the var
subdirectory. If we create a named pipe with this name
Zope will happily log to it - all we have to do now is to ensure that
the multilog program listens at this named pipe.
Here are the commands that are used to create the named pipe.
We assume Linux and that Zope runs under the user id zope. The multilog
program runs under the group id httplog:
cd var rm Z2.log mkfifo Z2.log chmod 240 Z2.log chown zope.httplog Z2.log |
There are many ways to ensure that multilog listens at the pipe. Because we are monitoring Zope with daemontools, it is natural to start the logging with daemontools too. See my document Monitoring Zope with Daniel J. Bernsteins daemontools for more information.
Simply set up an additional service (called zopelog at our site) that starts the multilog program. The run script looks like that:
#! /bin/sh exec < /web/zope/var/Z2.log exec /usr/local/bin/setuidgid httplog \ /usr/local/bin/multilog t n5 /web/logs/zopeaccess |
Of course the /web/logs/zopeaccess directory must exist and has to be writable by the user httplog:
mkdir /web/logs/zopeaccess chown httplog /web/logs/zopeaccess chmod 700 /web/logs/zopeaccess |
The access log of Zope is not the only output that Zope generates. There
is also the output that is displayed by the start script. We decided to log
this too to have an idea how often, why and when Zope is restarted
because of a crash. The output goes to stdout and
this is the standard situation the daemontools are designed for.
The svscan program sets up a logger for a service if
The run script for this setup (located in the log subdirectory of the Zope service):
#! /bin/sh exec /usr/local/bin/setuidgid httplog /usr/local/bin/multilog t /web/logs/zopedebug |
To set up all the necessary files use service directories under the Zope directory. I already prepared the files for downloading. To activate the services you link them to the /service directory (or the one svscan runs on). Don't place them directly in the /service directoy - svscan may be confused by intermediate states when you are creating or changing files:
ln -s /web/zope/service_zopelog /service/zopelog ln -s /web/zope/service_zope /service/zope |
Watching the logfiles has two problems. First it is not very useful to watch
the timestamps in TAI64N. Second it is possible that multilog switches the
logfile while watching.
I use the GNU version of tail that has an option
to recognize a switched file. The conversion of the TAI64N timestamps is
done by Dan Bernsteins utility tai64nlocal.
The solution:
tail --follow=name /web/logs/zopeaccess/current | tai64nlocal on FreeBSD: tail -f --follow=name /web/logs/zopeaccess/current | tai64nlocal |
If you reallly are not able to modify the parser of a logfile analyzer to
recognize the created format you may use the following tools and
commands. It's lazy programming, so don't use it for learning Perl or something else!
You may download the scripts.
Let's say you want the converted logfile for the webserver www.fte.to on port 80, you may issue the following commands inside of the logfile directory:
cat ./\@* current | fgrep ' www.fte.to 80 ' | \
./filter | ./iso_to_weird > fte-log
|
The functionality of filter and iso_to_weird is divided to gain more flexibility.
The filter script:
#!/bin/sh
sed 's/\([^ ]\+ \+\)\([^ ]\+ \+\)\([^ ]\+ \+\)\([^ ]\+ \+\)/\4\1\2\3/' \
| logresolve \
| sed 's/\([^ ]\+ \+\)\([^ ]\+ \+\)\([^ ]\+ \+\)\([^ ]\+ \+\)/\2\3\4\1/' \
| /usr/local/bin/tai64nlocal \
| sed 's/\([^ ]\+ \+\)\([^\.]\+\)\([^ ]\+ \+\)\(.*\)/\1\2 \4/'
|
The first line of the script moves the ip address of the accessing host
to the beginning of the line because the logresolve
program (ond others) does expect it there. After piping through logresolve the
ip address or name is moved back to the old place.
The TAI64N timestamp
at the beginning of the line is converted to
ISO format (including fractional seconds) by using Daniel J.
Bernstein's tai64nlocal program.
The last line of the script then removes the fractional seconds.
At first it seemed expensive to do it this way but practice showed that the conversion has nearly no performance impact.
The iso_to_weird script:
#!/usr/bin/perl
@months = ('Jan','Feb','Mar','Apr','May','Jun','Jul',
'Aug','Sep','Oct','Nov','Dec');
while (<>) {
($date,$time,$server,$port,$from,$user,@rest) = split / /;
($year,$month,$day) = split('-',$date);
$month--;
$month=$months[$month];
$rest = join (' ',@rest);
print "$from - $user [$day/$month/$year:$time] $rest";
}
|
This Perl-Script parses the ISO time and generates the format that many logfile analyzers do expect. The server and port are removed from the log and the useless field for identd lookups is inserted again. The rest of the line (request, return code, referer etc.) is kept like it comes in.