[nycphp-talk] OT: Apache access_log integrity

Hans Zaunere hans at nyphp.org
Thu Oct 2 09:35:30 EDT 2003


D C Krook wrote:

> Folks,
> 
> I'm trying to trim the fat from Apache's access_log: removing my own IP 
> addresses from the log; stripping referer spam; bots; etc.
> 
> While I understand that I can exclude known IP addresses and other 
> common patterns via mod_setenvif, I'd like to be able to do this on an 
> ad hoc basis when I notice certain spikes in useless records in the log 
> and/or when my IP changes when hitting my own site from various wireless 
> points.
> 
> My first idea was to grep the logs by using a shell or Perl script that 
> I could add to my daily cron or call arbitrarily like so:
> 
> #!/bin/sh
> grep -v "192.168.1.1" /var/log/apache/access_log > /var/log/apache/access_log.tmp
> mv /var/log/apache/access_log.tmp /var/log/apache/access_log
> /export/home/krook/bin/restart-apache.sh
> 
> Of course, this has the drawback of restarting Apache every time the 
> access_log is changed by the script, but the second or two of down time 
> is acceptable if it means logs that can be regularly analyzed for useful 
> reports that don't have "http://jeff-knights-online-viagra-megastore" as 
> my top referer.
> 
> I'd like to know if anyone else has addressed this problem in a 
> successful way or has any best practice, either via mod_setenvif, Perl, 
> CLI PHP, cron etc.

I haven't had to do exactly this kind of thing, but here's an idea that just popped into my head:  CustomLog can write to a pipe instead of a flat file (see http://httpd.apache.org/docs/mod/mod_log_config.html#customlog for syntax).  For example:

CustomLog "|/path/to/speciallogger.psh" combined

where speciallogger.psh is a PHP CLI script, of course :)  speciallogger.psh could then use some scheme to ignore certain types of lines that come in on stdin, or log them in some preferred way (MySQL, a file, etc.).  It could also talk to a DB that determines which lines are ignored and which are not (maybe just via a set of regexes).  There are a lot of possibilities, and this would be especially handy for sites without huge amounts of traffic (although I could see it being tuned even for sites with a lot of traffic).
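
To make the idea concrete, here is a minimal sketch of what such a filter might look like. It is untested against a live server and rests on several assumptions: PHP 7.1 or later on the command line, a hard-coded list of regexes rather than a DB lookup, and an output path (access_log.filtered) that I made up for illustration. The function names should_ignore and filter_log are likewise hypothetical.

```php
<?php
// speciallogger.psh: hypothetical piped-log filter (a sketch, not a production logger).
// Apache writes one log line per request to this script's stdin.

// Assumed ignore patterns; in practice these might come from a file or a DB.
$ignorePatterns = [
    '/^192\.168\.1\.1 /',             // own IP at the start of the line
    '/jeff-knights-online-viagra/i',  // referer spam mentioned in the original post
];

/** Return true if the line matches any of the ignore patterns. */
function should_ignore(string $line, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $line) === 1) {
            return true;
        }
    }
    return false;
}

/** Copy lines from stream $in to stream $out, dropping ignored ones. */
function filter_log($in, $out, array $patterns): void
{
    while (($line = fgets($in)) !== false) {
        if (!should_ignore($line, $patterns)) {
            fwrite($out, $line);
        }
    }
}

// Run the filter only when this file is executed directly from the CLI,
// so the functions above can also be included and exercised on their own.
if (PHP_SAPI === 'cli' && isset($argv[0]) && realpath($argv[0]) === __FILE__) {
    // Assumed output path; pass a different one as the first argument.
    $path = $argv[1] ?? '/var/log/apache/access_log.filtered';
    $out = fopen($path, 'ab');
    if ($out === false) {
        fwrite(STDERR, "cannot open $path\n");
        exit(1);
    }
    filter_log(STDIN, $out, $ignorePatterns);
    fclose($out);
}
```

If the script itself is not executable, the pipe can name the interpreter explicitly, along the lines of CustomLog "|/usr/bin/php /path/to/speciallogger.psh" combined (both paths are placeholders).  Since the filtering happens as the lines arrive, the log never has to be rewritten and Apache never has to be restarted just to clean it.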

> I've Googled the following topics without any good results:
> "strip line from access_log perl"
> "clean access_log"
> "setEnvIf"
> "eliminate referer spam access_log"
> "remove line from log"

A good man is he who reveals what he's googled for  :)

HTH,

H





