[nycphp-talk] OT: Apache access_log integrity
D C Krook
dkrook at hotmail.com
Thu Oct 2 00:59:29 EDT 2003
Folks,
I'm trying to trim the fat from Apache's access_log: removing hits from my
own IP addresses, referer spam, bots, and so on.
While I understand that I can exclude known IP addresses and other common
patterns via mod_setenvif, I'd like to be able to do this on an ad hoc basis
when I notice spikes of useless records in the log and/or when my IP
changes because I'm hitting my own site from various wireless access points.
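For reference, the static mod_setenvif approach usually looks something like the fragment below. This is only a sketch, not my actual config: the "dontlog" variable name, the log path, and the "combined" format nickname are assumptions ("combined" has to be defined by a LogFormat directive elsewhere in httpd.conf).

```apache
# Tag requests from my own address (dots are escaped because this is a regex).
SetEnvIf Remote_Addr "^192\.168\.1\.1$" dontlog
# Tag requests whose Referer header matches a known spam host.
SetEnvIfNoCase Referer "jeff-knights-online-viagra-megastore" dontlog
# Write only the requests that were not tagged above.
CustomLog /var/log/apache/access_log combined env=!dontlog
```

This only affects requests logged after the config is loaded, which is why it doesn't help with the ad hoc cleanup I'm after.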
My first idea was to grep the logs by using a shell or Perl script that I
could add to my daily cron or call arbitrarily like so:
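#!/bin/sh
# Drop my own IP from the log; -F makes the dots in the address literal.
grep -vF "192.168.1.1" /var/log/apache/access_log > /var/log/apache/access_log.tmp
# Swap in the filtered copy, then restart so Apache reopens the new file.
mv /var/log/apache/access_log.tmp /var/log/apache/access_log
/export/home/krook/bin/restart-apache.sh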
Of course, this has the drawback of restarting Apache every time the script
changes the access_log, but a second or two of downtime is acceptable if it
means the logs can be regularly analyzed for useful reports that don't have
"http://jeff-knights-online-viagra-megastore" as my top referer.
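One variation that might avoid the restart altogether is to leave the live log alone and filter a copy just before running the analyzer. The following is a minimal sketch under that assumption; the filter_log function name and the patterns file (one fixed string per line) are hypothetical, not part of Apache or any existing tool.

```shell
#!/bin/sh
# Sketch: write a filtered COPY of an access log, leaving the live log untouched.
# filter_log and the patterns file are hypothetical names used for illustration.

# usage: filter_log PATTERNS_FILE INPUT_LOG OUTPUT_LOG
filter_log() {
    patterns=$1
    input=$2
    output=$3
    # -v keeps only lines that match none of the patterns.
    # -F treats each pattern as a fixed string, so "." in an IP is literal.
    # -f reads one pattern per line; note that a blank line in the patterns
    #    file matches every record, which would filter out the whole log.
    grep -v -F -f "$patterns" "$input" > "$output"
    status=$?
    # grep exits 1 when no lines were selected (everything was filtered out);
    # only an exit status above 1 indicates a real error.
    [ "$status" -le 1 ]
}
```

A patterns file might contain lines such as "192.168.1.1 " (with the trailing space, so that 192.168.1.10 is not caught as well) and "jeff-knights-online-viagra-megastore". Because the live access_log is never replaced, Apache keeps writing to the same file and no restart is needed; the trade-off is that the junk records stay in the original log.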
I'd like to know if anyone else has addressed this problem successfully
or has any best practices, whether via mod_setenvif, Perl, CLI PHP, cron,
etc.
I've Googled the following topics without any good results:
"strip line from access_log perl"
"clean access_log"
"setEnvIf"
"eliminate referer spam access_log"
"remove line from log"
Thanks in advance for any tips.
-Dan