[nycphp-talk] Robot Sessions

Chris Shiflett shiflett at php.net
Mon Mar 29 17:47:32 EST 2004


--- Jim Musil <jmusil at villagevoice.com> wrote:
> Is there a (good) way to allow robots to navigate your site, but
> not issue them a session?

Nope, because a robot doesn't have to let you know that it's a robot.

However, you can identify the major ones, because they send a consistent
User-Agent header identifying themselves. Here are some agents from my
access logs that look like robots to me (some are truncated):

Googlebot/2.1 (+http://www.googlebot.com/bot.html)
FeedDemon/1.10 RC 1 (http://www.bradsoft.com/; Microsoft Wind
PhpDig/1.8.0 (+http://www.phpdig.net/robot.php)
FeedDemon/1.0 (http://www.bradsoft.com/; Microsoft Windows XP
NetNewsWire/1.0.5 (Mac OS X; Lite; http://ranchero.com/netnew
Bloglines/2.0 (http://www.bloglines.com; 6 subscribers)
msnbot/0.11 (+http://search.msn.com/msnbot.htm)
FOSS (Free and Open Source) Planet Planet/0.2 http://www.plan
QuepasaCreep ( crawler at quepasacorp.com )
NewsGator/2.0 (http://www.newsgator.com; Microsoft Windows NT
Radio UserLand/8.0.8 (MacOS)
FAST-WebCrawler/3.8 (crawler at trd dot overture dot com; htt
ia_archiver
Straw/0.22.1
n4p_bot crawler at n4p.com
WebSauger 1.20b
http://www.almaden.ibm.com/cs/crawler [c01]
FeedOnFeeds/0.1 (+http://minutillo.com/steve/feedonfeeds/)
ES.NET_Crawler/2.0 (http://search.innerprise.net/)
http://www.almaden.ibm.com/cs/crawler [wf85]
NutchCVS/0.03-dev (Nutch; http://www.nutch.org/docs/en/bot.ht
WebReaper [info at webreaper.net]

OK, there are more than I thought, so I'll stop there. :-)
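
To make the User-Agent check concrete, here is a minimal sketch, assuming
a simple case-insensitive substring match is good enough. The idea is
language-neutral; it is written in Python only for illustration. The
names ROBOT_MARKERS, looks_like_robot, and should_issue_session are
hypothetical, and the marker list is a small sample drawn from the
agents above, not a complete one.

```python
# Lowercased substrings taken from the User-Agent strings listed above.
# Illustrative sample only; a real list would need regular upkeep.
ROBOT_MARKERS = (
    "googlebot",
    "msnbot",
    "ia_archiver",
    "fast-webcrawler",
    "phpdig",
    "nutch",
    "webreaper",
)


def looks_like_robot(user_agent):
    """Return True if the User-Agent contains a known robot marker.

    A missing or empty header counts as "not a known robot"; that
    choice is an assumption of this sketch.
    """
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


def should_issue_session(user_agent):
    """Issue sessions to everyone except self-identified robots."""
    return not looks_like_robot(user_agent)
```

In PHP the same test would sit in front of session_start(), reading the
header from $_SERVER['HTTP_USER_AGENT']. A robot that sends a browser's
User-Agent still gets a session, which is the limitation described at
the top of this reply.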

Hope that helps.

Chris

=====
Chris Shiflett - http://shiflett.org/

PHP Security - O'Reilly
     Coming Fall 2004
HTTP Developer's Handbook - Sams
     http://httphandbook.org/
PHP Community Site
     http://phpcommunity.org/


