[nycphp-talk] Curl & Traversing Pages
Rolan Yang
rolan at omnistep.com
Tue Nov 22 21:46:02 EST 2005
A great tool for debugging bots and spiders is the "Tamper Data 0.85"
extension for the Firefox browser. Download the browser, then install
the extension, which can be found here:
https://addons.mozilla.org/extensions/showlist.php?application=firefox&category=Developer%20Tools&numpg=10&pageid=7
It logs and displays all traffic into and out of the browser. This is
very useful, especially when writing bots that interface with SSL
pages. I used to debug with a packet sniffer, but having to tackle an
SSL-only application prompted me to seek out and discover this
wonderful app.
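If you are driving the requests from PHP's curl extension anyway, you
can get a similar wire-level log without leaving the script by
redirecting curl's verbose output to a file. A minimal sketch (the URL
and log path are just placeholders):

    <?php
    // Log every request/response header curl exchanges, including the
    // SSL handshake, to a local file for later inspection.
    $log = fopen('/tmp/curl_debug.log', 'w');        // placeholder path

    $ch = curl_init('https://example.com/login');    // placeholder URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body
    curl_setopt($ch, CURLOPT_VERBOSE, true);         // emit wire-level detail
    curl_setopt($ch, CURLOPT_STDERR, $log);          // ...into our log file
    curl_setopt($ch, CURLOPT_HEADER, true);          // keep response headers

    $response = curl_exec($ch);
    curl_close($ch);
    fclose($log);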
An alternative way to spider a website is to grab the pages with
"wget", then parse and process them all offline later. One caveat: some
dynamic scripts may generate links on the fly, resulting in loops.
Googlebot, for example, has been endlessly spidering the same photo
album pages on my site for the past year and a half.
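If you do the crawling from PHP instead, that loop is easy to guard
against by remembering which URLs you have already fetched. A rough
sketch (the start URL is a placeholder, and a real spider would also
resolve relative links):

    <?php
    // Fetch each page exactly once; dynamically generated links can't
    // trap us in a loop because visited URLs are skipped.
    $queue   = array('http://example.com/');  // placeholder start page
    $visited = array();

    while (!empty($queue)) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;                          // already fetched
        }
        $visited[$url] = true;

        $html = file_get_contents($url);
        if ($html === false) {
            continue;                          // fetch failed, move on
        }
        file_put_contents('page_' . md5($url) . '.html', $html); // parse offline

        // Naive link extraction; relative URLs would need resolving.
        preg_match_all('/href="([^"]+)"/i', $html, $matches);
        foreach ($matches[1] as $link) {
            if (!isset($visited[$link])) {
                $queue[] = $link;
            }
        }
    }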
~Rolan
Joseph Crawford wrote:
>Hello Everyone,
>
>Let me explain a bit about what I am trying to do. I have a script
>that will grab the first page that I specify from a URL such as
>
>http://yellowpages.superpages.com/listings.jsp?PS=45&OO=1&R=N&PP=L&CB=1&STYPE=S&F=1&L=VT&CID=00000518939&paging=1&PI=0
>
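To traverse the paging on a listing like that, one approach is to step
the PI value in the query string. PS=45 looks like the results-per-page
and PI like the starting offset, but that is an assumption, and the
five-page stop condition below is only a placeholder:

    <?php
    // Walk the listing by advancing the PI offset one page at a time.
    $base = 'http://yellowpages.superpages.com/listings.jsp'
          . '?PS=45&OO=1&R=N&PP=L&CB=1&STYPE=S&F=1&L=VT'
          . '&CID=00000518939&paging=1';

    for ($page = 0; $page < 5; $page++) {      // placeholder page count
        $ch = curl_init($base . '&PI=' . ($page * 45));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $html = curl_exec($ch);
        curl_close($ch);

        file_put_contents("listing_$page.html", $html); // parse offline later
        sleep(1);                              // be polite to the server
    }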