[nycphp-talk] PHP And Search Engines
inforequest
1j0lkq002 at sneakemail.com
Wed Sep 29 22:58:36 EDT 2004
Daniel Convissor danielc-at-analysisandsolutions.com |nyphp dev/internal
group use| wrote:
>On Wed, Sep 29, 2004 at 04:22:12PM -0400, Joseph Crawford wrote:
>
>
>>if there is an easy way without use
>>apache's mod_rewrite to make a search engine index pages that have
>>querystrings such as
>>product.php?id=57
>>
>>
>
>It seems you're in search of an answer to a problem that doesn't
>exist. I know Google searches pages with query strings. I'd guess
>most of the others do as well.
>
>--Dan
>
There are some very specific and good reasons to eliminate URLs with
query strings. Getting spidered and indexed just isn't one of them :-)
That is why I suggested you have clear objectives. You don't simply want
to get rid of query string URLs in order to get indexed. You may however
want to get rid of complex, low-readability URLs in order to better
communicate your content message to the masses (visitors as well as
spiders).
As Chris Shiflett highlighted already,
www.yoursite.com?id=456&prod_id=564 is not nearly as useful to
anyone as something like www.yoursite.com/cars/hover_cars/index.html.
The search engines will reward you for improved usability like that (not
just index you, but rank you higher for keywords associated with cars,
hover cars, and your site's known themes). In addition you are more
likely to get bookmarked, and your bookmarks will make better sense in
your traffic logs.
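
To make the contrast concrete, here is a minimal sketch of building a
readable path from category names instead of exposing numeric ids. It is
written in Python for brevity, and the same idea ports directly to PHP.
The helper names `slugify` and `readable_url` are hypothetical, invented
for illustration, and the underscore separator is an assumption taken
from the hover_cars example above.

```python
import re

def slugify(text):
    """Lowercase the text and collapse each run of non-alphanumerics into one underscore."""
    slug = re.sub(r"[^a-z0-9]+", "_", text.lower())
    return slug.strip("_")

def readable_url(host, *names):
    """Join the slugified names into a path ending in index.html (hypothetical helper)."""
    path = "/".join(slugify(name) for name in names)
    return f"{host}/{path}/index.html"

# Category names here are illustrative, not from any real catalog.
print(readable_url("www.yoursite.com", "Cars", "Hover Cars"))
# prints www.yoursite.com/cars/hover_cars/index.html
```

The application would still look up the product by id internally; only
the URL shown to visitors and spiders changes.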
Additionally, Search Engines may or may not consider
www.yoursite.com?id=456&prod_id=564 and
www.yoursite.com?id=322&prod_id=567 as different pages. This is an
emerging research area, but it seems that Google is considering them to
be 2 variants of the same page. At first this may not be important as
long as they both appear in the index. However, since the index is a
finite resource and the SEs love to keep it *all* in the shared memory
of their server clusters at all times (for speed), it is inevitable that
SEs will have to prioritize which pages to include. It is reasonable to
think variants of the same page will be demoted early in that process.
It is also reasonable to think this is done already, at several levels.
There are PageRank issues associated with this as well, and they are even
less understood. Again, much of this stuff is on the edge; issues like
semantic URLs being better than cryptic query strings are much
more concrete.
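
One way to audit your own site for this is to group URLs by host and
path with the query string stripped, and see how many collapse to the
same page. The sketch below does that, in Python for brevity. It is an
assumption for illustration only: `group_by_page` is a hypothetical
helper, and nothing here claims to reproduce how any search engine
actually decides that two URLs are variants.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def group_by_page(urls):
    """Group URLs by host + path, ignoring the query string (hypothetical helper)."""
    groups = defaultdict(list)
    for url in urls:
        # urlsplit only recognizes the host when the URL has a "//" prefix.
        parts = urlsplit(url if "//" in url else "//" + url)
        groups[parts.netloc + parts.path].append(url)
    return dict(groups)

urls = [
    "www.yoursite.com?id=456&prod_id=564",
    "www.yoursite.com?id=322&prod_id=567",
    "www.yoursite.com/cars/hover_cars/index.html",
]
for page, variants in group_by_page(urls).items():
    print(page, len(variants))
```

With these three URLs, the two query-string URLs fall into a single
group while the semantic URL stands on its own.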
There are empirical data (based solely on observations) that suggest a
site structure with one level is preferred over one with deep
directories. That means a site with www.site.com/small_cars.html and
www.site.com/large_cars.html is preferred over one with
www.site.com/cars/small/index.html and www.site.com/cars/large/index.html.
Similarly, there is a large body of empirical knowledge on search engine
preferences, for each engine and certain conditions. They are all
important in optimization and when competing, but many are also very
important just for search engine friendliness.
For example it is pretty clear that www.site.com/index.html is "better"
than www.site.com/index or www.site.com/index/. The SE bots are seeking
out *pages* (called documents) and love plain and simple HTML. In
general the more you look like simple, clean (and valid!) HTML, the better they
will treat you. It is especially important when you are trying to get
the spider to flag your site for deep crawling. It appears "difficult"
sites are deferred for deep crawling compared to "easy" sites, all other
things being equal. Some people see a deep crawl and others see frequent
visits for small batches of pages.... there is an algorithm at work there.
By the way Flash is indeed spidered and indexed. You have to make it
easy for the bots semantically. Take a look at a Google search for .swf
and you'll see many sites are listed and indexed with good snippets,
while others are only listed, and some are indexed with "poor" snippets.
There are even some advantages to be gained from using Flash.... I could
tell you some tricks, but then I'd have to.... ;-)
The proposed NYPHP talk in January covers a lot of this kind of stuff,
specifically tailored to the PHP programmer and overall technical
webmaster of a site. It is my perspective that with a certain level of
awareness of the issues, the PHP programmer/site designer can take
certain steps early on which make it much easier for search engine
specific improvements to be made later. In my work I have to break the
bad news to site programmers and designers after the site has been built
-- that it needs an overhaul or needs to be rebuilt. I would prefer to
be more popular than I am ;-) A lot of that pain would go away if
certain things were considered at design and construction time.
-=john andrews