[nycphp-talk] How would you do this ?
Rob Marscher
rmarscher at beaffinitive.com
Mon Sep 25 11:37:06 EDT 2006
Definitely parse each feed only once across the server (not once for
each user). Since some feeds are shared between users, that alone
should cut your number well below 200,000. Then figure out how much
processing time it actually takes to parse a feed; I wouldn't expect
it to be much. If parsing is fast enough, just do it on demand.
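For a quick sense of the cost, something like this would give you a
ballpark number per feed (SimpleXML is just my assumption, use
whatever parser you have; the URL is a placeholder):

<?php
// Quick-and-dirty timing harness: measures how long one feed takes
// to fetch and parse with SimpleXML.
$url = 'http://example.com/feed.xml';   // placeholder feed URL

$start = microtime(true);
$xml = simplexml_load_file($url);       // fetch + parse in one step
$elapsed = microtime(true) - $start;

$items = $xml ? $xml->xpath('//item') : array();  // RSS 2.0 items
printf("%d items in %.3f seconds\n", count($items), $elapsed);

Note that this times the network fetch and the parse together, which
is really the number you care about for an on-demand approach.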
i.e. - when a user checks their account, loop through their feeds,
check whether each feed was last parsed more than xx amount of time
ago (half an hour or an hour, something like that), and then determine
whether any of those feeds have changed (maybe by comparing the file
size of the live version against a cached local copy). For the ones
that have changed, pull down the new content and record the current
time as the feed's last-checked time. I would model feed entries as
rows in a database table for easy sorting, searching and other stuff
like that.
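Here's a rough sketch of that loop. All the names are made up for
illustration: I'm assuming a feeds table with (id, url, last_checked,
content_length), a user_feeds join table, an entries table for the
parsed items, and a store_entries() helper that does the actual
parse-and-insert (not shown):

<?php
define('FEED_TTL', 1800);  // re-check a feed at most every 30 minutes

function refresh_feeds_for_user(PDO $db, $userId)
{
    $stmt = $db->prepare(
        'SELECT f.id, f.url, f.last_checked, f.content_length
           FROM feeds f
           JOIN user_feeds uf ON uf.feed_id = f.id
          WHERE uf.user_id = ?');
    $stmt->execute(array($userId));

    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $feed) {
        // Skip feeds we already checked within the TTL window.
        if (time() - strtotime($feed['last_checked']) < FEED_TTL) {
            continue;
        }

        $body = @file_get_contents($feed['url']);
        if ($body === false) {
            continue;  // feed site is down; try again next time
        }

        // Cheap change detection: compare the size of the live
        // version against what we saw last time.
        if (strlen($body) != $feed['content_length']) {
            store_entries($db, $feed['id'], $body);  // parse + insert
        }

        // Mark the feed as checked either way.
        $db->prepare('UPDATE feeds
                         SET last_checked = NOW(), content_length = ?
                       WHERE id = ?')
           ->execute(array(strlen($body), $feed['id']));
    }
}

Since a feed row is shared by every user subscribed to it, the first
user to come along after the TTL expires pays the update cost and
everyone else gets the cached entries for free.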
In terms of the user interface, to deal with the possible wait while
the feeds update, you could show the user the latest cached version of
each feed and then do an ajax call to do the update.
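The endpoint behind that ajax call could be something like the
following (again, the tables and refresh_feeds_for_user() are the
made-up pieces from the sketch above; it returns a minimal XML payload
for the page's script to render):

<?php
// feed_update.php -- hypothetical endpoint hit after the page has
// already rendered the cached entries.
require 'feeds.php';  // wherever refresh_feeds_for_user() lives
$db = new PDO('mysql:host=localhost;dbname=aggregator', 'user', 'pass');

$userId = (int) $_GET['user_id'];
$since  = (int) $_GET['since'];  // unix timestamp of the cached render

refresh_feeds_for_user($db, $userId);

// Return only entries newer than what the page already shows.
$stmt = $db->prepare(
    'SELECT e.title, e.link
       FROM entries e
       JOIN user_feeds uf ON uf.feed_id = e.feed_id
      WHERE uf.user_id = ? AND e.published > FROM_UNIXTIME(?)
      ORDER BY e.published DESC');
$stmt->execute(array($userId, $since));

header('Content-Type: text/xml');
echo '<entries>';
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $e) {
    printf('<entry title="%s" link="%s"/>',
           htmlspecialchars($e['title']), htmlspecialchars($e['link']));
}
echo '</entries>';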
Doing it this way avoids parsing feeds that no one accesses, and also
avoids having to predict your users' activity.
-Rob
Jad madi wrote:
> I'm building an RSS aggregator, so I'm trying to find the best way to
> parse users' account feeds equally. Let's say we have 20,000 users
> with an average of 10 feeds per account, so we have about
> 200,000 feeds.
>
> How would you schedule the parsing process to keep all accounts always
> updated without killing the server? NOTE: some of the 200,000 feeds
> might be shared by more than one user.
>
> Now, what I was thinking of is to split users into:
> 1-) Idle users (check their account once a week, no traffic on their
> RSS feeds)
> 2-) Idle++ users (check their account once a week, but there is
> traffic on their RSS feeds)
> 3-) Active users (check their accounts regularly, and there is
> traffic on their RSS feeds)
>
> NOTE: The week is just an example; in the end it's going to be a
> dynamic ratio.
>
> So with this classification I can split the parsing power and time:
> 1-) 10% to idle users
> 2-) 20% to idle++ users
> 3-) 70% to active users.
>
> NOTE: There are other factors that should be included, but I don't
> want to get the idea messy now: CPU usage, memory usage, connectivity
> issues (if a feed's site is down). In general, the max execution time
> for the continuous parsing loop shouldn't be more than 30 to 60
> minutes. Actually, I'm thinking of writing a daemon to do it: just
> keep checking CPU/memory and execute whenever a reasonable amount of
> resources is available, without killing the server.
>
>
> Please elaborate.
>