[nycphp-talk] Best way to accomplish this task
Andy Dirnberger
dirn at dirnonline.com
Sun Feb 14 20:56:12 EST 2010
On Sun, Feb 14, 2010 at 8:49 PM, Anthony Papillion <papillion at gmail.com> wrote:
>
> Hello Everyone,
>
> I'm designing a system that will work on a schedule. Users will submit data for processing into the database and then, every minute, a PHP script will pass through the db looking for unprocessed rows (marked pending) and process them.
>
> The problem is, I may eventually have a few million records to process at a time. Each record could take anywhere from a few seconds to a few minutes to perform the required operations on. My concern is making sure that the script, on the next scheduled pass, doesn't grab the records currently being processed and start processing them again.
>
> Right now, I'm thinking of accomplishing this by updating a 'status' field in the database. So unprocessed records would have a status of 'pending', records being processed would have a status of 'processing' and completly processed record will have a status of 'complete'.
>
> For some reason, I see this as ugly but that's the only way I can think of making sure that records aren't duplicatly processed. So when I select records to process, I'm ONLY selecting one's with the status of 'pending' which means they are new, unprocessed.
>
> Is there a better, more eleqent way of doing this or is this pretty much it?
>
> Thanks!
> Anthony Papillion
I would generate a unique value for each pass (whether that's an auto
increment field in a table, a UUID, or something else is up to you). I
would add a field to your table(s), just like you were going to do
with status, to store this value.
When processing, do something along the lines of UPDATE table SET
new_field = unique_value WHERE new_field IS NULL (make sure new_field
is indexed). Then you process any records with a matching value in the
new field. This also has the benefit that, should a pass fail, you can
identify which records were part of that pass.
More information about the talk
mailing list