NYCPHP Meetup

NYPHP.org

[nycphp-talk] OT: Standard Data Import Tool?

Matt Juszczak matt at atopia.net
Tue Dec 29 12:46:50 EST 2009


Hi Greg,

That's pretty much how we had things mapped out.  Basic source validation 
in the beginning, then a more standard validation later on.  I guess my 
question was - is there any open source project that would help with this 
work flow, such as handling exceptions in the case of bad data, etc. etc.?

-Matt

On Tue, 29 Dec 2009, Greg Rundlett (freephile) wrote:

> There are several issues all bundled up here.
>
> First you have to define the file format(s) that your feeds come in so
> that you can read them and isolate the data elements correctly.  Are
> they CSV, TSV, XML, custom?
> At the first (read) step you can do gross validation like validate the
> source for correct character encoding, record separators, record
> length etc.
> You also want to define "layouts" for various feeds.  By defining the
> layout, you can apply a set of processing instructions to transform
> the received layout into the desired layout.
> Once you have the data in the desired layout, you can then do finer
> validation such as per field rules.  E.g. last name must
> /(a-zA-Z'-){2,100}/
> Then you can write an import routine that determines the correct
> layout transformation to apply, reads the source, tranposes it,
> validates it, stores it in the database, logs the event
>
> hth,
>
> Greg Rundlett
>
> nbpt 978-225-8302
> m. 978-764-4424
> -skype/aim/irc/twitter freephile
> http://profiles.aim.com/freephile
>
>
>
>
> On Tue, Dec 29, 2009 at 12:28 PM, Matt Juszczak <matt at atopia.net> wrote:
>> Hi all,
>>
>> I need to write a tool that somehow takes multiple data feeds (some xls
>> files, some csv files, etc.), defines each format,
>> verifies/validates/sanitizes the data, and imports it into standardized
>> MySQL tables.
>>
>> As an example, one feed might look like this:
>>
>> First Name, Last Name, DOB
>>
>> And another like this:
>>
>> Name, DOB
>>
>> And another:
>>
>> DOB, Last Name, First Name, Middle Initial
>>
>> I need to somehow be able to say "For this source, this column needs to be
>> split into two, and this other column is the date of birth column".  Once
>> the data is parsed, it would need to be imported into standard MySQL tables:
>>
>> first_name, last_name, birth_date, etc.
>>
>> This is a simple example - the files and tables actually have 20-30 columns
>> each, so using mysqlimport or a simple shell script probably isn't the best
>> option.
>>
>> Are there open source tools out there that do this already?  If not, should
>> I just build something?
>>
>> Thanks,
>>
>> -Matt
>> _______________________________________________
>> New York PHP Users Group Community Talk Mailing List
>> http://lists.nyphp.org/mailman/listinfo/talk
>>
>> http://www.nyphp.org/Show-Participation
>>
> _______________________________________________
> New York PHP Users Group Community Talk Mailing List
> http://lists.nyphp.org/mailman/listinfo/talk
>
> http://www.nyphp.org/Show-Participation
>


More information about the talk mailing list