NYCPHP Meetup

NYPHP.org

[nycphp-talk] OT: Standard Data Import Tool?

Matt Juszczak matt at atopia.net
Thu Dec 31 12:55:21 EST 2009


Hi Adrian,

Thanks for your reply.  I'm more looking for software that will help with 
the flow, not necessarilly information on how to properly design software 
from scratch.  But looks like we may do this from scratch in the end.

Thanks!

-Matt


On Tue, 29 Dec 2009, Adrian Noland wrote:

> If I understand your problem, then this may help:
> http://www.devshed.com/c/a/PHP/Introducing-the-Strategy-Pattern/
> http://www.fluffycat.com/PHP-Design-Patterns/Strategy/
> 
> 
> I would use it something like this:
> $this->data = new DataStream($source);
> $bool = new BasicValidate($this->data);
> $bool = new DataMapper($this->data);
> $bool = new LayoutBuilder($this->data);
> 
> if($bool) {
>     $result = MysqlQuery($data);
> }
> 
> Adrian
> 
> On Tue, Dec 29, 2009 at 11:46 AM, Matt Juszczak <matt at atopia.net> wrote:
>       Hi Greg,
>
>       That's pretty much how we had things mapped out.  Basic source validation in the beginning, then a more
>       standard validation later on.  I guess my question was - is there any open source project that would help
>       with this work flow, such as handling exceptions in the case of bad data, etc. etc.?
>
>       -Matt
> 
> 
> On Tue, 29 Dec 2009, Greg Rundlett (freephile) wrote:
>
>       There are several issues all bundled up here.
>
>       First you have to define the file format(s) that your feeds come in so
>       that you can read them and isolate the data elements correctly.  Are
>       they CSV, TSV, XML, custom?
>       At the first (read) step you can do gross validation like validate the
>       source for correct character encoding, record separators, record
>       length etc.
>       You also want to define "layouts" for various feeds.  By defining the
>       layout, you can apply a set of processing instructions to transform
>       the received layout into the desired layout.
>       Once you have the data in the desired layout, you can then do finer
>       validation such as per field rules.  E.g. last name must
>       /(a-zA-Z'-){2,100}/
>       Then you can write an import routine that determines the correct
>       layout transformation to apply, reads the source, tranposes it,
>       validates it, stores it in the database, logs the event
>
>       hth,
>
>       Greg Rundlett
>
>       nbpt 978-225-8302
>       m. 978-764-4424
>       -skype/aim/irc/twitter freephile
>       http://profiles.aim.com/freephile
> 
> 
> 
>
>       On Tue, Dec 29, 2009 at 12:28 PM, Matt Juszczak <matt at atopia.net> wrote:
>             Hi all,
>
>             I need to write a tool that somehow takes multiple data feeds (some xls
>             files, some csv files, etc.), defines each format,
>             verifies/validates/sanitizes the data, and imports it into standardized
>             MySQL tables.
>
>             As an example, one feed might look like this:
>
>             First Name, Last Name, DOB
>
>             And another like this:
>
>             Name, DOB
>
>             And another:
>
>             DOB, Last Name, First Name, Middle Initial
>
>             I need to somehow be able to say "For this source, this column needs to be
>             split into two, and this other column is the date of birth column".  Once
>             the data is parsed, it would need to be imported into standard MySQL tables:
>
>             first_name, last_name, birth_date, etc.
>
>             This is a simple example - the files and tables actually have 20-30 columns
>             each, so using mysqlimport or a simple shell script probably isn't the best
>             option.
>
>             Are there open source tools out there that do this already?  If not, should
>             I just build something?
>
>             Thanks,
>
>             -Matt
>             _______________________________________________
>             New York PHP Users Group Community Talk Mailing List
>             http://lists.nyphp.org/mailman/listinfo/talk
>
>             http://www.nyphp.org/Show-Participation
>
>       _______________________________________________
>       New York PHP Users Group Community Talk Mailing List
>       http://lists.nyphp.org/mailman/listinfo/talk
>
>       http://www.nyphp.org/Show-Participation
> 
> 
> _______________________________________________
> New York PHP Users Group Community Talk Mailing List
> http://lists.nyphp.org/mailman/listinfo/talk
> 
> http://www.nyphp.org/Show-Participation
> 
> 
> 
>


More information about the talk mailing list