NYCPHP Meetup

NYPHP.org

[nycphp-talk] OT: Standard Data Import Tool?

Adrian Noland anoland at indigente.net
Tue Dec 29 17:49:08 EST 2009


If I understand your problem, then this may help:
http://www.devshed.com/c/a/PHP/Introducing-the-Strategy-Pattern/
http://www.fluffycat.com/PHP-Design-Patterns/Strategy/


I would use it something like this:
$this->data = new DataStream($source);
$bool = new BasicValidate($this->data);
$bool = new DataMapper($this->data);
$bool = new LayoutBuilder($this->data);

if($bool) {
    $result = MysqlQuery($data);
}

Adrian

On Tue, Dec 29, 2009 at 11:46 AM, Matt Juszczak <matt at atopia.net> wrote:

> Hi Greg,
>
> That's pretty much how we had things mapped out.  Basic source validation
> in the beginning, then a more standard validation later on.  I guess my
> question was - is there any open source project that would help with this
> work flow, such as handling exceptions in the case of bad data, etc. etc.?
>
> -Matt
>
>
> On Tue, 29 Dec 2009, Greg Rundlett (freephile) wrote:
>
>  There are several issues all bundled up here.
>>
>> First you have to define the file format(s) that your feeds come in so
>> that you can read them and isolate the data elements correctly.  Are
>> they CSV, TSV, XML, custom?
>> At the first (read) step you can do gross validation like validate the
>> source for correct character encoding, record separators, record
>> length etc.
>> You also want to define "layouts" for various feeds.  By defining the
>> layout, you can apply a set of processing instructions to transform
>> the received layout into the desired layout.
>> Once you have the data in the desired layout, you can then do finer
>> validation such as per field rules.  E.g. last name must
>> /(a-zA-Z'-){2,100}/
>> Then you can write an import routine that determines the correct
>> layout transformation to apply, reads the source, tranposes it,
>> validates it, stores it in the database, logs the event
>>
>> hth,
>>
>> Greg Rundlett
>>
>> nbpt 978-225-8302
>> m. 978-764-4424
>> -skype/aim/irc/twitter freephile
>> http://profiles.aim.com/freephile
>>
>>
>>
>>
>> On Tue, Dec 29, 2009 at 12:28 PM, Matt Juszczak <matt at atopia.net> wrote:
>>
>>> Hi all,
>>>
>>> I need to write a tool that somehow takes multiple data feeds (some xls
>>> files, some csv files, etc.), defines each format,
>>> verifies/validates/sanitizes the data, and imports it into standardized
>>> MySQL tables.
>>>
>>> As an example, one feed might look like this:
>>>
>>> First Name, Last Name, DOB
>>>
>>> And another like this:
>>>
>>> Name, DOB
>>>
>>> And another:
>>>
>>> DOB, Last Name, First Name, Middle Initial
>>>
>>> I need to somehow be able to say "For this source, this column needs to
>>> be
>>> split into two, and this other column is the date of birth column".  Once
>>> the data is parsed, it would need to be imported into standard MySQL
>>> tables:
>>>
>>> first_name, last_name, birth_date, etc.
>>>
>>> This is a simple example - the files and tables actually have 20-30
>>> columns
>>> each, so using mysqlimport or a simple shell script probably isn't the
>>> best
>>> option.
>>>
>>> Are there open source tools out there that do this already?  If not,
>>> should
>>> I just build something?
>>>
>>> Thanks,
>>>
>>> -Matt
>>> _______________________________________________
>>> New York PHP Users Group Community Talk Mailing List
>>> http://lists.nyphp.org/mailman/listinfo/talk
>>>
>>> http://www.nyphp.org/Show-Participation
>>>
>>>  _______________________________________________
>> New York PHP Users Group Community Talk Mailing List
>> http://lists.nyphp.org/mailman/listinfo/talk
>>
>> http://www.nyphp.org/Show-Participation
>>
>
> _______________________________________________
> New York PHP Users Group Community Talk Mailing List
> http://lists.nyphp.org/mailman/listinfo/talk
>
> http://www.nyphp.org/Show-Participation
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.nyphp.org/pipermail/talk/attachments/20091229/e8797c48/attachment.html>


More information about the talk mailing list