[nycphp-talk] Parsing Fun
inforequest
sm11szw02 at sneakemail.com
Mon Aug 23 01:11:53 EDT 2004
Christopher Greeley tgrza-at-grza.com |nyphp 04/2004| wrote:
> I have been experimenting with parsing, as I am, and have always been
> (regardless of the programming language) in the dark on exactly how I
> should be going about parsing a text file. I have always kept it
> simple with easy explodes and the like, but it is getting to the point
> where I want to have a smarter script that doesn’t need a finite list
> of things that must come in a certain order, etc. So, to that end, I
> have been experimenting with parsing some RSS streams (I am using the
> Reuters Sports Stream at
> http://www.microsite.reuters.com/rss/sportsNews as a guinea pig). I
> thought that for this end, sscanf would be really easy – I basically
> got the position of two tags I wanted to read in between with strpos,
> used substr to truncate the string, and then attempted to use sscanf
> to parse it into neat little variables. The problem I ran into is that
> sscanf doesn’t really like white spaces, and it stops reading at that
> point. So, I dug around a little and found that someone had used
> %[^[]] to match everything – but at this point, sscanf stopped
> following my handy little outline.
>
> So, this is more of a request for some general direction in gaining
> some parsing skills – I am sure there are some out there with some
> weaker skills who could use the brush up as well.
>
> Thanks,
>
> Chris
>
>------------------------------------------------------------------------
>
I always enjoy parsing (really). IMHO string manipulation is what made
Professional Basic the success it was (PBDS, a loong time ago) and what
sold me on PHP as a general-purpose scripting language and not just a
way to access HTTP headers.
Tokenizing is always fun (strtok). file_get_contents is very handy for
use with PHP string functions, especially tokenizing. I never cared much
for "exploding" or "splitting". Nothing like deep, nested loops of
str_stuff and preg_replace to get the brain cells working in the morning
(or, in other words, to burn up your morning hours!)
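
A minimal sketch of the strtok approach, assuming a made-up sample string
(in practice the text could come from file_get_contents). The variable
names and the delimiter list are only illustrative:

```php
<?php
// Hypothetical sample text; a real script might load it with file_get_contents().
$text = "Tokenizing\tis always\nfun";

// Split on spaces, tabs, and newlines. strtok() keeps internal state,
// so only the first call receives the string; later calls pass just
// the delimiter list.
$delims = " \t\n";
$tokens = [];
for ($tok = strtok($text, $delims); $tok !== false; $tok = strtok($delims)) {
    $tokens[] = $tok;
}

print_r($tokens);
```

Unlike explode, strtok treats every character in the delimiter string as a
separator and skips empty tokens, which is handy for loosely formatted text.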
Seriously though, with PHP I have found it is wise to try to use the
built-in parsers when possible, such as parse_url, but *never
underestimate the amazing power of preg_replace_callback*.
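
A rough sketch of both ideas, reusing the feed URL from the quoted message.
The <title> snippet and the uppercase transformation are invented for
illustration, and the anonymous function assumes a PHP version that
supports closures:

```php
<?php
// parse_url() splits a URL into named components without any hand-rolled parsing.
$parts = parse_url('http://www.microsite.reuters.com/rss/sportsNews');
echo $parts['scheme'], ' | ', $parts['host'], ' | ', $parts['path'], "\n";

// preg_replace_callback() hands each match to a function, so the
// replacement can be computed rather than being a fixed string.
// Hypothetical input: upper-case whatever sits between <title> tags.
$input = '<title>red sox win</title>';
$output = preg_replace_callback(
    '/<title>(.*?)<\/title>/s',
    function ($m) {
        // $m[0] is the whole match, $m[1] the first capture group.
        return '<title>' . strtoupper($m[1]) . '</title>';
    },
    $input
);

echo $output, "\n";
```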
In your example, wouldn't preg_match_all be a better choice than
sscanf, so you can explicitly handle tabs and whitespace? That'd give
you an array of all tagged content appearing in your file (in case there
was more than the one you expected ;-) which you can further
parse-n-store using str_"stuff" as you like?
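
Something along these lines, as a sketch only. The RSS fragment is a
made-up stand-in for the real feed, and the <title> tag is just an
assumption about which tag you are after:

```php
<?php
// Hypothetical RSS fragment standing in for the downloaded feed contents.
$rss = <<<XML
<item>
  <title>First   headline</title>
</item>
<item>
  <title>
    Second headline
  </title>
</item>
XML;

// The /s modifier lets "." match newlines; ".*?" is non-greedy, so each
// match stops at the nearest closing tag instead of the last one.
preg_match_all('/<title>(.*?)<\/title>/s', $rss, $matches);

// $matches[1] holds the captured text of every match. Collapse runs of
// whitespace (spaces, tabs, newlines) to one space and trim the ends.
$titles = [];
foreach ($matches[1] as $raw) {
    $titles[] = trim(preg_replace('/\s+/', ' ', $raw));
}

print_r($titles);
```

This treats the feed as plain text, so it will not cope with CDATA
sections or attributes on the tag; the pattern would need adjusting for
those.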
-=john andrews