[nycphp-talk] (ir) regular expressions (stupid me)
Dan Cech
dcech at phpwerx.net
Sat Apr 24 10:49:22 EDT 2004
Jayesh Sheth wrote:
> So what I am saying, is that I need to check for two commas, an alpha
> numeric string before the first comma, a capitalized city name after the
> first comma, and two capital letters after the second comma (for the
> state). I will ignore whitespace before or after the commas. (That
> whitespace can be trimmed programmatically).
You could make your life easier by trimming the whitespace before you
start to validate the address.
> I find POSIX style expressions using the ereg() function to be a bit
> easier to learn than their Perl equivalents. Here is what I came up with
> using an ereg() expression:
Learning preg_* syntax is well worth the trouble because the preg_*
functions are generally much faster than ereg().
> ^([[:alnum:]]+[\.]{0,}[[:space:]]{0,}){1,},([[:space:]]{0,}[[:upper:]][[:alpha:]]+),([[:space:]]{0,}[[:upper:]]{2})$
>
$address = '123 Elm St., Brooklyn, NY';
// remove any whitespace around commas
$address = preg_replace('/\s*,\s*/', ',', $address);
// check address
if (preg_match('/^([0-9]+ [\w\s]+)[.]?,([A-Z][a-z]+),([A-Z]{2})$/',
               $address, $matches)) {
    $street = $matches[1];
    $city = $matches[2];
    $state = $matches[3];
    $address = $street . ', ' . $city . ', ' . $state;
}
That preg expression is similar in intent to the one you were already
using, but it should be extended to deal with multi-word city names like
'Salt Lake City' etc.
One way would be:
'/^([0-9]+ [\w\s]+)[.]?,([A-Za-z\s]+),([A-Z]{2})$/'
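As a rough sketch of how that broader pattern behaves, the snippet below wraps it in a small helper. The function name parse_address() and the sample address are made up for illustration; the trim() call is an extra assumption so that stray leading or trailing whitespace does not defeat the anchors.

```php
<?php
// Hypothetical helper (the name is illustrative only): normalise the commas,
// then apply the multi-word-city pattern shown above.
function parse_address($address)
{
    // remove whitespace around commas and at either end of the string
    $address = preg_replace('/\s*,\s*/', ',', trim($address));

    if (preg_match('/^([0-9]+ [\w\s]+)[.]?,([A-Za-z\s]+),([A-Z]{2})$/',
                   $address, $matches)) {
        return array(
            'street' => $matches[1],
            'city'   => $matches[2],
            'state'  => $matches[3],
        );
    }
    // the address did not fit the expected "street, city, ST" shape
    return false;
}

$parts = parse_address('500 S Main St., Salt Lake City , UT');
```

Note that [A-Za-z\s]+ would also accept a city made up entirely of spaces or written in lower case, so it trades strictness for flexibility.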
You may in fact want to go with a 3-input solution and grab the street,
city and state separately, unless you can be guaranteed the user will
put in the commas.
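A minimal sketch of that 3-input approach might look like the following. The function name validate_address_fields() and the per-field rules are assumptions that simply mirror the pieces of the single-string pattern; real form handling would likely need looser rules.

```php
<?php
// Hypothetical validator for three separate form fields; the name and the
// exact rules are assumptions modelled on the single-string pattern.
function validate_address_fields($street, $city, $state)
{
    $errors = array();

    // street: a house number, a space, then word characters and spaces,
    // with an optional trailing full stop (as in 'St.')
    if (!preg_match('/^[0-9]+ [\w\s]+[.]?$/', trim($street))) {
        $errors[] = 'street';
    }
    // city: letters and spaces only, so 'Salt Lake City' passes
    if (!preg_match('/^[A-Za-z][A-Za-z\s]*$/', trim($city))) {
        $errors[] = 'city';
    }
    // state: exactly two capital letters
    if (!preg_match('/^[A-Z]{2}$/', trim($state))) {
        $errors[] = 'state';
    }
    // an empty array means every field passed
    return $errors;
}

$errors = validate_address_fields('123 Elm St.', 'Brooklyn', 'ny');
```

With separate fields there are no commas to depend on, and each error can be reported against the field that caused it.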
> B) In the second case, what I want to check for seems to be much
> simpler, but I having no luck.
Removing the [.]? from the above preg expression will disallow the .
character in the address. Note, however, that in that expression the .
character sits outside the parentheses that capture the street, so it is
discarded if it exists anyway.
Otherwise, you would have to check for 'St.' explicitly, i.e. look for
'St.' first and, if it is not found, then look for 'St' without the full
stop.
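One way that two-step check could be sketched is shown here. The helper name street_suffix_style() and its return strings are invented for illustration, and it assumes the 'St' suffix sits at the very end of the street string.

```php
<?php
// Hypothetical helper: reports how the street part ends.
function street_suffix_style($street)
{
    $street = trim($street);
    // first look for 'St.' (with the full stop) at the end of the string
    if (preg_match('/\bSt\.$/', $street)) {
        return 'with full stop';
    }
    // not found, so look for 'St' without the full stop
    if (preg_match('/\bSt$/', $street)) {
        return 'without full stop';
    }
    return 'no St suffix';
}

$style = street_suffix_style('123 Elm St.');
```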
Hope this helps in some way,
Dan