[nycphp-talk] Determine the text language
David Krings
ramons at gmx.net
Fri Nov 9 06:44:47 EST 2007
hafez ahmad wrote:
> Hi All,
>
> How can I use a regular expression to determine the language of a text,
> i.e. whether the selected text is English, Arabic, Hebrew, etc.?
>
I wonder whether that could work at all. A regular expression matches fixed
character patterns, and natural language doesn't follow rules that simple.
I'd see if there is a way to hook into the Mozilla or OOo (OpenOffice.org)
spell-check dictionaries. Run the selected text through every dictionary and
assume that the one reporting the fewest errors matches the language of the
text. That process will take a long time and will fail on text written by
horrible spellers.
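
A rough sketch of the dictionary idea in Python, assuming each dictionary is
nothing more than a set of known words. The tiny word lists and the name
guess_by_dictionary are made up for illustration. Real Mozilla or OOo
dictionaries also carry affix rules, so they would need a proper
spell-checking library rather than a plain lookup.

```python
import re

# Hypothetical miniature word lists; real dictionaries are far larger.
DICTIONARIES = {
    "en": {"the", "is", "a", "this", "text", "and", "of"},
    "de": {"der", "die", "das", "ist", "ein", "und", "text"},
}

def guess_by_dictionary(text, dictionaries=DICTIONARIES):
    """Return the language whose word list leaves the fewest unknown words."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return None
    # Count, per language, how many words are missing from its word list.
    errors = {
        lang: sum(1 for w in words if w not in known)
        for lang, known in dictionaries.items()
    }
    # On a tie, the language listed first wins.
    return min(errors, key=errors.get)
```

Every word is checked against every dictionary, which is why this gets slow
with full-size word lists and many candidate languages.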
Or do you want to check which character set or script is used? If you could
provide more detail about what you are trying to accomplish, I guess we could
give you some more hints.
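
If checking the script is enough, a regular expression can do it. Below is a
minimal sketch in Python, assuming the text is already decoded to Unicode and
that only the basic Arabic, Hebrew and ASCII Latin ranges matter. The names
SCRIPT_PATTERNS and guess_script are invented for this example.

```python
import re

# Unicode block ranges used as a stand-in for scripts. This is an assumption:
# extended Arabic/Latin blocks and presentation forms are not covered.
SCRIPT_PATTERNS = {
    "Arabic": re.compile(r"[\u0600-\u06FF]"),
    "Hebrew": re.compile(r"[\u0590-\u05FF]"),
    "Latin": re.compile(r"[A-Za-z]"),
}

def guess_script(text):
    """Return the script with the most matching characters, or None."""
    counts = {name: len(p.findall(text)) for name, p in SCRIPT_PATTERNS.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```

This only identifies the writing system, not the language. Arabic script is
also used for Persian and Urdu, and Latin script covers English, German,
French and many others.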
David