[nycphp-talk] character set filtering
Allen Shaw
ashaw at iifwp.org
Tue Aug 16 11:53:51 EDT 2005
Hi All,
I've googled around a little but probably am not using the right key
words, so I ask for a few suggestions:
Our online database system is meant, sooner or later, to let several
thousand of our contacts update their own data records (with
careful data screening on our side, of course). The big sticking point
for me is that we can't have them submitting it in just whatever
character set they want. For example, we don't want to let a Japanese
user send in his name in Chinese characters, or any kind of kana either;
the Koreans shouldn't be allowed to submit Hangul, etc., etc. So
somewhere in the system I have to screen user input to be sure it's
limited to a certain character set.
Questions I'm struggling with along this line are these:
* What character set shall we use? (For example, of course we don't
allow Chinese, Thai, Arabic, etc., but what about umlauts and the
occasional enye?) That's an internal decision for us, I'm sure, but do
you know of technical points I should be sure to consider?
* How will I screen the incoming data? Do I just hack some regex
together and run everything through it, or is there a library I should
consider, etc.?
* How totally without clue am I about this whole topic?
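To make the screening question concrete, here is a minimal sketch of one
possible approach. It is written in Python purely for illustration, and
it assumes the input has already been decoded as valid Unicode text. The
allowed set (Latin letters, including accented ones, plus a few
punctuation marks) and the function names are hypothetical, not a
decision about what the system should accept.

```python
import unicodedata

# Assumption: besides Latin letters, names may contain spaces, hyphens,
# apostrophes and periods. This set is only an example.
ALLOWED_PUNCTUATION = set(" -'.")

def is_allowed_char(ch):
    """Return True if ch is allowed punctuation or a Latin-script letter."""
    if ch in ALLOWED_PUNCTUATION:
        return True
    # Unicode character names for Latin letters start with "LATIN ",
    # e.g. "LATIN SMALL LETTER N WITH TILDE". Unnamed characters get "".
    name = unicodedata.name(ch, "")
    return ch.isalpha() and name.startswith("LATIN ")

def rejected_chars(text):
    """Return the characters in text that fall outside the allowed set."""
    # Normalize to NFC first so "n" + combining tilde becomes a single letter.
    normalized = unicodedata.normalize("NFC", text)
    return [ch for ch in normalized if not is_allowed_char(ch)]
```

An empty list from rejected_chars() would mean the value passes; a
non-empty list says which characters to report back to the user. A
single regex over explicit code point ranges could do the same job, at
the cost of having to list every allowed range by hand.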
If you have specific examples of sites that are doing a good job with
this, or links to more I could read on the topic, that would be great,
but I'd love to hear any suggestions or experience you can share.
Thanks,
Allen
--
Allen Shaw
Polymer (http://polymerdb.org)
Fine-grained control over how your users access your data:
user permissions, reports, forms, ad-hoc queries -- all
centrally managed.