[nycphp-talk] UTF-8, databases and best practices
Hans Zaunere
bulk at zaunere.com
Wed May 23 13:25:28 EDT 2012
Hi Eugenio,
> I need to distribute an application that potentially can be used with
> many different DBMSs (such as MySQL, PostgreSQL, SQLite, Microsoft SQL
> Server). The charset used in the databases can be ANY.
>
> I would like to always output UTF-8 text when possible and my
> questions are about the current best practices to handle this kind of
> application with PHP.
>
> 1) As far as I know, PHP still doesn't support natively utf-8 so to
> avoid problems with string functions, I still have to use mbstring
> functions, am I right? What does PHP 5.4 change about that?
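AFAIK, correct, and there haven't been many significant changes in this area
recently. The main 5.4 change I know of is that htmlspecialchars() and
htmlentities() now default to UTF-8; the core string functions still operate
on bytes.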
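To illustrate why the mbstring functions matter, here is a minimal sketch. It assumes the mbstring extension is loaded; the sample string is only an illustration and is written with byte escapes so that the result does not depend on how the file is saved.

```php
<?php
// "Zürich" in UTF-8: the "ü" is encoded as the two bytes C3 BC.
$text = "Z\xC3\xBCrich";

// strlen() counts bytes, mb_strlen() counts characters.
$byteLength = strlen($text);
$charLength = mb_strlen($text, 'UTF-8');

// substr() works on bytes and can cut a multibyte character in half;
// mb_substr() works on characters and keeps them whole.
$byteSlice = substr($text, 0, 2);
$charSlice = mb_substr($text, 0, 2, 'UTF-8');

echo $byteLength, " bytes, ", $charLength, " characters\n";
var_dump(mb_check_encoding($byteSlice, 'UTF-8')); // byte slice is no longer valid UTF-8
var_dump(mb_check_encoding($charSlice, 'UTF-8')); // character slice still is
```

Passing the encoding explicitly to each mb_* call avoids depending on the mbstring.internal_encoding setting of whatever server the application ends up on.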
> 2) How to handle the fact that the data I receive from the database
> can be stored using any possible charset? Do I need iconv functions
> and convert everything in utf-8? And then convert it back in the
> original charset when I have to write to the DB?
I'd be interested to hear others' thoughts, but the general consensus these
days is "convert everything to UTF-8". Is there an application requirement
that would make you convert data to a different charset at different times?
In general:
1. Raw data (any charset/encoding)
2. Detect and convert to UTF-8, clean up, etc.
3. Store in database, etc.
4. Read/display in UTF-8
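
Step 2 above could be sketched roughly as follows. This is only an illustration, assuming the mbstring extension is available: to_utf8() is a hypothetical helper name, and the candidate list given to mb_detect_encoding() is an assumption that would have to match the charsets you actually expect. Detection is a heuristic, so pass the source charset explicitly whenever the database or connection can tell you what it is.

```php
<?php
// Hypothetical helper: normalise a string to UTF-8.
// $sourceCharset is the charset the data is known to be in, or null to guess.
function to_utf8($raw, $sourceCharset = null)
{
    if ($sourceCharset === null) {
        // Already valid UTF-8: leave it alone.
        if (mb_check_encoding($raw, 'UTF-8')) {
            return $raw;
        }
        // Guess from a short candidate list (an assumption; adjust to your data).
        $sourceCharset = mb_detect_encoding($raw, array('UTF-8', 'ISO-8859-1'), true);
        if ($sourceCharset === false) {
            return false; // could not determine the charset
        }
    }
    return mb_convert_encoding($raw, 'UTF-8', $sourceCharset);
}

$latin1 = "caf\xE9";                    // "café" encoded as ISO-8859-1
$utf8   = to_utf8($latin1, 'ISO-8859-1');
echo bin2hex($utf8), "\n";              // 636166c3a9
```

For steps 3 and 4 the database connection itself also has to be set to UTF-8, and how that is done differs from one driver to the next.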
This should support the vast majority of written human languages, though I
believe there are some exceptions.
H