NYCPHP Meetup

NYPHP.org

[nycphp-talk] htmlentities charset bug

John Campbell jcampbell1 at gmail.com
Wed Jan 23 13:43:44 EST 2008


On Jan 23, 2008 1:16 PM, Cliff Hirsch <cliff at pinestream.com> wrote:
> On 1/23/08 12:58 PM, "Michael B Allen" <ioplex at gmail.com> wrote:
> >> Reason: Invalid multibyte sequence in argument
> >> Those curly single and double quotes are killers.
> >
> > The problem isn't htmlentities, it's the charset your pages are
> > emitted in. If you emit an HTML form in ISO-8859-1 and then submit
> > garbage data, the database may store it as garbage and now you have a
> > simple garbage-in / garbage-out scenario. Feed that to htmlentities and
> > tell it it's ISO-8859-1 and you'll get an "Invalid multibyte sequence"
> > error.

> > if the browser were really sophisticated about it,
> > it could pop up a dialog that warns you and asks whether you would like
> > to transliterate those characters to ISO-8859-1 equivalent glyphs.
> I wonder if there is any way to detect this on the server side. htmlentities()
> certainly catches the problem, but it returns an empty string. Some sort of
> friendlier filter that strips characters in the wrong charset would be
> very cool.

The clipboard on any modern OS automatically takes care of this.
Encode your pages in UTF-8 and you will never have the problem in the
first place. If you still want to check the encoding, use
mb_check_encoding().
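A minimal sketch of the server-side check, assuming a current PHP (7.2 or later) with the mbstring extension enabled. The helper name clean_utf8() is hypothetical; it uses mb_check_encoding() to detect bad input and mb_scrub() to replace ill-formed byte sequences, which gives roughly the "friendlier filter" asked about above instead of an empty string.

```php
<?php
// Sketch only: make sure submitted text is well-formed UTF-8 before using it.
// clean_utf8() is a hypothetical helper; it requires the mbstring extension.
function clean_utf8(string $input): string
{
    // Valid UTF-8 passes through untouched.
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    // mb_scrub() replaces each ill-formed byte sequence with the mbstring
    // substitute character ('?' unless mb_substitute_character() changed it).
    return mb_scrub($input, 'UTF-8');
}

// Windows-1252 curly quotes (bytes 0x93 and 0x94) are not valid UTF-8 on their own.
$submitted = "\x93quoted\x94";
var_dump(mb_check_encoding($submitted, 'UTF-8')); // bool(false)
echo clean_utf8($submitted), "\n";
```

Replacing rather than silently deleting the bad bytes keeps it visible that something was lost; rejecting the request outright is an equally reasonable choice.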

Also, why are you using htmlentities()? It is a useless function. If
you want to escape HTML, the correct function is htmlspecialchars().
htmlentities() should never be used: it is slower, adds no security
benefit, and it unnecessarily makes the data unreadable.

> > I always use UTF-8.
> I think I will too! Seems to be the way to go.

+1 for utf-8

It is not really optional for modern web development, because XMLHTTP
automatically converts everything to UTF-8 no matter what encoding you
use on your page. If you are not using UTF-8, any data that is not 7-bit
ASCII will get mangled when it is submitted via Ajax.
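A short sketch of what that mangling looks like, assuming PHP 7 or later with mbstring. The variable names are illustrative; the string stands in for a value posted by an Ajax request. It also shows an explicit conversion for the case, assumed here, where a legacy page must stay ISO-8859-1 for now.

```php
<?php
// Sketch only: XMLHTTP submits text as UTF-8, so "é" arrives as the bytes 0xC3 0xA9.
$fromAjax = "caf\u{e9}";

// A script that assumes ISO-8859-1 treats each of those bytes as a separate
// character, so "é" shows up as "Ã©" once the text is displayed as UTF-8.
$misread = mb_convert_encoding($fromAjax, 'UTF-8', 'ISO-8859-1');
echo $misread, "\n";

// If a legacy ISO-8859-1 page cannot be migrated yet, converting explicitly keeps
// "é" (as the single byte 0xE9); characters with no Latin-1 equivalent become "?".
$latin1 = mb_convert_encoding($fromAjax, 'ISO-8859-1', 'UTF-8');
var_dump(strlen($latin1)); // int(4)
```

Serving everything as UTF-8 avoids needing the second conversion at all.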

Regards,
John Campbell


