Saturday, April 4, 2009

Overcoming AJaX UTF-8 Encoding Limitation (in PHP)

I've been learning a lot about PHP and MySQL programming through a project I've been working on for two months.

I didn't actually know AJaX before. Whenever I run into a challenge that takes time and a lot of headaches to overcome, I share the solution to save other people some time and bad moments.

My web application is in Spanish (with Latin American characters), and I use AJaX to show paginated search results, which makes it simpler and faster to go through the results, sort them, etc.

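The results include Spanish words with accented characters (á, é, ñ and so on). XMLHttpRequest decodes responseText as UTF-8 by default, so when the server sends those characters in a different encoding, such as ISO-8859-1, AJaX won't show them correctly.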

And no, as far as I've researched, there's no way to change the UTF-8 character set on the XMLHttpRequest object itself; setRequestHeader("Content-Type", "...; charset=ISO...") won't work, since that header describes the request being sent, not the response coming back.

Some pages say you can encode the URI you are sending, decode the responseText, and you're all set. I assume encoding the URI applies when using the GET method, or when the parameter values contain non-UTF-8 characters, but it does nothing to make the responseText UTF-8 compliant. I can't decode a responseText that was never encoded, and I wasn't able to decode it when assigning it to the containing element either.

I finally found the solution on a web page, and it was much simpler than I had imagined: encode the HTML response string to UTF-8 on the server side before returning it.

In PHP, it's like this:

$html = "... all my html response text here ...";
echo utf8_encode($html);

Or do it whatever way you prefer; the point is to encode the response text with utf8_encode() before printing or returning it.
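
Note that utf8_encode() assumes its input is ISO-8859-1, and it has been deprecated since PHP 8.2. Below is a minimal sketch of the same idea using mb_convert_encoding(). It assumes the mbstring extension is enabled and that the source data really is ISO-8859-1; the helper name to_utf8() is hypothetical, not part of PHP.

```php
<?php
// Hypothetical helper: converts a string to UTF-8, assuming any input that
// is not already valid UTF-8 is ISO-8859-1 (pass $from for other charsets).
function to_utf8(string $text, string $from = 'ISO-8859-1'): string
{
    // Skip strings that are already valid UTF-8 to avoid double encoding.
    // Caveat: some ISO-8859-1 byte sequences are also valid UTF-8, so this
    // check is a heuristic, not a guarantee.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $from);
}

// "Espa\xF1a" is "España" in ISO-8859-1 (byte 0xF1 is ñ).
$html = "<p>Espa\xF1a</p>";
echo to_utf8($html);
```

The guard at the top is a design choice: calling the helper twice on the same string leaves it unchanged, whereas encoding an already-UTF-8 string a second time would garble the accented characters.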

Also, and this is very important, make sure the HTML page receiving the responseText declares the appropriate character encoding:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
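
The meta tag covers the page itself. The AJaX response can also declare its own charset in the HTTP Content-Type header, which XMLHttpRequest takes into account when decoding responseText. Here is a sketch of that, assuming the response body is already UTF-8; send_utf8_fragment() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: sends an HTML fragment together with an explicit
// UTF-8 charset declaration in the HTTP response header.
function send_utf8_fragment(string $html): void
{
    // header() only has an effect before any output has been sent.
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
    echo $html;
}

// The fragment is assumed to be UTF-8 already (for example after utf8_encode()).
send_utf8_fragment("<p>Espa\xC3\xB1a</p>");
```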

I assume it'll work pretty much the same way in other programming languages: simply encode the response text on the server side before responding, and make sure the receiving page's character set is UTF-8.

Hope this is helpful to someone.



Luckylooke said...

Thank you very much for solving the same problem with the Slovak language ;) The Ajax response text was always in UTF-8 and my pages are in windows-1250, so after reading your blog I tried to DECODE the UTF-8 before sending the response back to the page, and it works! :) I wish I had found your blog earlier ;) (I was working on this problem for a couple of hours)

Dimitar said...

I never thought that the problem was php.
Thank you so much, it really helped.

Anonymous said...

Thank you so much ... A very simple solution, to a problem of which I got a serious headache.


Dikki said...

Thank you! This is a huge help that unfortunately only comes after hours of searching. I wish I found this earlier.

Juan Manuel Trejo Sánchez said...

Hi all! I'm glad to know it worked for you as well!

I feel we as part of a community have the responsibility to share our lessons learned: It gave me a lot of headaches as well, glad to know I saved a few! :)

apunk said...

just thought I'd let you know that you saved me a lot of wasted time... I've been looking for this solution for hours!

Dave Espionage said...

Thank you! Once I had isolated my issue with an edit-in-place module delivering literal UTF-8, I found your post, and it completely fixed the issue I was having.

This should be in a basic AJAX/PHP rulebook somewhere!

Alex said...

Great post, saved me a huge headache. Only wish I had found your post earlier.

Sam Scott said...

Thank you, the utf8 encoding has worked for some of my characters, but not for others. For example \u0127 should be ħ but comes back as regular h. If you have any follow-up advice, I would appreciate it.