This has been bugging me for the last few months, ever since I got a job where I have to develop Greek sites. This encoding thing is a mess! You never know what will happen the next time you develop a new site. You have to be very careful about the database’s encoding and the one you use for your final output. If you’ve had problems with that, or this sounds like your problem too, keep reading. If you are about to develop a non-English site, I would definitely suggest taking the following pointers into consideration…

Well, here comes a very bad scenario… You have a database that has, let’s say, a Greek encoding, with greek_general_ci as its collation. Now here comes the worse part: your final output is in UTF-8. Now it’s time to scream “Houston… We have a problem!!”. Yup. You’re sc****d 🙂 Been there before. What you need to do is this:

  • Before querying the database, set the correct connection encoding (for Greek):
    • SET NAMES greek;
  • After getting the content, convert the text you are about to render from the local encoding (for Greek, ISO-8859-7) to UTF-8. To do this you can use either the iconv function or mb_convert_encoding. For the second one you need to have the mbstring module enabled in PHP. An example follows for both of them. Note that the argument order differs between the two, and that both return the converted string rather than changing $mystring in place.
    • $mystring = iconv('ISO-8859-7', 'UTF-8', $mystring);
    • $mystring = mb_convert_encoding($mystring, 'UTF-8', 'ISO-8859-7');
  • After those two things you should be able to see your content displayed correctly on your UTF-8 page.
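
Here is a minimal sketch of the conversion step on its own, assuming the text comes back from the database as ISO-8859-7 bytes. The sample string and the variable names are made up for illustration, and the bytes are written as hex escapes so the snippet does not depend on the encoding of the PHP file itself.

```php
<?php
// Hypothetical sample: the Greek letters "αβγ" as ISO-8859-7 bytes,
// i.e. roughly what the connection hands back after SET NAMES greek.
$mystring = "\xE1\xE2\xE3";

// Option 1: iconv(from, to, string). Returns false on failure.
$viaIconv = iconv('ISO-8859-7', 'UTF-8', $mystring);

// Option 2: mb_convert_encoding(string, to, from). Needs the mbstring extension.
// The argument order is the reverse of iconv's.
$viaMbstring = mb_convert_encoding($mystring, 'UTF-8', 'ISO-8859-7');

// Both should give the same UTF-8 bytes; print them as hex to compare.
echo bin2hex($viaIconv), "\n";
echo bin2hex($viaMbstring), "\n";
```

Either function will do; pick one and stick with it so the argument order does not trip you up.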

Although all of the above does work, it does sound a bit weird. One would ask: “If I have a site with many users, imagine doing this for every string on every request!”. Well, that sounds about right. The preferred way, and what I would suggest, is: “hey! utf’em all fellas!”. I mean, OK, I know: a UTF-8 string is bigger than the same text in the local encoding (a Greek letter takes two bytes in UTF-8 instead of one in ISO-8859-7), but you will never have these problems again. If somebody comes to you and says “I want to open my fora to the Chinese market”, the only thing you will say is “hire a translator” 😉
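
To put a number on the size difference, here is a small sketch that compares the byte length of the same Greek word in both encodings. The sample word is an arbitrary choice, written as hex escapes so the file's own encoding does not matter.

```php
<?php
// Hypothetical sample: "Ελλάδα" (6 letters) as ISO-8859-7 bytes.
$local = "\xC5\xEB\xEB\xDC\xE4\xE1";
$utf8  = mb_convert_encoding($local, 'UTF-8', 'ISO-8859-7');

// strlen() counts bytes, not characters.
$localBytes = strlen($local);
$utf8Bytes  = strlen($utf8);

// mb_strlen() counts characters when told the encoding.
$characters = mb_strlen($utf8, 'UTF-8');

echo "ISO-8859-7: $localBytes bytes\n";
echo "UTF-8:      $utf8Bytes bytes for $characters characters\n";
```

Plain ASCII characters (digits, Latin letters, HTML tags) still take one byte each in UTF-8, so on a real page the overhead only applies to the Greek text itself.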

But here comes a much worse scenario, which has happened to me several times in the past few months. What if you want to transfer content from one site (that uses a local encoding, e.g. ISO-8859-7) to one that uses UTF-8? Well, this is trickier, not because it’s hard but just because it has more variables to think about. With two databases, source (local encoding, for example greek_general_ci) and dest (UTF-8), the step-by-step way would be:

  • When getting the data from the source, before querying, just like before:
    • SET NAMES greek;
  • Then, before inserting into the new database, convert the encoding with either iconv or mb_convert_encoding.
  • Last but not least, before inserting into the new one, be sure you query:
    • SET NAMES utf8;
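
The steps above could be sketched roughly like this with mysqli. This is only an illustration under assumptions: the articles table, its columns and the function names are hypothetical, and error handling is left out. It uses set_charset(), which is mysqli's own way of doing what SET NAMES does, and a prepared statement for the insert so that no manual escaping is needed.

```php
<?php
// Convert every string column of one row from ISO-8859-7 to UTF-8.
function greekRowToUtf8(array $row): array
{
    foreach ($row as $column => $value) {
        if (is_string($value)) {
            $row[$column] = mb_convert_encoding($value, 'UTF-8', 'ISO-8859-7');
        }
    }
    return $row;
}

// Copy rows from the source database to the destination one.
// The table "articles" and its columns are hypothetical.
function migrateArticles(mysqli $source, mysqli $dest): void
{
    // Step 1: read from the source connection in the local encoding.
    $source->set_charset('greek');
    // Step 3: talk to the destination connection in UTF-8.
    $dest->set_charset('utf8');

    $insert = $dest->prepare('INSERT INTO articles (id, title, body) VALUES (?, ?, ?)');
    $result = $source->query('SELECT id, title, body FROM articles');

    while ($row = $result->fetch_assoc()) {
        // Step 2: convert before inserting.
        $row = greekRowToUtf8($row);
        $insert->bind_param('iss', $row['id'], $row['title'], $row['body']);
        $insert->execute();
    }
}
```

You would call migrateArticles() with two already-open mysqli connections, one per database. Keeping the conversion in its own small function makes it easy to check on a single row before running it over the whole table.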

Do not forget to use the last query. Well, it all sounds pretty clear, but somewhere along the way it gets really messy. Somehow you either forget something or something gets in the way. For example, in my case mysql_real_escape_string() seemed to convert some of the Greek local-encoding characters to UTF-8 :S Anyway, I don’t want to go on nagging. It’s been a long day. I hope my small summary helps you guys find your way through the encodings. If you have anything to add, go ahead and post a comment.

/me out