Going from iso-8859-1 to utf-8
I’d like to start using utf-8 for character encoding. This whole site is currently in iso-8859-1. Each time I try using utf-8, I run into problems with characters not showing up, or showing up as a garbled mess.
I have tried RTFM, Google, MT support forums, screaming, banging my head against the wall, and taking a walk. None of that helped. I’m hoping someone else has gone from iso-8859-1 to utf-8 on a Movable Type setup before me, and can suggest the best way of converting it to utf-8…
Here’s my setup, assuming that’s important:
- Movable Type 2.65
- MySQL
- All previous entries are iso-8859-1
- Most posts are in English, but some are in Swedish and contain accented characters. Well, English does too, so that shouldn’t matter. But I’m mentioning it because the posts that get messed up are the ones that are in Swedish.
- Documents are XHTML 1.0 Strict, and served as application/xhtml+xml to browsers that support it. They get converted to HTML 4.01 Strict and are served as text/html to others.
This is what I did last time I tried this:
- Exported my entries from MT
- Opened the export file, saved as utf-8
- Opened power-editing mode, deleted all entries (which took forever)
- Reimported everything
- Converted all templates to utf-8
- Changed the content negotiation script to send utf-8 in the Content-Type header
- Opened mt.cfg, changed PublishCharset Shift_JIS to PublishCharset utf-8, and uncommented NoHTMLEntities 1
- Rebuilt everything
- Screamed loudly
What am I doing wrong?


Comments
Just try running the content through iconv before displaying the page.
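For readers who have not used iconv, here is a minimal Python sketch of what that kind of conversion does. The function name and the sample text are made up for illustration; nothing here is part of Movable Type.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (roughly `iconv -f ISO-8859-1 -t UTF-8`)."""
    # ISO-8859-1 maps every byte to a code point, so this decode cannot fail.
    # That also means it cannot detect input that was already UTF-8.
    return data.decode("iso-8859-1").encode("utf-8")


# Hypothetical sample standing in for an exported entry.
sample = "Räksmörgås".encode("iso-8859-1")
converted = latin1_to_utf8(sample)

# The three accented letters grow from one byte to two bytes each.
print(len(sample), len(converted))
```

Note that running the conversion a second time on the already converted bytes would double-encode them, which is one common source of garbled output.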
I have similar problems, Roger. In my specific case, the dumb IE refuses to display UTF-8 “special” chars (arrows and such), while FF and all the other browsers correctly display them. With the SAME font (verdana), on the SAME platform (winXP). I know this doesn’t solve your issue, but may warn you on future ones. :(
Seems UTF-8 isn’t well-supported yet, no matter how you force Apache, editors and browsers to use the right encoding…
Open up mt.cfg, and there’s a line for setting the default character set for all your blogs. Mine currently looks like:
PublishCharset UTF-8
I assume MT uses that to output entries in that character set. I’ve never checked it, but I haven’t found any validation problems related to characters either.
Here’s a suggestion (if that doesn’t work for your past entries). Export all entries, then do a search-n-replace for the several problematic Swedish characters to change them into their numeric entity equivalents. Then import the entries. If you do that, and then have the PublishCharset value as UTF-8, you should have no problems with future entries and the past ones will work without you having to go through and edit them one by one.
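The search-and-replace idea above could be sketched in Python roughly as follows, assuming the export file really is ISO-8859-1. The function name and sample text are hypothetical, and this replaces every non-ASCII character rather than only the Swedish ones.

```python
def to_numeric_entities(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Replace every non-ASCII character with a numeric character reference."""
    text = data.decode(source_encoding)
    # The "xmlcharrefreplace" error handler emits &#NNN; for anything
    # that does not fit in ASCII.
    return text.encode("ascii", errors="xmlcharrefreplace")


# Hypothetical line from an export file.
exported = "Välkommen på svenska".encode("iso-8859-1")
print(to_numeric_entities(exported).decode("ascii"))
# V&#228;lkommen p&#229; svenska
```

The result is pure ASCII, so it reads the same whether it is later treated as ISO-8859-1 or UTF-8. The trade-off is that the stored text is harder to read and edit by hand.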
You have to make a dump of your database and convert it to UTF-8 with a capable text editor, then push it back.
“Straight-out” conversion is only possible when you have plain ASCII text (not your case).
Devon: I tried changing that line in mt.cfg. Didn’t help. Doing a search and replace sounds like it’s worth a try.
Julik: That’s what I did, if you mean exporting the database from within MT.
I’m not familiar enough with Movable Type to be sure this is relevant, but it may also be worth making sure that your installation of MySQL is version 4. Until version 4, MySQL tables couldn’t store UTF8 data properly.
Why do you want to change encoding? There must be a good reason but I can’t see it (not too good at the encoding stuff though :).
Andrew: Thanks for the tip. The server is running MySQL 4.something, so it should be able to handle Unicode.
Björn: Partly because it’s what I “should” use, partly because I want to learn from it.
Roger,
I’m not sure of the validity of this article (I haven’t tried it myself), but I found an entry on someone’s blog stating you need to change the send_http_header routine in App.pm. It also mentions something about Apache configuration, although I’m assuming you’ve already covered that one off.
Let us know how you get on, I’m thinking about going utf-8 on MT when I next upgrade/redesign…
I still haven’t revisited the utf-8 problem. It looks like upgrading to MT 3 could help. I’ll look into that, as well as take a closer look at other systems, when I have time.
I am trying to email utf8 and it gives me a headache.
I believe I had the same problem displaying utf8 data on the website. FF will display it properly but IE won’t. AFAIK, if the HTML template has the charset hardcoded, you might have to change it to utf8. FF for some reason correctly auto-switches to utf8, while IE does what it’s told and displays it as iso.
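A small Python sketch of the mismatch described above, using a made-up sample string. It shows what happens when bytes in one encoding are interpreted as the other, which is in effect what a browser does when the declared charset is wrong.

```python
# UTF-8 bytes read as ISO-8859-1: each accented letter turns into two characters.
utf8_bytes = "på svenska".encode("utf-8")
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)  # pÃ¥ svenska

# ISO-8859-1 bytes read as UTF-8: the lone byte for "å" is not valid UTF-8,
# so with errors="replace" it becomes the replacement character U+FFFD.
latin1_bytes = "på svenska".encode("iso-8859-1")
replaced = latin1_bytes.decode("utf-8", errors="replace")
print(replaced)
```

The first case matches the "garbled mess" symptom and the second matches characters not showing up at all.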
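Have you checked the settings for your server? The default Apache 2.0 configuration sets AddDefaultCharset to ISO-8859-1, which takes precedence over a charset declared in the page’s meta element.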
FWIW, I’ve documented how I converted my MT blog from ISO-8859-1 to UTF-8.
http://padawan.info/weblog/convertingamovabletypeblogfromiso88591toutf8.html
The trick is to make sure that ALL the elements of the chain, from the content to the web server all inclusive are in UTF-8. It’s not a matter of just changing one setting in MT or Apache or MySQL, it must be consistent all the way through, starting with the content.
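One way to spot-check two links of that chain, sketched in Python under the assumption that you already have a page’s bytes and its Content-Type header value in hand. All function names are hypothetical, and the database and template files would need their own checks.

```python
from email.message import Message
from typing import Optional


def is_valid_utf8(body: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        body.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def declared_charset(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, lowercased."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def chain_is_consistent(body: bytes, content_type: str) -> bool:
    """Header must declare utf-8 and the body must actually be UTF-8."""
    return declared_charset(content_type) == "utf-8" and is_valid_utf8(body)


# Hypothetical page body and headers.
page = "på svenska".encode("utf-8")
print(chain_is_consistent(page, "application/xhtml+xml; charset=UTF-8"))
print(chain_is_consistent(page, "text/html; charset=iso-8859-1"))
```

A body that passes the UTF-8 check can still be double-encoded, so this only catches the most obvious mismatches.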
Oops, Markdown messes with your autolink feature. The link above is:
Converting a Movable Type blog from ISO-8859-1 to UTF-8
padawan: Thanks, you seem to have run into much the same problems I did. My problems were probably caused by not converting the database plus using an ftp application that somehow screwed up character encoding during transfer. I’ll keep your article handy next time I give this a try.