Going from iso-8859-1 to utf-8

I’d like to start using utf-8 for character encoding. This whole site is currently in iso-8859-1. Each time I try using utf-8, I run into problems with characters not showing up, or showing up as a garbled mess.

I have tried RTFM, Google, MT support forums, screaming, banging my head against the wall, and taking a walk. None of that helped. I’m hoping someone else has gone from iso-8859-1 to utf-8 on a Movable Type setup before me, and can suggest the best way of converting it to utf-8.

Here’s my setup, assuming that’s important:

  • Movable Type 2.65
  • MySQL
  • All previous entries are iso-8859-1
  • Most posts are in English, but some are in Swedish and contain accented characters. Well, English does too, so that shouldn’t matter. But I’m mentioning it because the posts that get messed up are the ones that are in Swedish.
  • Documents are XHTML 1.0 Strict, and served as application/xhtml+xml to browsers that support it. They get converted to HTML 4.01 Strict and are served as text/html to others.

This is what I did last time I tried this:

  • Exported my entries from MT
  • Opened the export file, saved as utf-8.
  • Opened power-editing mode, deleted all entries (which took forever)
  • Reimported everything
  • Converted all templates to utf-8
  • Changed the content negotiation script to send utf-8 in the Content-Type header
  • Opened mt.cfg, changed PublishCharset Shift_JIS to PublishCharset utf-8, and uncommented NoHTMLEntities 1
  • Rebuilt everything
  • Screamed loudly
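The "saved as utf-8" step in the list above can also be done outside a text editor. Here is a minimal sketch in Python, assuming the export file really is iso-8859-1 throughout. The function name is hypothetical and not part of Movable Type.

```python
from pathlib import Path


def convert_latin1_to_utf8(src: str, dst: str) -> None:
    """Re-encode a text file from ISO-8859-1 to UTF-8.

    Assumes every byte in src is ISO-8859-1. A file that is already
    UTF-8 would come out double-encoded, so run this only once.
    """
    raw = Path(src).read_bytes()
    # Every byte value is valid ISO-8859-1, so this decode cannot fail.
    text = raw.decode("iso-8859-1")
    # Write bytes directly so line endings are left untouched.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Calling something like convert_latin1_to_utf8("export.txt", "export-utf8.txt") (placeholder file names) writes a converted copy and leaves the original untouched, which makes it easy to start over if the import goes wrong.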

What am I doing wrong?

Comments

1. September 4, 2004 by x113

Just try using iconv before displaying the page.

2. September 4, 2004 by caffènero

I have similar problems, Roger. In my specific case, the dumb IE refuses to display UTF-8 "special" chars (arrows and such), while FF and all the other browsers correctly display them. With the SAME font (verdana), on the SAME platform (winXP). I know this doesn't solve your issue, but may warn you on future ones. :(

Seems UTF-8 isn't well-supported yet, no matter how you force Apache, editors and browsers to use the right encoding...

3. September 4, 2004 by Devon

Open up mt.cfg, and there's a line for setting the default character set for all your blogs. Mine currently looks like:

PublishCharset UTF-8

I assume MT uses that to output entries in that character set. I've never checked it, but I haven't found any validation problems related to characters either.

Here's a suggestion (if that doesn't work for your past entries). Export all entries, then do a search-n-replace for the several problematic Swedish characters to change them into their numeric entity equivalents. Then import the entries. If you do that, and then have the PublishCharset value as UTF-8, you should have no problems with future entries and the past ones will work without you having to go through and edit them one by one.
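The search-and-replace suggested above does not have to be done character by character. As a rough sketch, assuming the exported text has already been read into a Python string, the built-in "xmlcharrefreplace" error handler turns every non-ASCII character into its numeric entity. The function name is made up for illustration.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric entity."""
    # Encoding to ASCII with this error handler emits &#NNN; for each
    # character that ASCII cannot represent; ASCII text passes through.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_numeric_entities("Smörgåsbord på svenska"))
# prints: Sm&#246;rg&#229;sbord p&#229; svenska
```

The result is pure ASCII, so it reads the same whether the page is served as iso-8859-1 or utf-8.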

4. September 5, 2004 by Julik

You have to make a dump of your database and convert it to UTF-8 with a capable text editor, then push it back.

"Straight-out" conversion is only possible when you have plain ASCII text (not your case).
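To illustrate why plain ASCII text needs no conversion while Swedish text does, this small Python sketch compares the bytes each encoding produces. The helper name is hypothetical, and it assumes the text only contains characters that exist in iso-8859-1.

```python
def same_bytes_in_both(text: str) -> bool:
    """True if text encodes to identical bytes in ISO-8859-1 and UTF-8."""
    return text.encode("iso-8859-1") == text.encode("utf-8")


print(same_bytes_in_both("Movable Type"))  # prints: True
print(same_bytes_in_both("Björn"))         # prints: False

# The same letter is one byte in ISO-8859-1 and two bytes in UTF-8.
print("ö".encode("iso-8859-1"))  # prints: b'\xf6'
print("ö".encode("utf-8"))       # prints: b'\xc3\xb6'
```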

5. September 5, 2004 by Roger Johansson

Devon: I tried changing that line in mt.cfg. Didn't help. Doing a search and replace sounds like it's worth a try.

Julik: That's what I did, if you mean exporting the database from within MT.

6. September 5, 2004 by Andrew Green

I'm not familiar enough with Movable Type to be sure this is relevant, but it may also be worth making sure that your installation of MySQL is version 4. Until version 4, MySQL tables couldn't store UTF8 data properly.

7. September 6, 2004 by Björn

Why do you want to change encoding? There must be a good reason but I can't see it (not too good at the encoding-stuff though:).

8. September 6, 2004 by Roger Johansson

Andrew: Thanks for the tip. The server is running MySQL 4.something, so it should be able to handle Unicode.

Björn: Partly because it's what I "should" use, partly because I want to learn from it.

9. September 7, 2004 by Ben

Roger,

I'm not sure of the validity of this article (I haven't tried it myself), but I found an entry on someone's blog stating you need to change the send_http_header routine in App.pm. It also mentions something about Apache configuration, although I'm assuming you've already covered that one off.

Let us know how you get on, I'm thinking about going utf-8 on MT when I next upgrade/redesign...

10. September 8, 2004 by Roger Johansson

I still haven't revisited the utf-8 problem. It looks like upgrading to MT 3 could help. I'll look into that, as well as take a closer look at other systems, when I have time.

11. November 16, 2004 by Chvora

I am trying to email utf8 and it gives me a headache.

I believe I had the same problem displaying utf8 data on the website. FF will display it properly but IE won't. AFAIK, if the html template has the charset hardcoded, you might have to change it to utf8. FF for some reason correctly auto-switches to utf8, while IE does what it's told and displays it as iso.

12. December 7, 2004 by EmilV

Have you checked the settings for your server? Apache 2.0 has a default override to iso-8859-1
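One way to see what a server actually declares is to look at the charset parameter of the Content-Type header it sends. Here is a small sketch using Python's standard email package to parse a header value. The function name is hypothetical, and the header strings are examples, not captured from any real server.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Returns None when the header carries no charset parameter.
    return msg.get_content_charset()


print(declared_charset("application/xhtml+xml; charset=utf-8"))  # prints: utf-8
print(declared_charset("text/html; charset=ISO-8859-1"))         # prints: iso-8859-1
print(declared_charset("text/html"))                             # prints: None
```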

13. December 28, 2004 by padawan

FWIW, I've documented how I converted my MT blog from ISO-8859-1 to UTF-8.

http://padawan.info/weblog/convertingamovabletypeblogfromiso88591toutf8.html

The trick is to make sure that ALL the elements of the chain, from the content to the web server all inclusive are in UTF-8. It's not a matter of just changing one setting in MT or Apache or MySQL, it must be consistent all the way through, starting with the content.
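A quick way to test one link in that chain is to check whether a file, template, or database dump is valid utf-8 at the byte level. The sketch below is a heuristic only, with a made-up function name. Plain ASCII passes too, and passing does not prove the text was converted correctly.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8 (plain ASCII also passes)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("Björn".encode("utf-8")))       # prints: True
print(is_valid_utf8("Björn".encode("iso-8859-1")))  # prints: False
print(is_valid_utf8(b"plain ASCII"))                # prints: True
```

iso-8859-1 text with accented characters almost never happens to be valid utf-8, so a False result is a strong hint that something in the chain was left unconverted.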

14. December 28, 2004 by padawan

Oops, Markdown messes with your autolink feature. The link above is:

Converting a Movable Type blog from ISO-8859-1 to UTF-8

15. December 29, 2004 by Roger Johansson

padawan: Thanks, you seem to have run into much the same problems I did. My problems were probably caused by not converting the database plus using an ftp application that somehow screwed up character encoding during transfer. I'll keep your article handy next time I give this a try.

About the author

Roger Johansson is a Swedish web professional specialising in web standards, accessibility, and usability.
