Always specify character encoding

As a Mac user and a user of “alternative” web browsers I have visited many websites where text does not display correctly. In almost every case that is caused by the website’s author specifying an incorrect character encoding or not specifying a character encoding at all.

At times when I have tried bringing the problem to the attention of the developers I have been met by a lack of understanding. After all, they couldn’t see a problem in their Windows-only world. Well, that is not the only platform used on the web, so you need to tell the visitor’s browser which character encoding you are using.

David Baron’s “Why Web authors must specify character encodings” explains this well.
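To make the “works on Windows” effect concrete, here is a minimal Python sketch. It assumes a page saved as Windows-1252 and served with no declared encoding, and a browser that falls back to its platform's legacy encoding (Windows-1252 on Windows, MacRoman on an older Mac). Real fallback behaviour varies by browser and locale, and the sample text is made up for illustration.

```python
# Hypothetical page text, saved by a Windows editor with no encoding declared.
text = "Café “quoted”"
raw = text.encode("cp1252")  # the bytes that actually travel over the wire

# A browser that guesses Windows-1252 recovers the original text.
as_windows = raw.decode("cp1252")

# A browser that guesses MacRoman reads the very same bytes differently.
as_mac = raw.decode("mac_roman")

print(as_windows)
print(as_mac)
```

The bytes are identical in both cases; only the guess differs. Declaring the encoding removes the guess.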

Posted on March 30, 2006 in Quicklinks, Web Standards

Comments

  1. I suppose this applies to feeds as well?

  2. Absolutely. All variants of RSS and Atom are XML languages. See Mark Pilgrim’s “Determining the character encoding of a feed”.

  3. But not only XML. Everything that contains plain text (so that means HTML, XML, CSS, JavaScript etc.) needs to be accompanied by a character encoding.

    On my website, I described how to use the .htaccess file to automatically send a character encoding. Very useful Apache feature because the character encoding is such a small thing which you can easily forget about. So specifying a default one is probably a smart thing to do.

  4. @Jero:

    Question: Do web servers automatically assign a character encoding to CSS and JS files? Given that character encoding is not part of either specification, the character encoding must come from somewhere else.

  5. We decided on UTF-8 and we have had a ton of problems with older editors and documents “slipping” out of UTF-8 .. It’s been a huge pain and don’t get me started on the BOM! (byte-order-mark)

    But agreed .. telling the user how they should interpret the characters flying at their browser is a requirement. [sarcasm] Someone should make a list of all these things and create some sort of validation utility for people to use .. [/sarcasm]

  6. If you are lucky enough to have edit privileges on your Apache httpd.conf file, you can add:

    AddDefaultCharset utf-8

    rather than having to rely on .htaccess files.

  7. Or, in the case of Apache 2.2 users, AddDefaultCharset. ;-)

  8. Funny. I just got a “character encoding” bug in my queue this last month.

  9. I totally agree. Even I didn’t pay much attention to character encoding, although it was always there when I opened new documents in Dreamweaver to work with PHP. Then I started using Hindi for one of my literary blogs and discovered that without proper character encoding the characters were not comprehensible.

  10. On a client website page I specified the character encoding as UTF-8 with meta http-equiv. But according to the w3.org validator, the character encoding specified by HTTP is iso-8859-1, which results in a mismatch, makes the page invalid XHTML, and causes it to render incorrectly.

    On other pages of the same site, this problem doesn’t occur. Is there another way to overwrite http character encoding in html?

    Thanks!

  11. Hmmm, you may have to set the HTTP header on the server, either via PHP or .htaccess, with something along the lines of: AddCharset UTF-8 .html

  12. Recently I took over maintaining an app that had been running for a year or so without any encoding being set. As a result, it looks like there is now all kinds of nastiness stored in the database. I’m not looking forward to having to clean it all up to get it running on UTF-8.

  13. I created this character encoding tool to make the special character encoding process a little easier. No need to look up some of those obscure characters, just copy and paste.

    Entity Encoding Tool
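The mismatch described in comment 10 (HTTP header says one thing, the meta element another) can be checked with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, the meta detection is a simple regular expression rather than a real HTML parser, and it only looks at the first 1024 bytes. Keep in mind that browsers give the HTTP header precedence over the meta element, so the fix usually belongs in the server configuration.

```python
import codecs
import re
from email.message import Message

# Naive pattern for charset=... inside a meta element (not a full HTML parser).
META_RE = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_.:\-]+)', re.IGNORECASE)

def header_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def meta_charset(html_bytes):
    """Return the charset named near the top of the document, or None."""
    match = META_RE.search(html_bytes[:1024])
    return match.group(1).decode("ascii") if match else None

def find_mismatch(content_type, html_bytes):
    """Return (http, meta) when both are declared and disagree, else None."""
    http = header_charset(content_type)
    meta = meta_charset(html_bytes)
    if http and meta:
        # codecs.lookup normalises aliases such as "utf8" and "UTF-8".
        if codecs.lookup(http).name != codecs.lookup(meta).name:
            return (http, meta)
    return None

page = b'<meta http-equiv="Content-Type" content="text/html; charset=utf8">'
print(find_mismatch("text/html; charset=ISO-8859-1", page))
print(find_mismatch("text/html; charset=UTF-8", page))
```

An unknown encoding name makes `codecs.lookup` raise `LookupError`, which this sketch deliberately does not catch.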
