Always specify character encoding
As a Mac user and a user of “alternative” web browsers I have visited many websites where text does not display correctly. In almost every case that is caused by the website’s author specifying an incorrect character encoding or not specifying a character encoding at all.
At times when I have tried bringing the problem to the attention of the developers I have been met by a lack of understanding. After all, they couldn’t see a problem in their Windows-only world. Well, that is not the only platform used on the web, so you need to tell the visitor’s browser which character encoding you are using.
David Baron’s Why Web authors must specify character encodings explains this well.
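To make the problem concrete, here is a minimal Python sketch (not from the original post) of how the same bytes turn into different text depending on which encoding the reader guesses. The sample string, and the choice of windows-1252 and MacRoman as the competing guesses, are assumptions made for illustration.

```python
# Illustration only: the sample text and the encodings compared are assumptions.
text = "café"

# An author saves the page as windows-1252 but never declares that.
raw = text.encode("cp1252")

# A browser that happens to guess the same encoding shows the text correctly.
same_guess = raw.decode("cp1252")

# A browser that guesses MacRoman maps the same byte to a different character.
other_guess = raw.decode("mac_roman")

# UTF-8 bytes read as windows-1252 produce the familiar two-character garbage.
utf8_misread = text.encode("utf-8").decode("cp1252")

print(same_guess)
print(other_guess)
print(utf8_misread)  # cafÃ©
```

Nothing in the bytes themselves says which reading is right, which is why the declaration has to come from the author.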
About the author
Roger Johansson is a Swedish web professional specialising in web standards, accessibility, and usability.
Comments
I suppose this applies to feeds as well?
Absolutely. All variants of RSS and Atom are XML languages. See Mark Pilgrim's Determining the character encoding of a feed.
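For feeds, one place an encoding can be declared is the XML declaration on the first line. The following is a rough Python sketch of reading it; the function name is hypothetical, it only handles ASCII-compatible encodings, and it deliberately ignores the HTTP headers, which also play a role in the full rules.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very start
# of the document. Assumption: the bytes are in an ASCII-compatible encoding.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_xml_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    match = _XML_DECL.match(data)
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


feed = b'<?xml version="1.0" encoding="ISO-8859-1"?><rss version="2.0"/>'
print(declared_xml_encoding(feed))       # iso-8859-1
print(declared_xml_encoding(b"<rss/>"))  # None
```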
But not only XML. Everything that contains plain text (so that means HTML, XML, CSS, JavaScript etc.) needs to be accompanied by a character encoding.
On my website, I described how to use the .htaccess file to automatically send a character encoding. Very useful Apache feature because the character encoding is such a small thing which you can easily forget about. So specifying a default one is probably a smart thing to do.
@Jero:
Question: Do web servers automatically assign a character encoding to CSS and JS files? Given that character encoding is not part of either specification, the character encoding must come from somewhere else.
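Whether a particular server does this out of the box depends on its configuration, so the following Python sketch only shows what the header could look like when the charset is attached on the server side. The mapping and the function name are hypothetical; a real server has a much longer list of types.

```python
import os

# Hypothetical mapping for this sketch, limited to a few text-based formats.
TEXT_TYPES = {
    ".html": "text/html",
    ".css": "text/css",
    ".js": "text/javascript",
    ".xml": "application/xml",
}


def content_type_header(path: str, charset: str = "utf-8") -> str:
    """Build a Content-Type value that carries a charset for text-based files."""
    ext = os.path.splitext(path)[1].lower()
    media_type = TEXT_TYPES.get(ext)
    if media_type is None:
        # Unknown or binary file: a charset parameter does not apply.
        return "application/octet-stream"
    return f"{media_type}; charset={charset}"


print(content_type_header("style.css"))  # text/css; charset=utf-8
print(content_type_header("app.js"))     # text/javascript; charset=utf-8
```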
We decided on UTF-8 and we have had a ton of problems with older editors and documents "slipping" out of UTF-8. It’s been a huge pain, and don't get me started on the BOM (byte order mark)!
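Both problems can be checked for mechanically. Here is a small Python sketch, with hypothetical function names, that strips a leading UTF-8 BOM and tests whether a file's bytes are still valid UTF-8. It is one possible check, not a description of what the commenter's team actually used.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


saved_by_old_editor = "é".encode("cp1252")   # a file that slipped out of UTF-8
print(is_valid_utf8(saved_by_old_editor))    # False
print(strip_utf8_bom(codecs.BOM_UTF8 + b"<html>"))  # b'<html>'
```

Plain ASCII is also valid UTF-8, so a passing check does not prove the editor is configured correctly; it only catches files that have already gone wrong.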
But agreed .. telling the user how they should interpret the characters flying at their browser is a requirement. [sarcasm] Someone should make a list of all these things and create some sort of validation utility for people to use .. [/sarcasm]
If you are lucky enough to have edit privileges on your Apache httpd.conf file, you can add:
AddDefaultCharset utf-8
rather than having to rely on .htaccess files.
Or, in the case of Apache 2.2 users, AddDefaultCharset. ;-)
Funny. I just got a "character encoding" bug in my queue this last month.
I totally agree. Even I didn't pay much attention to character encoding, although it was always there when I opened new documents in Dreamweaver to work with PHP. Then I started using Hindi for one of my literary blogs and discovered that without proper character encoding the characters were incomprehensible.
On a client website page I specified the character encoding as UTF-8 with meta http-equiv. But according to the W3C validator, the character encoding specified by the HTTP header is ISO-8859-1. The mismatch makes the page invalid XHTML and causes it to render incorrectly.
On other pages of the same site, this problem doesn't occur. Is there another way to override the HTTP character encoding from within the HTML?
Thanks!
Hmmm, you may have to set the server's HTTP header, either via PHP or .htaccess, with something along the lines of: AddCharset UTF-8 .html
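The kind of mismatch described in this exchange can be detected before the validator complains. The Python sketch below compares the charset in a Content-Type header value with the one in a meta element. All function names are hypothetical, and the regular expression is a simplification rather than a real HTML parser.

```python
import codecs
import re
from email.message import Message

# Simplified pattern: finds "charset=..." inside the first matching meta element.
_META_CHARSET = re.compile(
    r'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE
)


def _canonical(name: str) -> str:
    """Normalise aliases such as 'utf8' and 'UTF-8' to one spelling."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()


def header_charset(content_type: str):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(html: str):
    """Extract the charset declared in a meta element, if any."""
    match = _META_CHARSET.search(html)
    return match.group(1) if match else None


def charsets_conflict(content_type: str, html: str) -> bool:
    """True when both declarations exist and name different encodings."""
    http, meta = header_charset(content_type), meta_charset(html)
    if http is None or meta is None:
        return False
    return _canonical(http) != _canonical(meta)


page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
print(charsets_conflict("text/html; charset=ISO-8859-1", page))  # True
print(charsets_conflict("text/html; charset=UTF-8", page))       # False
```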
Recently I took over maintaining an app that had been running for a year or so without any encoding being set. As a result it looks like there is now all kinds of nastiness stored in the database. I'm not looking forward to having to clean it all up to get it running on UTF-8.
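One common form of that nastiness is UTF-8 text that was read as a single-byte encoding and stored that way. The Python sketch below reverses that one case. It assumes the wrong encoding was windows-1252, which is a guess about this commenter's situation, and the function name is hypothetical. It is something to run on a copy of the data, not a complete cleanup.

```python
def repair_mojibake(text: str, wrong: str = "cp1252") -> str:
    """Undo UTF-8 bytes that were decoded as `wrong` and then stored as text.

    Assumption: `wrong` is the encoding that was mistakenly applied.
    Values that do not round-trip cleanly are returned unchanged.
    """
    try:
        return text.encode(wrong).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside `wrong`, or the bytes are
        # not valid UTF-8, so this value was probably not damaged this way.
        return text


print(repair_mojibake("cafÃ©"))  # café
print(repair_mojibake("café"))   # café (already correct, left alone)
```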
I created this character encoding tool to make the special character encoding process a little easier. No need to look up some of those obscure characters, just copy and paste.
Entity Encoding Tool
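How the linked tool works is not described here, so the following is an independent Python sketch of the same idea: escape the markup characters, then replace every non-ASCII character with a numeric character reference. The function name is hypothetical.

```python
import html


def to_character_references(text: str) -> str:
    """Escape markup characters and turn non-ASCII characters into &#NNN; form."""
    escaped = html.escape(text, quote=True)
    # "xmlcharrefreplace" swaps each unencodable character for a decimal reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_character_references("Smörgåsbord & Co"))  # Sm&#246;rg&#229;sbord &amp; Co
```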