Ampersands and validation
Anne van Kesteren writes about the XML entities that must be encoded in Ampersands matter. One of those entities is the ampersand, &, which needs to be written as &amp;.
Many people consider it “OK” to let unencoded ampersands in query strings slip through validation. That’s up to everyone to decide for themselves. However, if you decide that encoding ampersands isn’t worth it, you may want to go to the W3C document Character entity references in HTML 4 and print it out. In particular, the section Character entity references for ISO 8859-1 characters lists a lot of names you need to avoid as variable names in query strings if you want your site to work in browsers like Safari, Mozilla, Firefox, IE/Mac, and OmniWeb.
All those browsers treat an unencoded ampersand followed by an entity name and an equals sign as a character entity reference, and replace the ampersand and the name with the corresponding character. That will make the query string look quite different from what you’re expecting.
Let’s say you have a link that looks like this:
<a href="index.html?myvar=1&cent=2&pound=2&reg=3">A link</a>
If you’re using one of the browsers mentioned earlier, check the status bar with your cursor positioned over the link in this test document, or follow the link and check the location/address field. The query string is now something your server script will have a hard time parsing properly.
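The effect can be reproduced outside a browser. The sketch below uses Python's standard html.unescape as a stand-in for the browser's entity decoding. That is an assumption for illustration only: it is not a browser, but it also resolves the semicolon-less names cent, pound, and reg. The href value spells out the three entity names used in the example link.

```python
import html
from urllib.parse import parse_qs

# The href value as written in the markup, with unencoded ampersands.
href = "index.html?myvar=1&cent=2&pound=2&reg=3"

# html.unescape stands in for the browser's entity decoding here
# (an assumption for illustration; real browsers may differ in detail).
decoded = html.unescape(href)
print(decoded)

# What a server-side script would see: the separators are gone,
# so only a single parameter is left.
query = decoded.split("?", 1)[1]
print(parse_qs(query))
```

With the ampersands gone, everything after myvar= ends up inside the value of that one parameter.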
Sure, most of the entity names aren’t that likely to be used as a variable name in a query string. But you never know, so by keeping that list handy you can make sure you avoid them.
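Instead of a printout, the list can be checked programmatically. This is a minimal sketch, assuming Python's html.entities.name2codepoint table (its mapping of HTML 4 entity names) is an acceptable substitute for the W3C list; risky_names is a hypothetical helper name, and the match is exact and case-sensitive.

```python
from html.entities import name2codepoint


def risky_names(names):
    """Return the parameter names that are also HTML 4 entity names."""
    return [name for name in names if name in name2codepoint]


print(risky_names(["myvar", "cent", "pound", "reg", "page"]))
# ['cent', 'pound', 'reg']
```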
Of course, an alternative would be to start encoding your ampersands ;).
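If the markup is generated by a script, the encoding can be done in one place. A minimal sketch, assuming links are built in Python; make_link is a hypothetical helper, while html.escape is the standard library function that turns & into &amp; (and, with quote=True, also escapes quotes for use inside an attribute).

```python
import html


def make_link(url, text):
    """Build an anchor element with the URL and link text escaped for HTML."""
    return '<a href="{}">{}</a>'.format(
        html.escape(url, quote=True),  # & becomes &amp; inside the attribute
        html.escape(text),
    )


print(make_link("index.html?myvar=1&cent=2&pound=2&reg=3", "A link"))
# <a href="index.html?myvar=1&amp;cent=2&amp;pound=2&amp;reg=3">A link</a>
```

Decoding the escaped attribute value gives back the original URL, so the server still receives ordinary ampersand-separated parameters.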
Update: IE/Win behaves the same way as the browsers mentioned above. Even more reason to make sure your ampersands are encoded. Oh, and I moved the test link to a separate document, since it contains invalid XHTML that is not well-formed. Not a good idea when you’re serving documents as application/xhtml+xml.
About the author
Roger Johansson is a Swedish web professional specialising in web standards, accessibility, and usability. More about me and this site.
Comments
I'm on Win/IE6 here, but still getting the same effect - when the world's most popular browser/OS combo falls foul of this effect, it's time to start encoding those entities!
Of course, you could not worry about it and just stick a little DOM-foolery into your onLoad event to regexp any string that starts with a question mark, ends with a quote and has ampersands in it...
A quick and easy solution for those times when you have to encode a lot of ampersands:
http://automaticlabs.com/products/urlcleaner
I've used this several times, it saves me from a validator headache :)
Why not just use ; to separate parameters in a query string?
If you're manually parsing a query string in a URL with Perl or something, do it however you want: split it with semicolons or pipes or whatever.
But PHP and others automatically parse query strings based on ampersands.
You can configure PHP to use semicolons rather than ampersands easily. In fact, you can even separate args with hyphens, if that butters your bread. Just change the values of arg_separator.input and arg_separator.output in php.ini. My configuration accepts either semicolons or ampersands on input (for compatibility), and uses semicolons in generated URLs.
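The same policy (accept either separator on input, emit semicolons on output) can be sketched outside PHP. The Python below is an illustration under that assumption, not a description of PHP's behaviour; parse_query and build_query are hypothetical helper names.

```python
import re
from urllib.parse import quote_plus, unquote_plus


def parse_query(query):
    """Parse a query string, accepting '&' or ';' between parameters."""
    params = {}
    for pair in re.split(r"[&;]", query):
        if not pair:
            continue  # skip empty segments such as a trailing separator
        name, _, value = pair.partition("=")
        params.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return params


def build_query(params):
    """Join parameters with ';' so the result needs no HTML escaping."""
    return ";".join(
        "{}={}".format(quote_plus(str(k)), quote_plus(str(v)))
        for k, v in params.items()
    )


print(parse_query("myvar=1;cent=2&pound=2"))
print(build_query({"myvar": 1, "cent": 2}))
```

Because the generated URLs contain no ampersands, there is nothing in them for a browser to mistake for an entity reference.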