Ampersands and validation
Anne van Kesteren writes about the XML entities that must be encoded in Ampersands matter. One of those entities is the ampersand, &, which needs to be written as &amp;.
Many people consider unencoded ampersands in query strings “OK” to let slip through validation. That’s up to everyone to decide for themselves. However, if you decide that encoding ampersands isn’t worth it, you may want to print out the W3C document Character entity references in HTML 4. In particular, the section Character entity references for ISO 8859-1 characters lists a lot of entity names that you need to avoid as variable names in query strings if you want your site to work in browsers like Safari, Mozilla, Firefox, IE/Mac, and OmniWeb.
All those browsers treat an unencoded ampersand followed by an entity name as a character entity reference, even when the terminating semicolon is missing and an equals sign follows, and replace it with the corresponding character. That will make the query string look quite different from what you’re expecting.
Let’s say you have a link that looks like this:
<a href="index.html?myvar=1&cent=2&pound=2&reg=3">A link</a>
If you’re using one of the browsers mentioned earlier, check the status bar with your cursor positioned over the link in this test document, or follow the link and check the location/address field. The query string is now something your server script will have a hard time parsing properly.
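The same mangling can be reproduced outside a browser. The sketch below uses Python’s html.unescape as a stand-in for the lenient entity decoding described above. It is not what any of these browsers actually run, but it also accepts the semicolon-less names cent, pound and reg:

```python
import html

# The href value from the example link, with unencoded ampersands.
href = "index.html?myvar=1&cent=2&pound=2&reg=3"

# html.unescape also decodes the legacy entity names that are recognised
# without a trailing semicolon, so "&cent", "&pound" and "&reg" are replaced.
decoded = html.unescape(href)

print(decoded)  # index.html?myvar=1¢=2£=2®=3
```

Only myvar survives as a recognisable parameter. The other three separators and names have each turned into a single character.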
Sure, most of the entity names aren’t that likely to be used as a variable name in a query string. But you never know, so by keeping that list handy you can make sure you avoid them.
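Checking variable names against that list by hand is tedious, so here is a minimal sketch of an automated check. It assumes the HTML 4 names in Python’s html.entities.name2codepoint are a fair approximation of the W3C list, and the function name risky_parameter_names is made up for this example:

```python
from html.entities import name2codepoint
from urllib.parse import parse_qsl


def risky_parameter_names(query):
    """Return the parameter names in `query` that are also HTML 4 entity names."""
    names = [name for name, _ in parse_qsl(query, keep_blank_values=True)]
    return [name for name in names if name in name2codepoint]


print(risky_parameter_names("myvar=1&cent=2&pound=2&reg=3"))
# ['cent', 'pound', 'reg']
```

It only catches exact matches, such as a parameter called copy or lang. It says nothing about how a particular browser treats longer names that merely start with an entity name.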
Of course, an alternative would be to start encoding your ampersands ;).
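If your links are generated by a script, the encoding can be left to a library. As a rough illustration (Python is just the language picked for this sketch, and the variable names are arbitrary), html.escape produces the attribute value with every ampersand written as &amp;:

```python
import html

query_url = "index.html?myvar=1&cent=2&pound=2&reg=3"

# html.escape turns each "&" into "&amp;" (and also escapes <, > and quotes),
# which is what an attribute value in the markup needs.
link = '<a href="{}">A link</a>'.format(html.escape(query_url, quote=True))
print(link)
# <a href="index.html?myvar=1&amp;cent=2&amp;pound=2&amp;reg=3">A link</a>

# Decoding the escaped value gives back the URL the server expects.
assert html.unescape(html.escape(query_url)) == query_url
```

The browser decodes &amp; back to a plain ampersand before requesting the URL, so the server script sees the query string exactly as intended.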
Update: IE/Win behaves the same way as the browsers mentioned above. Even more reason to make sure your ampersands are encoded. Oh, and I moved the test link to a separate document, since it contains invalid and non well-formed XHTML. Not a good idea when you’re serving documents as application/xhtml+xml.
Comments
I’m on Win/IE6 here, but still getting the same effect - when the world’s most popular browser/OS combo falls foul of this effect, it’s time to start encoding those entities!
Of course, you could not worry about it and just stick a little DOM-foolery into your onLoad event to regexp any string that starts with a question mark, ends with a quote and has ampersands in…
A quick and easy solution for those times when you have to encode a lot of ampersands:
http://automaticlabs.com/products/urlcleaner
I’ve used this several times, it saves me from a validator headache :)
Why not just use ; to separate parameters in a query string?
if you’re manually parsing a query string in a URL with Perl or something, do it however you want: split it with semicolons or pipes or whatever
but with PHP and others, it automatically parses query strings based on ampersands
You can configure PHP to use semicolons rather than ampersands easily. In fact, you can even separate args with hyphens, if that butters your bread. Just change the values of arg_separator.input and arg_separator.output in php.ini. My configuration accepts either semicolons or ampersands on input (for compatibility), and uses semicolons in generated URLs.
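For anyone parsing query strings by hand, the same policy (accept semicolons or ampersands on input, emit semicolons on output) can be sketched in a few lines. This is a Python illustration of the idea, not PHP’s implementation, and build_query and parse_query are hypothetical helper names:

```python
import re
from urllib.parse import quote_plus, unquote_plus


def build_query(params):
    """Join (name, value) pairs with semicolons, so no ampersand needs encoding."""
    return ";".join(f"{quote_plus(k)}={quote_plus(v)}" for k, v in params)


def parse_query(query):
    """Split on either ';' or '&' for compatibility with older links."""
    pairs = []
    for field in re.split(r"[;&]", query):
        if not field:
            continue  # skip empty fields such as a trailing separator
        key, _, value = field.partition("=")
        pairs.append((unquote_plus(key), unquote_plus(value)))
    return pairs


print(build_query([("myvar", "1"), ("cent", "2")]))  # myvar=1;cent=2
print(parse_query("myvar=1;cent=2&reg=3"))
# [('myvar', '1'), ('cent', '2'), ('reg', '3')]
```

Because quote_plus percent-encodes any semicolon or ampersand inside a name or value, the separators in the generated string are unambiguous.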