Ampersands and validation

Anne van Kesteren writes about the XML entities that must be encoded in “Ampersands matter”. One of those entities is the ampersand, &, which needs to be written as &amp;.

Many people consider unencoded ampersands in query strings “OK” to let slip through validation. That’s up to everyone to decide for themselves. However, if you decide that encoding ampersands isn’t worth it, you may want to go to the W3C document Character entity references in HTML 4 and print it out. The section Character entity references for ISO 8859-1 characters in particular lists a lot of names you need to avoid as variable names in query strings if you want your site to work in browsers like Safari, Mozilla, Firefox, IE/Mac, and OmniWeb.

All those browsers will treat an unencoded ampersand followed by an entity name and an equals sign, such as &cent=, as a character entity reference and convert it to the corresponding character. That will make the query string look quite different from what you’re expecting.

Let’s say you have a link that looks like this:

<a href="index.html?myvar=1&cent=2&pound=2&reg=3">A link</a>

If you’re using one of the browsers mentioned earlier, check the status bar with your cursor positioned over the link in this test document, or follow the link and check the location/address field. The query string is now something your server script will have a hard time parsing properly.
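
To see roughly what the server ends up with, here is a minimal Python sketch. It uses the standard library’s html.unescape as a stand-in for a browser that decodes entity names without a terminating semicolon. That substitution is an assumption made for illustration, not a description of how any particular browser is implemented.

```python
from html import unescape
from urllib.parse import parse_qsl

# The href value exactly as written in the link above (ampersands not encoded).
raw_href = "index.html?myvar=1&cent=2&pound=2&reg=3"

# html.unescape also decodes legacy entity names that lack a trailing
# semicolon; it is used here only to imitate the behaviour described above.
decoded_href = unescape(raw_href)
print(decoded_href)
# prints index.html?myvar=1¢=2£=2®=3

# What a server-side script would find in the mangled query string:
# a single variable instead of four, because no "&" separators are left.
query = decoded_href.split("?", 1)[1]
pairs = parse_qsl(query)
print(len(pairs))
# prints 1
```

Three of the four variables have disappeared into the value of myvar, which is why the script on the receiving end struggles.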

Sure, most of the entity names aren’t that likely to be used as a variable name in a query string. But you never know, so by keeping that list handy you can make sure you avoid them.
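
If you’d rather not check the printed list by hand, something like the following Python sketch could do it for you. The function name risky_names is made up for this example. It relies on html.entities.name2codepoint, the standard library’s table of HTML 4 entity names, so it checks against the whole list rather than only the ISO 8859-1 section and may flag more names than a given browser actually converts.

```python
from html.entities import name2codepoint
from urllib.parse import parse_qsl


def risky_names(query_string):
    """Return the variable names that are also HTML 4 entity names.

    Hypothetical helper for illustration; entity names are case-sensitive,
    so the comparison is case-sensitive too.
    """
    names = [name for name, _ in parse_qsl(query_string, keep_blank_values=True)]
    return [name for name in names if name in name2codepoint]


print(risky_names("myvar=1&cent=2&pound=2&reg=3"))
# prints ['cent', 'pound', 'reg']
```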

Of course, an alternative would be to start encoding your ampersands ;).
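
If your links are generated by a script, the encoding can happen at the point where the href attribute is written. Here is a small sketch in Python, assuming the same variable names as in the example link; the names params, url, href and link are just for illustration.

```python
from html import escape, unescape
from urllib.parse import urlencode

# Build the query string first, with plain "&" separators.
params = {"myvar": 1, "cent": 2, "pound": 2, "reg": 3}
url = "index.html?" + urlencode(params)

# Encode the ampersands (and any quotes) when writing the attribute value.
href = escape(url, quote=True)
link = f'<a href="{href}">A link</a>'
print(link)
# prints <a href="index.html?myvar=1&amp;cent=2&amp;pound=2&amp;reg=3">A link</a>

# Decoding the attribute value gives back the URL that was intended.
print(unescape(href) == url)
# prints True
```

The URL itself is left alone; only its representation inside the markup changes.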

Update: IE/Win behaves the same way as the browsers mentioned above. Even more reason to make sure your ampersands are encoded. Oh, and I moved the test link to a separate document, since it contains XHTML that is invalid and not well-formed. Not a good idea when you’re serving documents as application/xhtml+xml.

Posted on June 10, 2004 in Web Standards, (X)HTML