Ampersands and validation

Anne van Kesteren writes about the XML entities that must be encoded in “Ampersands matter”. One of those entities is the ampersand, &, which needs to be written as &amp;.

Many people consider unencoded ampersands in query strings “OK” to let slip through validation. That’s up to everyone to decide for themselves. However, if you decide that encoding ampersands isn’t worth it, you may want to go to the W3C document “Character entity references in HTML 4” and print it out. The section “Character entity references for ISO 8859-1 characters” in particular contains a lot of entity names that you need to avoid as variable names in query strings if you want your site to work in browsers like Safari, Mozilla, Firefox, IE/Mac, and OmniWeb.

When an unencoded ampersand is followed by the name of a character entity and then an equals sign, all those browsers replace the ampersand and the entity name with the corresponding character. That makes the query string look quite different from what you’re expecting.

Let’s say you have a link that looks like this:

<a href="index.html?myvar=1&cent=2&pound=2&reg=3">A link</a>

If you’re using one of the browsers mentioned earlier, hover over the link in this test document and check the status bar, or follow the link and check the location/address field. The query string is now something your server script will have a hard time parsing properly.
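If you don’t have one of those browsers at hand, the effect can be approximated in a few lines of Python. This is only a sketch: it assumes that `html.unescape()` from the standard library, which applies the HTML5 character reference rules to a plain string, is a fair stand-in for the lenient decoding described above. It is not a model of any particular browser.

```python
import html
from urllib.parse import parse_qs, urlsplit

# The href value exactly as written in the example link.
raw_href = "index.html?myvar=1&cent=2&pound=2&reg=3"

# html.unescape() also recognises legacy entity names such as cent, pound
# and reg when the trailing semicolon is missing.
decoded_href = html.unescape(raw_href)
print(decoded_href)

# Parse the decoded query string the way a server-side script might.
# No ampersands are left, so everything ends up in a single parameter.
params = parse_qs(urlsplit(decoded_href).query)
print(params)
```

Instead of four variables, the script gets a single `myvar` whose value contains the cent, pound, and registered trademark signs.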

Sure, most of the entity names aren’t that likely to be used as a variable name in a query string. But you never know, so by keeping that list handy you can make sure you avoid them.
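Rather than checking a printed list by hand, you could let a script do it. The sketch below assumes that Python’s `html.entities.name2codepoint` table, which holds the HTML 4 entity names, is an acceptable substitute for the W3C list. The function name `risky_parameter_names` is made up for this example.

```python
from html.entities import name2codepoint


def risky_parameter_names(names):
    """Return the parameter names that are also HTML 4 entity names."""
    # Entity names are case-sensitive, so the comparison is too.
    return [name for name in names if name in name2codepoint]


risky = risky_parameter_names(["myvar", "cent", "pound", "reg"])
print(risky)
```

For the example link this flags `cent`, `pound`, and `reg`, and leaves `myvar` alone.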

Of course, an alternative would be to start encoding your ampersands ;).
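If your links are generated by a script, encoding the ampersands can be automated. Here is a minimal sketch in Python, assuming the query string is built from a dictionary. `build_link` is a hypothetical helper, not part of any library.

```python
import html
from urllib.parse import urlencode


def build_link(path, params, text):
    """Return an <a> element whose href has its ampersands encoded."""
    # urlencode() joins the parameters with plain ampersands.
    query = urlencode(params)
    # html.escape() turns each & into &amp; so the markup is valid.
    href = html.escape(f"{path}?{query}", quote=True)
    return f'<a href="{href}">{html.escape(text)}</a>'


link = build_link("index.html", {"myvar": 1, "cent": 2, "pound": 2, "reg": 3}, "A link")
print(link)
```

The browser decodes each `&amp;` back to a plain ampersand before it requests the URL, so the server still receives `myvar=1&cent=2&pound=2&reg=3`.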

Update: IE/Win behaves the same way as the browsers mentioned above. Even more reason to make sure your ampersands are encoded. Oh, and I moved the test link to a separate document, since it contains invalid and non well-formed XHTML. Not a good idea when you’re serving documents as application/xhtml+xml.

Posted on June 10, 2004 in (X)HTML, Web Standards

Comments

  1. I’m on Win/IE6 here, but still getting the same effect - when the world’s most popular browser/OS combo falls foul of this effect, it’s time to start encoding those entities!

    Of course, you could not worry about it and just stick a little DOM-foolery into your onLoad event to regexp any string that starts with a question mark, ends with a quote and has ampersands in…

  2. A quick and easy solution for those times when you have to encode a lot of ampersands:

    http://automaticlabs.com/products/urlcleaner

    I’ve used this several times, it saves me from a validator headache :)

  3. Why not just use ; to separate parameters in a query string?

  4. October 12, 2004 by Emily

    If you’re manually parsing a query string in a URL with Perl or something, do it however you want: split it with semicolons or pipes or whatever.

    But PHP and others automatically parse query strings based on ampersands.

  5. October 13, 2004 by storlek

    You can configure PHP to use semicolons rather than ampersands easily. In fact, you can even separate args with hyphens, if that butters your bread. Just change the values of arg_separator.input and arg_separator.output in php.ini. My configuration accepts either semicolons or ampersands on input (for compatibility), and uses semicolons in generated URLs.
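The semicolon approach from the comments above isn’t limited to PHP. As a rough sketch, and assuming a Python version whose `urllib.parse.parse_qs` accepts the `separator` argument (3.10 and later), the same example link with semicolons survives entity decoding untouched and still parses into four variables:

```python
import html
from urllib.parse import parse_qs, urlsplit

# The example link with semicolons instead of ampersands.
href = "index.html?myvar=1;cent=2;pound=2;reg=3"

# Without ampersands there is nothing for entity decoding to misread.
unchanged = html.unescape(href) == href
print(unchanged)

# Tell the parser to split the query string on semicolons.
params = parse_qs(urlsplit(href).query, separator=";")
print(params)
```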
