Content negotiation, AdSense, and comments

A few weeks ago I posted about my experiences with content negotiation, and some roadblocks I ran into when looking to serve XHTML documents as “application/xhtml+xml” to capable browsers.

The problems were related to the way Google ads are displayed, and the fact that it’s very hard to make sure people use valid (or at least well-formed) XHTML in comments. Thanks to some clever people, I have found workarounds for both of these problems.

Content negotiation scripts

In Content negotiation, my previous post on this subject, I included examples of content negotiation scripts for PHP and ASP. As pointed out by Simon Jessey, the scripts did not include a Vary header. That has been fixed in the PHP version. Note that the script I included is somewhat simplified. A more advanced script can be found in Serving up XHTML with the correct MIME type.

However, I have not been able to find any information on how to send a Vary header with ASP (Classic). If there are any ASP experts reading this, perhaps you can help me with that.
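
For anyone who hasn't seen the earlier post, here is a rough sketch of the negotiation logic, including the Vary header. It is written in Python purely for illustration and is not the PHP or ASP script from that post. The function names parse_accept and negotiate are made up for this example. It is also simplified: it serves XHTML only when the browser lists “application/xhtml+xml” explicitly with a non-zero q value, and it ignores wildcards such as */*.

```python
def parse_accept(header):
    """Return a dict mapping each media type in an Accept header to its q value."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unreadable q value as "not acceptable"
        prefs[media_type] = q
    return prefs


def negotiate(accept_header):
    """Pick the response headers for an XHTML document (hypothetical helper)."""
    prefs = parse_accept(accept_header or "")
    if prefs.get("application/xhtml+xml", 0.0) > 0:
        content_type = "application/xhtml+xml"
    else:
        content_type = "text/html"
    # Vary tells caches that the response depends on the request's Accept header.
    return {"Content-Type": content_type, "Vary": "Accept"}


if __name__ == "__main__":
    print(negotiate("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
    print(negotiate("text/html, */*"))
```

Whichever type is chosen, the Vary header is sent, because a cache sitting between the server and the browser needs to know that the same URL can produce two different responses.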

Google AdSense

First, the problems with Google AdSense. The JavaScript function document.write() is used to create the HTML code used for the ads. When XHTML is served as “application/xhtml+xml”, document.write() doesn’t work, as is explained by Ian Hickson in Why document.write() doesn’t work in XML. Even if that did work, the ads are embedded in an iframe, which is not allowed in XHTML 1.0 Strict, my preferred flavour.

Simon Jessey came up with a workaround, described in detail in Making AdSense work with XHTML. In short, a separate HTML 4.01 document containing the ad code is created, and then inserted into the document where the ads are to be displayed by using an object element.

As a precaution, before implementing this method I contacted Google to check if they would have a problem with it. I didn’t want to risk my AdSense account being cancelled. After a day or so I got a reply from The Google Team, stating that they had reviewed Simon’s article and found the method acceptable. So, now there is a way to have Google ads on a page served as “application/xhtml+xml”.

Keeping comments valid

The second roadblock was comments. Comments are great, and an important part of any site of this kind. However, allowing HTML in comments makes it possible for invalid markup, or markup that is not well-formed, to sneak into documents. Once you serve a document as “application/xhtml+xml”, it must be well-formed, or strictly conforming browsers will throw an error instead of displaying the page.
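
To make “well-formed” concrete, here is a minimal sketch of a check that could be run on a comment before it is published. It uses Python's standard XML parser and a made-up function name, is_well_formed. It only tests well-formedness, not validity against the XHTML DTD, and because no DTD is loaded it will also reject named entities such as &nbsp;.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if a comment fragment parses as XML (hypothetical helper)."""
    # A comment may be bare text or several sibling elements, so wrap it
    # in a single root element before handing it to the parser.
    wrapped = "<div>" + fragment + "</div>"
    try:
        ET.fromstring(wrapped)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    samples = [
        'Nice post, <em>thanks</em>!',
        '<em>unclosed emphasis',
        '<b><i>overlapping</b></i>',
        'Fish & chips',  # a bare ampersand is not well-formed
    ]
    for sample in samples:
        print(is_well_formed(sample), sample)
```

A check like this would catch unclosed tags, overlapping tags, and unescaped ampersands, which are the kinds of mistakes that make a strict browser refuse to render the page.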

One possibility is to disallow HTML in comments, something I tried and didn’t like. I want people who take the time to comment to be able to add things like quotes, links, and emphasis to their comments. I looked at ways to validate comments, but couldn’t figure out how to implement them. Then I found Markdown, a Movable Type plugin by John Gruber.

Markdown uses a simple text formatting syntax, which gets converted to XHTML when the document is rebuilt. That way, comments can contain links, emphasis, blockquotes, lists, and more, even though direct HTML input is not allowed. The drawback is that anyone leaving comments will need to know some Markdown syntax unless their comment is just plain text. It’s very simple though, and I’ve added some basic instructions to the comment form. I’m giving it a try.

Serving it up

Finding these workarounds has allowed me to use content negotiation to serve my documents as “application/xhtml+xml” to browsers that can handle it, and as “text/html” to all others. I’d like to keep it that way, so I’m hoping that the folks at Google don’t change their minds, and that all comments stay valid.

Update: It looks like the problems with Safari mentioned in the comments to this post aren’t actually problems at all. Safari seems to be interpreting XHTML sent as “application/xhtml+xml” very strictly, and does not recognise any named character entities except for &amp;, &lt;, &gt;, &apos;, and &quot;.

Any others are not displayed as the expected character; they are just written to the page with no conversion.
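
The five entities that do work are the ones predefined by XML itself. The sketch below shows the same distinction with Python's standard XML parser. This is not Safari's parser, and it reacts differently: with no DTD loaded it raises an error on an unknown entity rather than printing it as text. The underlying rule is the same, though. The function name parses_as_xml is made up for this example.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if the markup is accepted by a parser with no DTD loaded."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# The five entities predefined by XML need no DTD.
predefined = "<p>&amp; &lt; &gt; &apos; &quot;</p>"
# Entities defined only in the (X)HTML DTDs are unknown without one.
html_only = "<p>&nbsp;&copy;</p>"
# Numeric character references never depend on a DTD.
numeric = "<p>&#160;&#169;</p>"

if __name__ == "__main__":
    for label, markup in [("predefined", predefined),
                          ("html_only", html_only),
                          ("numeric", numeric)]:
        print(label, parses_as_xml(markup))
```

Numeric character references such as &#160; are a safe replacement for named entities in documents served as “application/xhtml+xml”.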

Posted on September 1, 2004 in Search Engine Optimisation, (X)HTML, Web Standards