The perils of using XHTML properly

I’ve been using XHTML for a couple of years now, but it wasn’t until last summer that I started looking at using it properly, that is by serving it with the application/xhtml+xml MIME type. I knew about some of the problems I was going to run into, but far from all of them. As you’re about to find out, there are plenty of seemingly small issues that can make life difficult when you start using real XHTML.

Please note that this article is not an argument for or against the use of XHTML. I’m just documenting the potential pitfalls that I’m aware of, and will leave it up to you to decide what you want to use: HTML 4.01, XHTML 1.0 served as text/html to all browsers, or XHTML 1.0 served as application/xhtml+xml to browsers that handle it and as text/html to others. Or maybe something completely different.

I became aware of the gotchas one after the other, as I encountered the situations where they can occur. In some cases I had to spend quite some time looking for info and asking for help before finding a solution. But I learned from it, and I’m going to let you know what I would have liked to know when I started using XHTML.

Note that the issues I mention here only occur in user agents that properly handle the application/xhtml+xml MIME type, and therefore treat XHTML as XML. That is probably the major reason that these issues were not mentioned a lot in the early days of XHTML – very few people were using such web browsers, so almost nobody bothered to serve XHTML as anything but text/html.

Today, actually serving XHTML as application/xhtml+xml is becoming slightly more common. There are two reasons as I see it:

  1. The number of people using Firefox, Mozilla, Opera, Safari, and other XHTML capable browsers has increased a lot, so you’re not doing it just for yourself and your fellow geeks. Well, maybe you are, but it will affect many more.
  2. There is an increased awareness of what XHTML actually is among web developers. There have been several, sometimes heated, debates on the use of XHTML, especially when served as text/html. If you’ve taken part in any such discussions, you know what I mean.

If you, like me, decide to implement some kind of content negotiation and use the correct media type to deliver XHTML, you need to know what can (and will) happen to the documents you publish, and how to avoid problems. For some interesting reading on the subject of content negotiation, as well as examples of scripts that will perform the content negotiation, I’d like to refer you to Content Negotiation and Serving up XHTML with the correct MIME type. There are more articles of that kind, but those two are among the best I’ve read on the subject.
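
To make the idea concrete, here is a minimal sketch of the decision such a script has to make. It is not taken from either of the articles mentioned above: the function name chooseMediaType is made up for this example, and it assumes the script looks only at the HTTP Accept request header and does no browser sniffing.

```javascript
// Hypothetical helper (not from the linked articles): choose the media type
// for an XHTML document from the value of the HTTP Accept request header.
function chooseMediaType(acceptHeader) {
  const XHTML = "application/xhtml+xml";
  const HTML = "text/html";
  if (!acceptHeader) {
    return HTML; // no header at all: fall back to the widely supported type
  }
  // Record the q-value (default 1) given for each media range.
  const quality = new Map();
  for (const part of acceptHeader.split(",")) {
    const [range, ...params] = part.trim().split(";");
    let q = 1;
    for (const param of params) {
      const [key, value] = param.trim().split("=");
      if (key === "q") {
        const parsed = parseFloat(value);
        if (!Number.isNaN(parsed)) q = parsed;
      }
    }
    quality.set(range.trim().toLowerCase(), q);
  }
  // Serve XHTML only when it is listed explicitly with a non-zero q-value
  // that is at least as high as the one given for text/html.
  const xhtmlQ = quality.get(XHTML);
  if (xhtmlQ !== undefined && xhtmlQ > 0) {
    const htmlQ = quality.has(HTML) ? quality.get(HTML) : 0;
    if (xhtmlQ >= htmlQ) return XHTML;
  }
  return HTML;
}
```

With this logic, a browser that only sends a wildcard such as */* gets text/html, because application/xhtml+xml is never matched through a wildcard. That is a deliberately cautious design choice for the sketch, not a requirement of HTTP.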

Some of the more obvious differences between HTML and XHTML are listed in every basic XHTML tutorial: use lowercase for element and attribute names, always quote attribute values, don’t use attribute minimisation, make sure all elements have end tags and that no elements are incorrectly nested, etc. However, there is more to be aware of when XHTML is being served as application/xhtml+xml.

Well-formedness is required

Documents must be well-formed XML (which is not necessarily the same as valid XHTML). No compromises, no room for error. If documents aren’t well-formed, conforming browsers (currently I’m aware of Mozilla, Firefox, Netscape, Camino, Opera, Safari, and OmniWeb – pretty much any browser but Internet Explorer) will display an error message and abort rendering the document in one way or another.

Among other things, this means no more unencoded ampersands.
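
If your pages are generated by a script, one way to catch stray ampersands is to escape them before output. The following is only a sketch: escapeBareAmpersands is a hypothetical name, and the pattern assumes that entity names consist of ASCII letters and digits, which is narrower than what XML allows.

```javascript
// Hypothetical helper: replace every "&" that does not already start an
// entity reference (&name;) or a character reference (&#38; or &#x26;).
function escapeBareAmpersands(text) {
  return text.replace(
    /&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/g,
    "&amp;"
  );
}
```

A typical case is a URL with a query string in an href attribute: page.php?a=1&b=2 becomes page.php?a=1&amp;b=2, while text that is already escaped is left alone.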

The XML declaration may be required

If you use any other character encoding than UTF-8 or UTF-16, the XML declaration is required unless the encoding is provided by the HTTP header.

Whether or not character encoding should be specified in the HTTP headers is slightly unclear. Architecture of the World Wide Web, Volume One: Media Types for XML states that

In general, a representation provider SHOULD NOT specify the character encoding for XML data in protocol headers since the data is self-describing.

On the other hand, here’s what XHTML 1.0, Second Edition: Character Encoding says:

In order to portably present documents with specific character encodings, the best approach is to ensure that the web server provides the correct headers.

Either way, it’s good practice to specify the character encoding in the XML declaration:

<?xml version="1.0" encoding="iso-8859-1"?>

Only five named entities are safe

Only the five predefined named entities (&lt;, &gt;, &amp;, &quot;, and &apos;) are guaranteed to be supported. Others may be completely omitted or output literally. For example, if your XHTML document contains entities like &nbsp; or &rdquo;, that is what Safari will render. Literally. Opera instead chooses to omit the unknown entities, while the Mozilla family will recognise the entities and render them as in HTML if the document references a public identifier for which there is a mapping in the browser’s pseudo-DTD catalog and the document has not been declared standalone.

Using the UTF-8 character encoding, which is the recommended best practice, lets you use (almost) any characters you like by typing them into your document, without the need for entities or character references. If you can’t or won’t use UTF-8, numeric character references are supported and safe to use.
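
For the case where UTF-8 is not an option, here is a small sketch of how text could be converted to numeric character references before it is written into a document. The function name toNumericReferences is made up for the example, and it assumes that everything outside the ASCII range should be converted.

```javascript
// Hypothetical helper: replace every non-ASCII character with a decimal
// numeric character reference, which XML parsers are required to support.
function toNumericReferences(text) {
  let result = "";
  // for...of walks the string by code point, so characters outside the
  // Basic Multilingual Plane are handled as a single unit.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    result += codePoint > 127 ? "&#" + codePoint + ";" : ch;
  }
  return result;
}
```

A non-breaking space (what &nbsp; stands for in HTML) comes out as &#160;, and a right double quotation mark (&rdquo;) as &#8221;.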

The contents of SGML comments may be discarded

SGML comments (HTML-style comments, <!-- comment -->) may be (and are) treated as comments by browsers, even when used inside script or style elements.

In HTML, it is common to enclose the contents of script and style blocks in comments to hide them from browsers that do not recognize script or style elements, and would render their contents as plain text on the page.

In XHTML, doing so will cause browsers to ignore anything inside the comment.

The practice of hiding the contents of script and style elements from old browsers is a habit from way back in the mid nineties. In my experience, browsers that behave this way are so rare that you can safely ignore them, and stop enclosing scripts and style sheets in SGML comments, even if you use HTML.

Contents of script and style elements are treated as XML

The style and script elements are PCDATA (parsed character data) blocks, not CDATA (character data) blocks. Because of this, anything in them that looks like XML will be parsed as XML, and cause an error unless it is well-formed.

In order to use <, &, or -- in a style or script block, you need to wrap its content in a CDATA section:

  <script type="text/javascript">
  <![CDATA[
  ...
  ]]>
  </script>

Inside a CDATA section, you can use any sequence of characters without it being parsed as XML (except ]]>, which ends the CDATA section).

For documents to safely be sent as text/html when necessary, the opening and closing tags of the CDATA section need to be commented out to hide them from browsers that don’t handle CDATA sections:

  <script type="text/javascript">
  // <![CDATA[
  ...
  // ]]>
  </script>

  <style type="text/css">
  /* <![CDATA[ */
  ...
  /* ]]> */
  </style>

If you want to make sure that really old browsers don’t see the contents of a CDATA section, you need to use a more complicated method, as described by Ian Hickson in Sending XHTML as text/html Considered Harmful:

  <script type="text/javascript">
  <!--//--><![CDATA[//><!--
  ...
  //--><!]]>
  </script>

  <style type="text/css">
  <!--/*--><![CDATA[/*><!--*/
  ...
  /*]]>*/-->
  </style>

An even better solution would be to let your content negotiation script remove any CDATA sections before serving the document as text/html.
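
As a rough sketch of what that step could look like, the function below removes the commented-out CDATA delimiters and leaves the script or style content itself untouched. The name stripCdataMarkers is hypothetical, and the sketch assumes the delimiters are written in the simple commented form (// <![CDATA[ in scripts, /* <![CDATA[ */ in style sheets), each on a line of its own. It does not handle the more complicated form for really old browsers.

```javascript
// Hypothetical helper: remove commented-out CDATA section delimiters from
// markup that is about to be served as text/html.
function stripCdataMarkers(markup) {
  const markers = [
    /[ \t]*\/\/[ \t]*<!\[CDATA\[[ \t]*\r?\n?/g,           // // <![CDATA[
    /[ \t]*\/\/[ \t]*\]\]>[ \t]*\r?\n?/g,                 // // ]]>
    /[ \t]*\/\*[ \t]*<!\[CDATA\[[ \t]*\*\/[ \t]*\r?\n?/g, // /* <![CDATA[ */
    /[ \t]*\/\*[ \t]*\]\]>[ \t]*\*\/[ \t]*\r?\n?/g        // /* ]]> */
  ];
  let result = markup;
  for (const marker of markers) {
    result = result.replace(marker, "");
  }
  return result;
}
```

Because script and style elements are CDATA blocks in HTML, the remaining content can contain < and & as-is once the delimiters are gone.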

A sidenote: I’ve seen Opera have problems with commented CDATA sections in XHTML. When a commented CDATA section is present within a style element, Opera (tested in 7.54 Mac) ignores the first stylesheet rule and any @import rules in the entire style element. Anyone know if this is a bug in Opera or if the behaviour can be explained by something else?

Of course, the cleanest and safest way is to move all CSS and JavaScript to external files. That’s not always practical though.

No elements are inferred

In HTML, a table’s tbody element will be inferred by the browser if it’s missing from the markup. Not so in XHTML. If you don’t explicitly add tbody, it doesn’t exist. Keep this in mind when writing CSS selectors and JavaScript.

Scripting with document.write doesn’t work

When JavaScript is used with XHTML, document.write() does not work. Ian Hickson explains why in Why document.write() doesn’t work in XML. You need to use document.createElementNS() instead. More info on that can be found in a forum thread at Experts Exchange.
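
As an illustration, here is a minimal sketch of adding a paragraph with DOM methods instead of document.write(). The function name appendParagraph is made up, and the document object is passed in as a parameter only to keep the example self-contained. In a page you would pass the global document.

```javascript
// The XHTML namespace, as used in the xmlns attribute of the html element.
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: create a p element in the XHTML namespace, give it
// a text node as its content, and append it to the given parent node.
function appendParagraph(doc, parent, text) {
  const paragraph = doc.createElementNS(XHTML_NS, "p");
  paragraph.appendChild(doc.createTextNode(text));
  parent.appendChild(paragraph);
  return paragraph;
}
```

In a browser, usage could look like appendParagraph(document, document.getElementsByTagNameNS(XHTML_NS, "body")[0], "Hello"). The element is created in the XHTML namespace explicitly, so the browser treats it as an XHTML p element rather than as generic XML.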

This is one of the reasons that Google AdSense doesn’t work with XHTML. For those who wish to serve XHTML as application/xhtml+xml and have Google ads, there is a workaround described by Simon Jessey in Making AdSense work with XHTML. A bit messy, but it works (I’m using it here), and is approved by Google.

Referencing style elements

In XHTML, to be compatible with the XML method for defining CSS rules, you should use an XML stylesheet declaration to load an external CSS file. (It is called an XML stylesheet declaration in XHTML 1.0, Second Edition: Referencing Style Elements when serving as XML, and an xml-stylesheet processing instruction in Associating Style Sheets with XML documents.) When using a style element, you should use an XML stylesheet declaration to reference the style element. To do this, use the id attribute for the style element to give it a fragment identifier, and then reference that in the XML stylesheet declaration:

  <?xml-stylesheet href="stylesheet1.css" type="text/css"?>
  <?xml-stylesheet href="#stylesheet2" type="text/css"?>
  <!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
  <title>XML stylesheet declaration</title>
  <style type="text/css" id="stylesheet2">
  @import "stylesheet2.css";
  </style>
  </head>

I’m not sure how much of a requirement this actually is, and if there are any problems associated with not using XML stylesheet declarations. Maybe someone can enlighten me.

CSS is applied slightly differently

CSS properties applied to the body element don’t apply to the whole viewport in XHTML. This is most notable when a background colour or image is applied. In HTML, a background applied to the body element will cover the entire page. In XHTML, you need to style the html element as well. There is a demonstration of this behaviour in CSS body Element Test at Juicy Studio.

Element and attribute names in CSS rules are case sensitive in XHTML (and must be lowercase). It’s very simple to avoid problems with case-sensitivity: just make sure all element names, attribute names, and selectors are lowercase, whether you’re writing HTML, XHTML, or CSS.

Challenging, but not impossible

When I decided to start serving XHTML as application/xhtml+xml to capable browsers, it would have saved me some headaches if I had known about an article like this to read before making that decision. I might even have considered using HTML 4.01 Strict instead. Nevertheless, I’ve learned from the experience, and learning is always good.

Hopefully this has provided you with a bit more information about what it actually means to use XHTML properly, and you can make a slightly more informed decision as to whether you want to go that way or not.

There are probably even more differences between HTML and XHTML than those I’ve mentioned here, so feel free to add any additional pitfalls you’ve encountered when serving XHTML as application/xhtml+xml. If you can spot any errors or omissions, do tell.

Update: Some clarifications and links to references added.

Posted on January 18, 2005 in (X)HTML, CSS, JavaScript, Web Standards

Comments

  1. Element and attribute names in CSS rules are case sensitive in XHTML (and must be lowercase).

    Yes!!!! I discovered this the hard way. I could pull up the same page in IE and Firefox and one was using the proper CSS styles (IE, believe it or not) while the other ignored them… but only parts of them. Seems I was inconsistent in my referencing of elements and attributes in CSS so that some worked while others didn’t.

    That was the most frustrating day I had ever spent working with CSS and XHTML and just about gave up on XHTML completely until I made the discovery of case-sensitivity. I’m much more careful now… I will definitely bookmark this list as I’m sure I’ve broken most of the rules in this list.

  2. Hmm, I was just sitting and reading your articles about content negotiation and serving XHTML as application/xhtml+xml. I then hit your homepage and found this new article. Amazing, you are writing like crazy, keep it up ;D

  3. I opened your page to find one of those links mentioned above, because I finally encouraged myself to serve xhtml properly. And guess what - after five seconds I am reading this stuff:) Almost telepathic I must say:)

    But… There is always a but:) After reading about these pitfalls, I am not so sure about xhtml+xml thing anymore. Can someone tell me about pros of serving xhtml correctly? I suppose cons are mentioned above.

  4. January 18, 2005 by Erwin Heiser

    I’m sweating from just reading that ;-) One point I miss though: any advantages to all this?

  5. I’ve got a question about “Making AdSense work with XHTML.”:

    Since that method effectively puts the AdSense iframe in an external object, wouldn’t AdSense use the external object as a source to scan when deciding which ads to display? (The net result being that you get general ads or public service ads)

  6. Nice article! But a rather strange detail… When I’m viewing your source with Safari, the DTD served is HTML 4.01 Strict and not XHTML?? What am I missing?

  7. I think it’s useful to remember that XHTML 1.0 Strict CAN be served as text/html but if you want to move to XHTML 1.1, you’ll need the application/xhtml+xml content type.

    On my sites, I serve XHTML 1.0 Strict with text/html and I had no problems. It seems to me the best way to go for the moment. Once XML support is widespread, I’ll love to use it, since I’m a programmer and I like everything settled, documented and STRICT 8^)

  8. January 18, 2005 by Eric TF Bat

    For very detailed comments on the why of application/xhtml+xml, go see Anne van Kesteren’s blog; he (Anne is male) appears to be just about the expert in the web-pundit blogosphere on this stuff, although Roger’s giving him a run for his money…

    BTW Roger: I love the very minimalist styling of the input fields in your comment form - and given your proclivities, I can be confident it works on all serious browsers. Pardon me while I stea^H^H^H^H learn from your idea…

  9. January 18, 2005 by Roger Johansson (Author comment)

    ns and Erwin: Right. That is the question many are asking. In most cases, there really aren’t any advantages. ;-)

    Arve: I thought so too at first, but apparently that is not what happens, since the ads here differ depending on the content of the document they are displayed in.

    B: That’s because I’m using content negotiation to convert XHTML to HTML before sending it as text/html to Safari. Safari does handle XHTML, but it doesn’t specifically say so in its HTTP accept header. Since the script doesn’t do any browser sniffing, Safari ends up getting HTML.

  10. January 18, 2005 by Roger Johansson (Author comment)

    Eric: Well, Anne no doubt knows this particular stuff better than I do ;-)

    And help yourself to the form styling. I’m sure I was inspired by someone else.

  11. Excellent article!

    Note: what you have found also applies to server-side XML parsing, and XML/XSLT in general. …and yes, that also includes all of Microsoft’s XML parsers.

    If you for any reason do not want to use XML on the client-side just yet, server-side parsing is an excellent choice. You get the benefit of using XML + the availability of (X)HTML.

  12. Apparently, Internet Explorer can support the correct MIME type for XHTML documents…

    XHTML MIME type in Internet Explorer

    …I haven’t tested it personally, but if that’s all it takes, then wouldn’t a small Internet Explorer “patch” via Windows Update be all that’s required from Microsoft for all its users?

  13. Although you do not address the most important aspects of why XHTML is so useful for you (it is not) and how you make sure every post validates by using XML tools (you do not) there is a little mistake in the end.

    The BODY element does not cover the whole canvas in HTML either. Give it a border and see. The difference is that the ‘background’ property value is no longer applied from the BODY element to the canvas. Same thing for the ‘overflow’ property value which may be re-applied to the viewport from either the HTML or BODY elements in HTML, but it will not be re-applied in XML documents.

    Also note that the code you are “advocating” is incorrect thanks to some script that does not take care of blocks of code.

    Furthermore I am a bit disappointed that you did not link to at least one article that disapproves of XHTML and lists its many disadvantages. For example that almost every browser where XHTML can be displayed does not display XML incrementally. Et cetera.

  14. Re: dean, comment 12. That does not fix the problem. It actually makes the whole XHTML situation worse than it already is, since Internet Explorer will parse those documents as text/html. This means it will parse them with the “tag soup parser”, not with the XML one. (Although the XML one triggers quirks mode, et cetera.) Fortunately, it will not list the type in the Accept header if you make that change, so sites that are doing content negotiation will still serve HTML to Internet Explorer. (Although those sites should not really be using XHTML. I know my site should get an update to HTML someday, thanks.)

  15. January 19, 2005 by stylo~

    This article might be called, “Why you should never bother with xhtml,” no? Zero advantages over well-formed html, but massive headaches.

    The only thing that interests me about xhtml any more, which I haven’t seen documented yet anywhere, though often heard claimed on its behalf, is whether or not xhtml sent as application/xhtml+xml will truly render faster than that sent as tag soup. If anyone has the ability to test that accurately, it would make for a great article.

  16. Nice job, RJ. Thanks for saving a lot of us the time of figuring this stuff out on our own…nice to have a good, overall summary of these pitfalls.

  17. January 19, 2005 by Isaac Lin

    I think the killer reason for using XHTML may be the increasing popularity of XMLHttpRequest, largely due to its use in Gmail and Google Suggest. If your web pages are in XML format, then you can easily include them within other pages, without using frames, or server-side technology like server side includes, PHP, or CGI scripting.

    Although numerous blogs have already commented on the impact XMLHttpRequest can have on web applications (thanks to its ability to dynamically change the current page without the appearance of a full page reload), I think it can also open up new ways for the average web page author to organize their web pages into separate components, and to include the contents of external sites into their own, without having to rely on server support for anything other than the ability to serve XML.

  18. Welcome on the bandwagon, Roger. :)

    A very well-written analysis of the problem, as usual.

    BTW, you’ve got a small typo in there: SGML comments use two hyphens: <!— comment —>. It’s actually the double hyphens that delimit the comment, but comments are only allowed in SGML directives (if that’s the proper name for it), i.e. things that start with <!. See, for instance, the HTML 4.01 DTD, where this is used quite extensively.

    (BTW 2, my &lt;’s were changed back into <’s by the obligatory preview. This makes it unnecessarily hard to write markup examples. :))

  19. And it seems as if your comment script converts my double hyphens into an em dash or something. :(

  20. January 19, 2005 by Roger Johansson (Author comment)

    Tommy: Sorry about the commenting problem. I’ve been trying to fix the unescaping of entities for a long time now, but I can’t figure it out.

    Your double hyphens were converted to em dashes by the SmartyPants plugin, trying to take care of typography. That’s also what caused the SGML comments in the article to be displayed incorrectly. Fixed now.

    Anne: Thanks for clarifying the body element magic. I’ll update the article later today.

    No, I didn’t comment on why XHTML is useful to me. As you say, it really isn’t, other than for learning purposes, where it’s been quite useful. :)

    What code is incorrect, btw?

    No, I did not link to any articles that disapprove or approve of XHTML. This article is just meant to provide information on the pitfalls involved.

  21. Wonderful article, though it’s not entirely accurate.

    XML declaration and Character Encodings

    As stated in Architecture of the World Wide Web, Volume One: Media Types for XML:

    In general, a representation provider SHOULD NOT specify the character encoding for XML data in protocol headers since the data is self-describing.

    That means that the charset parameter should be omitted from the Content-Type header field in the HTTP response headers for XML documents, served as xml.

    Script and Style CDATA Sections

    As stated in Ian Hickson’s Sending XHTML as text/html Considered Harmful, the correct way to hide CDATA sections from HTML UAs, but not from XML UAs, is to use constructs like these, not just comment them out as you have done:

    <script type=”text/javascript”>
    <!—//—><![CDATA[//><!—

    //—><!]]>
    </script>

    <style type=”text/css”>
    <!—/*—><![CDATA[/*><!—*/

    /*]]>*/—>
    </style>

    xml-stylesheet

    xml-stylesheet is, unlike the XML Declaration, a processing instruction, not a declaration.

  22. January 19, 2005 by Roger Johansson (Author comment)

    Lachlan: Thanks for pointing out those inaccuracies. I’ll take a closer look at it later (gotta run now), but it seems that there is conflicting information on character encoding and XML stylesheets.

    The CDATA section commenting you mention only needs to be used when you also want to hide the contents of the CDATA section from HTML UAs, right? If all you want to do is hide the CDATA section delimiters, my example should do. From Ian Hickson’s article:

    This is all assuming you want your pages to work with older browsers as well as XHTML browsers. If you only care about XHTML and HTML4 browsers, you can make it a bit simpler.

    Oh, please note that the code posted in Lachlan’s comment is a bit whacked right now, because of my commenting scripts. I’ll look into it later. Any em dashes in his code should be double hyphens.

  23. The body element does not fill the whole viewport in XHTML.

    Rather than styling the <html> element, which cannot take class or id attributes (thus complicating putting different background styles on different parts of a site), you can instead just make the <body> element act like it always looked like it did and make it fill the viewport:-

    body {position: absolute; top: 0px; width: 100%; min-height: 100%}

  24. Great post, Roger. Great posts coming from your desk is hardly surprising anymore.

    I had been talking a lot about XHTML on my blog too, but only recently did I start actually using it in my projects. Sure enough, as Anne said above, it’s probably not worth the effort coding in XHTML.

    A probable use of XHTML is where one website/software can grab content from another website easily and display it in theirs, or probably use the content in some other way, using an XML parser. Though this is entirely possible with HTML, it’s much easier with XHTML, since the XHTML document is “parser-ready”. This is definitely an advantage with XHTML, but I don’t think it’s great enough to bother with serving docs as XHTML over ill-formed HTML.

    I still think posts like yours are very useful. After all, it took XHTML to teach us to use HTML properly. XHTML is definitely the way to go if you want to learn markup, and posts like yours go a long way in teaching us how. HTML will still be used for production, though.

    Isaac Lin (#17): Correct me if I am mistaken, but XMLHTTPRequest has nothing to do with XHTML, or even XML. In fact, in some cases, it might be better to send data as JavaScript through XMLHTTPRequest.

    The xml-stylesheet thing is new to me. I’ll be digging deeper into that. Thanks for the pointer.

  25. January 19, 2005 by Roger Johansson (Author comment)

    Lachlan:
    In Referencing Style Elements when serving as XML, the phrase “XML stylesheet declaration” is used.

    From the same document, but in the section on Character Encoding: “In order to portably present documents with specific character encodings, the best approach is to ensure that the web server provides the correct headers.”

    Yes, those references are from an “informative” appendix, so the documents you are referring to may well have the correct info.

  26. Very interesting article, and comments too.

    I think the point in question is really, should we be forced to serve up incorrect content types?

    The fact that xhtml can be served up as text/html to me is a bit of a joke. Imagine if we could serve up JavaScript as vbscript…it just doesn’t make much sense.

    But alas, until certain browsers, ahem, are natively capable of accepting the correct content type, we’re stuck with either using content negotiation or serving up an incorrect content type.

    If it’s ok for me to add:

    I’m running the webstandards channel on dalnet for discussions on all of the above. If you’re interested, feel free to join (#webstandards, irc.dal.net).

  27. Jay-Dee, the most correct thing to do is to switch to a format that is supported by every browser, HTML 4.01.

  28. Nice write-up. I’ll toss in some more JavaScript related notes here now.

    As has already been stated, node names are lowercase in XHTML. Therefore, if you want to write a script which is compatible with HTML and XHTML, you have to use something like this (depending on the use case):

    if(node.nodeName.toLowerCase() == "p"){
    

    As for using XHTML to import new structure via XMLHTTPRequest, that’s a bit spiffy as well. You have to import those nodes into your current document, which isn’t supported by IE. I whipped up a solution for this almost a year ago, you can find it here. An easier solution would be to use innerHTML, but of course that doesn’t (well, shouldn’t, right, Opera and Safari?) work for XML pages. So, in the end, you’re better off using text/plain in your XMLHTTPRequests, as Rakesh pointed out in his article.

    Also note that Opera versions below 7.5 do not support scripting in XML mode. Not sure about Safari, though.

  29. It’s been a while since I knew that XHTML should be sent as application/xhtml+xml.

    But lately, I’ve come to understand that the world isn’t ready for XHTML yet.

    I’ve used content negotiation for about a year over at my site, but the other day I almost converted back to plain old HTML 4.01. The only thing that stopped me was that it became too much hassle converting every blog post back to HTML syntax.

    Why did I almost go back, you wonder? As long as I have to use a content negotiation technique to serve alternative MIME types to different browsers, it’s just not worth it.

    That’s not how it was meant to be, and we all know it.

  30. January 19, 2005 by Isaac Lin

    Rakesh Pai: I am sure that a more compact format would be more suitable for web applications. However, I think it could be beneficial if every web page could be viewed independently (not just as a text file or Javascript code, but as a fully marked-up document), and also be retrievable and parsable via the DOM API, for inclusion in other web pages or for other forms of manipulation. The parsability guarantee is easier to satisfy if (big if) the web page validates as a valid XML document. (This is of course the big fly in the ointment.)

    Mark: (I assume by text/plain you don’t actually mean a text/plain MIME type but that the reply is just plain text.) I misunderstood what Google was doing — I had assumed that Google was taking the contents of an element from a returned XML document and executing it, but looking at the decompressed code provided by Chris Justus, it seems that the reply is just plain text (in spite of presumably having an XML content type).

    I did not know about the problem with importing nodes. However, now that you have kindly provided the appropriate code, I think it is easier for someone to maintain XHTML copies of their web pages that can be retrieved, manipulated, and included into other pages, and not have shadow text versions for each.

  31. Roger: A Great post as always. Learn something new from you everyday!

    Rakesh: Good comments. I found your post on XMLHttpRequest to be one of the few worthwhile reads on the subject out there.

    It’s nice to see a thoughtful validation beyond all the “OMGWTF!!!! I DON’T HAVE TO REFRESH?!?!?!” entries my RSS reader is clogged up with lately.

    I’m surprised there have been so few comments on your post. Hopefully my take will help stir some healthy discussion.

  32. Roger: For some reason (perhaps my proxy at work) I receive a “403 Forbidden” error when trying to post with Firefox. I don’t have referrer blocking enabled. Just FYI.

  33. Hi, Nice article…I have a few newbie type questions that I hope someone can help me out with.. I have been a print designer for a few years and recently within the past year or so am learning web design. I started out using Dreamweaver but am learning coding by hand now. I just got a book helping me learn xhtml. Just a basic book showing me how to transition from html to xhtml with all the syntax rules I should follow.

    I know xhtml has been around for a little while but I figured it was the future of web design and I should start using it now. So, I’m in the process of redesigning my personal site using xhtml.

    I saw in the article how he wrote about IE having problems with xhtml. Does this mean IE will totally screw up the design? Because, obviously IE is still the dominant browser and I would hate to not have a site available to 70% or whatever the percentage of people surfing using IE is.

    I did notice one thing wrong happening in IE when I tested out my redesigned index page. I have 6 links on the left side set as block level. The width of the div is about 200px and I have a hover with a border to the right of the text links that comes up when user rolls over. Works fine in Firefox/Safari but in IE the hover/rollover border stretches almost the entire width of the page..not just the 200px. This might not be an xhtml problem. Just an IE problem?

    Sorry so long..I’ll post this same question in a forum somewhere too.

    Thanks

  34. January 20, 2005 by Daniel Morrison

    Wow. I just built a small site for a client and wrote it in valid XHTML 1.1. I just updated it to correctly serve xhtml+xml (which was on my to-do list) and after fixing the issues in this article, I feel a great sense of accomplishment. The challenge will be to keep it valid as the site grows!

    All I can say to those wondering the merits of strict code is: Try it on a new site. Its a great exercise, and will be useful later. If you already have a big, complex site that works, you’ll probably want to leave it as is. But the more people learn to implement strict code, the quicker and easier standards adoption will be.

  35. Ross, if you serve your XHTML markup with an application/xhtml+xml MIME type, IE won’t know what to do with it. It won’t try to render it and screw up the design, it will try to download the file. It’s not a question of IE “not handling it quite right,” it’s a question of IE not handling it at all. This is why I and quite a few others are saying that XHTML is currently pointless, except possibly for special audiences where you can actually require an XHTML-compliant browser.

    In my opinion, it’s much better to learn how to work with a strict doctype (e.g. HTML 4.01 Strict) than to pretend using XHTML (especially XHTML 1.0 Transitional).

  36. Steve Gilham commented:

    Rather than styling the ‘html’ element, which cannot take class or id attributes (thus complicating putting different background styles on different parts of a site), you can instead just make the ‘body’ element act like it always looked like it did and make it fill the viewport:-

    body {position: absolute; top: 0px; width: 100%; min-height: 100%}

    Firstly, one of the XHTML doctypes does allow for ID to be used on the HTML tag. I don’t know why the others don’t.

    Secondly, why not just use this code?

    body {margin:0; padding:0;}

  37. Isaac: text/plain or just text should be the same, right?

  38. Thank you for giving me plenty more reasons to never consider coding sites this way :)

  39. January 20, 2005 by Roger Johansson (Author comment)

    Mike D: You’re welcome ;-)

    Chris: The html element is allowed to have an id attribute in XHTML 1.0 Strict/Transitional/Frameset according to the HTML/XHTML Element/Attribute support table and the W3C markup validator.

    Tommy: Right, Strict is the way to go, whether it’s HTML or XHTML.

  40. They added the id attribute for the html after the first version of the XHTML 1.0 spec. For some reason, they didn’t update XHTML 1.1 to allow it, though. :(

  41. Chris: That won’t work because it won’t make the body expand to the full height of the viewport if the content is shorter.

    Someone mentioned it in a side note, but the innerHTML property does not work in XML mode in Mozilla. For the same reason that document.write() doesn’t work.

    Another bug to consider when using XHTML is the broken copying mechanism in Mozilla: it won’t copy text from a CDATA section to the clipboard. While I fixed it for XHTML served as application/xml, XHTML served as application/xhtml+xml takes yet another code path and doesn’t work. Pretty embarrassing, seeing as these CDATA sections are the main reason why I use XHTML.

    Let me explain. My university server provides me 100MB of unscripted web space. So, I decided to put a C++ tutorial there. The many angle brackets and ampersands that invariably occur there were very tiresome to write in a plain text editor, so I decided to use CDATA sections for the code examples. Of course, they only work in XHTML, so…

    An added benefit is that it effectively locks IE out, which I don’t mind doing at all.

    Aside from that, I agree with Anne: for pages that should be visible in IE, use HTML 4.01. I can’t see a single reason to use XHTML.

  42. January 21, 2005 by Isaac Lin

    Mark: My understanding from what I’ve read was that the content type returned by the server for a document requested via XMLHttpRequest must be an XML type, and not text/plain. When I tried it out on my own web pages I did not try to serve a non-XML content type, though, so I’m not sure what happens with the various implementations.

  43. My site has been XHTML 1.0 Transitional served as text/html for almost two years now. I have been wishing to upgrade to Strict served properly, but I have been afraid after reading many horror stories. This list of tips and facts regarding the conversion has been very helpful to me. I can’t promise that I’ll make the change any time soon, but you have certainly helped me feel confident that it won’t be as bad as I had previously thought. :)

  44. This was highly informative and something I’m going to have to read through a couple more times to make sure I’ve taken in all the information. Great article!

  45. Maybe I’m missing something, but I’ve never seen the big deal about writing a site in XHTML. XHTML 1.0 and HTML 4.01 are essentially the same language; the abstract says, “The semantics of the elements and their attributes are defined in the W3C Recommendation for HTML 4.”

    I consider the requirement for well-formed pages to be a great thing. If I forget to close a tag, I know immediately from the “yellow screen of death.”

    And I believe an XML stylesheet declaration would only be necessary if the page was being viewed in an XML viewer. But since web browsers are of course (X)HTML viewers, I don’t think it’s needed.

    I wasn’t aware of the limitation on named entities though. The DTD for XHTML specifies the full list of entities, why would they not be acceptable? Sure that’s a limitation of XML, but XHTML is an application of XML, with its own specifications, isn’t it?

  46. On the “CSS is applied slightly differently” section, perhaps it should be made explicit that attribute values can be mixed case.

    Otherwise when you say “just keep everything lowercase” it might confuse some people (it did confuse me for a couple of minutes!)

  47. The reason i like using XHTML better is that, from a designer’s point of view, it just looks better. When i started to make websites in HTML4 i was making tag soup. Conforming to web standards is nice and structured, a way to achieve perfection. From that point of view there is nothing wrong. If it’s my personal site, i don’t care if the rest of the world on IE can’t see it. But only on my personal site.

    It’s just like i never capitalize my “i”s in a sentence, only when i start a sentence. To me it just looks better when they are dotted. Some say that’s very arrogant, but if that were true i would probably have bold caps for my I’s. ;)

    Roger what i am curious about is that you didn’t use xml-stylesheet declaration here. Is there a reason for that? This article really gave me a better understanding of XHTML. So Thanks!

  48. January 26, 2005 by Roger Johansson (Author comment)

    Michael: As far as I can tell, other named entities are allowed, but XML parsers aren’t required to support them.

    Manuel: Thanks. I’ll fix that.

    Med: I don’t use xml-stylesheet here because I haven’t got around to changing that yet ;-) It’s on the list…

  49. Concerning your note about the xml-stylesheet processing instruction, I’ve dug up a reference that might answer your question. Of course, you should keep in mind that Anne has very particular views on XHTML which you may or may not agree with.

    From Anne’s Weblog about Markup and Style

    …both sending as application/xhtml+xml and the use of the XML stylesheet PI are specified with SHOULD. Does this mean that if you use one you have to use the other? I think so, unless you have a really good reason not to.

    He refers to this W3C Note

    I’ve done so in the past, incorporating the stylesheet calls into my content negotiation. Currently though, it’s proving to be a hassle. I’ll get around to correcting it later. Hope to have been enlightening.

  50. One advantage of XHTML is that you can use an XHTML schema validator to check your pages more thoroughly (for example, attribute values) than with normal HTML validators such as those from W3C or WDG.

    Another possibility to serve your documents is to have two files side by side: an HTML file, served as text/html for legacy browsers, and an XHTML file (might be 1.0 Strict or 1.1, no HTML Compatibility Guidelines needed) served as application/xhtml+xml. The HTML file might be generated from the XHTML version.

    The XHTML schema validation might be a benefit. But for a good author with good coding practice it should be no problem at all to code a valid HTML page; without need for a schema validator.

    Thumbs up for HTML.

  51. I wasn’t aware of the limitation on named entities though. The DTD for XHTML specifies the full list of entities, why would they not be acceptable? Sure that’s a limitation of XML, but XHTML is an application of XML, with its own specifications, isn’t it?

    It’s probably related to the fact that XML documents need only be well-formed, and not valid: XML parsers are not required to read the DTD at all:

    Non-validating processors are REQUIRED to check only the document entity, including the entire internal DTD subset, for well-formedness.

    Another possibility to serve your documents is to have two files side by side: an HTML file, served as text/html for legacy browsers, and an XHTML file (might be 1.0 Strict or 1.1, no HTML Compatibility Guidelines needed) served as application/xhtml+xml. The HTML file might be generated from the XHTML version.

    This is (pretty much) what I used on a website I coded. The pages are stored as XHTML 1.1 files, and the content negotiation script either serves them up directly as application/xhtml+xml to browsers that advertise support for it, or as text/html with a few simple regex substitutions to make each one a 100% valid HTML 4.01 document.
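    The decision step of such a content negotiation script can be sketched roughly as follows. This is only an illustration, not the commenter's actual script: the function names `parseAccept` and `chooseContentType` are made up for this example, and the rule used (serve application/xhtml+xml only when the browser lists it explicitly with a q value at least as high as text/html) is an assumption. The regex conversion to HTML 4.01 is left out.

```javascript
// Parse an Accept header into a Map of media type -> q value (q defaults to 1).
function parseAccept(header) {
  const result = new Map();
  for (const part of (header || "").split(",")) {
    const [type, ...params] = part.split(";").map((s) => s.trim());
    if (!type) continue;
    let q = 1;
    for (const p of params) {
      const m = /^q=([0-9.]+)$/i.exec(p);
      if (m) q = parseFloat(m[1]);
    }
    result.set(type.toLowerCase(), q);
  }
  return result;
}

// Assumed rule: the browser must list application/xhtml+xml explicitly
// (a bare */* does not count) and must not rank it below text/html.
function chooseContentType(acceptHeader) {
  const accepted = parseAccept(acceptHeader);
  const xhtmlQ = accepted.get("application/xhtml+xml") || 0;
  const htmlQ = accepted.get("text/html") || 0;
  return xhtmlQ > 0 && xhtmlQ >= htmlQ ? "application/xhtml+xml" : "text/html";
}

// A header that lists XHTML explicitly, and one that only sends a wildcard.
console.log(chooseContentType("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"));
console.log(chooseContentType("*/*"));
```

    Requiring an explicit mention matters because a browser that only sends `*/*` claims to accept everything, yet may still offer the file for download instead of rendering it.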

  52. Argh. It turns out that you can’t separate two blockquotes with just a link “definition”, so to speak. The bit starting “Another possibility…” was supposed to be a second blockquote.

  53. February 9, 2005 by Roger Johansson (Author comment)

    Dave: Thanks for digging up that info. I’m looking at changing my content negotiation script to use XML stylesheet processing instructions. I need to make some changes to some of my documents first though — not all of them use the global stylesheet, and if I just edit the content negotiation script, the CSS would be applied to all documents in XHTML browsers.

    Lars: I wasn’t aware of that validator. Thanks.

    Shiv: Yeah, sorry about that. I spent 15 minutes trying to fix it, but Markdown does not seem to allow several consecutive block quotes.

  54. February 10, 2005 by Can't get my head around this...

    Ok, I’ve been wrestling with this all day. I hope you guys don’t mind me asking, but you seem like authorities on the subject.

    I’ve got an XML doc being loaded via an XMLHttpRequest object in Mozilla. It has the following structure:

    123

    In my code, “doc” contains the XMLHttpRequest.responseXML .

    doc.documentElement.nodeName

    …returns “a”. No problem there.

    doc.documentElement.childNodes[0].nodeName

    …returns “#text”, which I don’t understand. I would think it would return “b”. However:

    doc.documentElement.childNodes[1].nodeName

    …DOES return “b”. And:

    doc.documentElement.childNodes[1].childNodes[1].nodeName

    …returns “c”. So how can I get the “123” value of the “c” node? Neither of these work:

    doc.documentElement.childNodes[1].childNodes[1].nodeValue

    doc.documentElement.childNodes[1].childNodes[1].innerHTML

    Anybody care to point me in the right direction? And possibly explain why my childNode[] arrays are acting strange?

  55. Arg… preview killed my escaped characters I think. Just to be safe: assume “-” is the appropriate lt/gt char:

    -a- -b- -c-123-/c- -/b- -/a-

  56. Interesting article. I started making some changes to my personal site right away, and I’ve run into problems. First of all

    Doesn’t include the stylesheet in Firefox, only in Opera. I can’t even view the page in IE since it can’t handle application/xhtml+xml, and starts to download the page instead of rendering it. How should I write it?

    Also, your links to the PHP header changer aren’t really what I need since I use ASP.NET, for better or worse. How do I do it in ASP.NET?

    And, thanks for a good article!

  57. I think what you want is actually:

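    doc.documentElement.childNodes[1].childNodes[1].childNodes[0].nodeValue

    Sections of text in a document are themselves nodes. The “123” is a text node that is the first child of “c”, and the whitespace between your tags is also stored as text nodes, which is why childNodes[0] gives you “#text” rather than “b”.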
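    To illustrate the whitespace behaviour from comment 54, here is a minimal sketch. The helpers `elementChildren` and `directText` are hypothetical names introduced for this example. So that the code runs without a browser, the tree is built by hand from plain objects as a stand-in for what an XML parser would produce from the markup in comment 55. With a real responseXML document the same helpers should apply, since they only use childNodes, nodeType and nodeValue.

```javascript
// Node type values as defined by the DOM (Node.ELEMENT_NODE, Node.TEXT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical helper: element children only, skipping text nodes.
function elementChildren(node) {
  return Array.from(node.childNodes).filter((n) => n.nodeType === ELEMENT_NODE);
}

// Hypothetical helper: concatenated value of the node's direct text-node children.
function directText(node) {
  return Array.from(node.childNodes)
    .filter((n) => n.nodeType === TEXT_NODE)
    .map((n) => n.nodeValue)
    .join("");
}

// Hand-built stand-in for the tree parsed from "<a> <b> <c>123</c> </b> </a>".
const text = (value) => ({ nodeType: TEXT_NODE, nodeName: "#text", nodeValue: value, childNodes: [] });
const element = (name, childNodes) => ({ nodeType: ELEMENT_NODE, nodeName: name, nodeValue: null, childNodes });

const c = element("c", [text("123")]);
const b = element("b", [text(" "), c, text(" ")]);
const a = element("a", [text(" "), b, text(" ")]);

const firstB = elementChildren(a)[0];
const firstC = elementChildren(firstB)[0];

console.log(a.childNodes[0].nodeName); // "#text" (the whitespace before <b>)
console.log(firstC.nodeName);          // "c"
console.log(directText(firstC));       // "123"
```

    Filtering on nodeType avoids hard-coding indexes such as childNodes[1], which change as soon as the whitespace in the XML changes.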

  58. April 27, 2005 by Paul Arzul

    attribute values are case sensitive in xhtml too (xml is case sensitive).

    pay special attention to enumerated attributes — attributes with fixed predefined values:

    <input type="Checkbox" />

    is not valid xhtml.

  59. Very interesting article. I also prefer to serve sites as application/xhtml+xml. But today I realized that, when it is wrapped in a CDATA section, not all JavaScript code works the way it used to, at least in Firefox.

    What I found was that the following code works in both Firefox and IE when served as text/html. But when the same code is served as application/xhtml+xml, it doesn’t work in Firefox but still works in IE. Could this be a bug in Firefox?


    <div id="replace"></div>
    <script type="text/javascript">
    <!--//--><![CDATA[//><!--
    document.getElementById('replace').innerHTML = 'some text';
    //--><!]]>
    </script>
  60. May 7, 2005 by Roger Johansson (Author comment)

    Venushka: Support for innerHTML in XHTML varies a bit between browsers, and it probably shouldn’t be supported in documents served as application/xhtml+xml. More info: XHTML friendly Javascript for Flash (comment #3), innerHTML in XHTML pages.
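    Because innerHTML support in application/xhtml+xml documents is unreliable, one common alternative is to build the content with core DOM methods instead. The sketch below is an assumption-laden example rather than a recommended fix: `setText` is a hypothetical helper name, and it only handles plain text, not markup.

```javascript
// Hypothetical helper: replace an element's content with plain text using
// only core DOM methods (getElementById, removeChild, createTextNode, appendChild).
function setText(doc, id, text) {
  const el = doc.getElementById(id);
  if (!el) return false; // no element with that id
  // Remove any existing children first.
  while (el.firstChild) {
    el.removeChild(el.firstChild);
  }
  el.appendChild(doc.createTextNode(text));
  return true;
}

// In a browser this mirrors the snippet from comment 59; outside a browser
// (no global document) the call is simply skipped.
if (typeof document !== "undefined") {
  setText(document, "replace", "some text");
}
```

    Passing the document in as a parameter keeps the helper usable with any document object, including one returned as responseXML.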

  61. Any developer who has used XSLT to output X/HTML knows about many of the points you’ve highlighted. As others have said, great article.

  62. December 14, 2005 by Luke Matuszewski

    If HTML browsers really had SGML parsers, it would be impossible to write XHTML (for browsers with XML parsers) that is also valid in HTML browsers (with SGML parsers). Why?

    Here is one of many examples:

    <br />

    This <br />, parsed by an SGML parser, would be translated to <br>>, so you would see the > character after the <br> on the site, which is bad and not intended.

    Another thing is browsers with tag soup parsers, which will (probably) parse <br /> as just <br>.

    So really you have to choose HTML or XHTML, because XHTML will break in SGML applications.

  63. February 9, 2006 by Patrick Kano

    CSS is applied slightly differently - CSS properties applied to the body element don’t apply to the whole viewport in XHTML.

    Sorry, but where can we get more information on this? Thanks!

  64. May 3, 2006 by Henrik

    Use XHTML if you want your site to be browsable on a mobile phone.

Comments are disabled for this post (read why), but if you have spotted an error or have additional info that you think should be in this post, feel free to contact me.