HTML vs. XHTML on standards compliant websites

For the last few years there’s been a recurring debate about whether we should use XHTML or HTML among those of us who care enough about markup to ask ourselves such questions.

Sean Fraser has taken a look at fifty standards compliant websites to find out which doctype they use. He presents the results (which, not surprisingly, show that the vast majority of the examined sites use XHTML) along with a discussion in Why XHTML™?.

Sean concludes the article by stating his reasons for currently using HTML 4.01 Strict:

  • XHTML 1.0 is not forward compatible; XHTML 2.0 will not be backwards compatible.
  • Serving XHTML as application/xhtml+xml does’t (sic) work in IE.
  • HTML 5 purports backwards compatibility.

Only three of the fifty sites Sean examined use HTML 4.01 Strict. This site is one of those three. In the end, my view on HTML vs. XHTML is that it doesn’t really matter. Just remember to write your HTML or HTML compatible XHTML with real XHTML in mind. And use a strict doctype.

Posted on June 19, 2006 in (X)HTML, Quicklinks, Web Standards

Comments

  1. Thanks. :)

    I guess this answers my email.

  2. His numbers don’t seem entirely correct. First, 25, 31 and 42 have two separate entries. Second, I got these numbers:

    • XHTML 1.0 Strict: 21
    • XHTML 1.0 Transitional: 21
    • XHTML 1.1: 3
    • HTML 4.01 Strict: 4 (he skipped my weblog there for some reason and didn’t classify mine as a separate entry later)
    • HTML 4.01 Transitional: 3
    • No DOCTYPE declaration: 1

    It also seems the source code snippets are not actually taken from the source. I don’t use meta elements for example. It’s also too bad no media types are noted down.

    Looking at this data, it seems obvious everyone using XHTML 1.0 Transitional or no DOCTYPE at all should update their doctype assuming their data is being sent as text/html. The same goes for HTML 4.01 Transitional of course. (Especially for CSS experts!)

    By the way, I carefully used code markup in my comment, but it doesn’t show in preview…

  3. This is really one of those issues that makes us all pull against each other, to negative effect. Some debates are entirely circular and I think this is one - the nays and yays are in their trenches already.

    Serving xhtml strict as text/html, I have in the past been accosted quite vehemently by html advocates stating I’m breaking the web, that I’m somehow even somewhat subhuman for considering such a thing. Expletives abound.

    I’m more concerned about the number of sites on the web with no doctype, invalid markup, and dodgy cms output they don’t care about. The one thing I do begrudge standardistas about is this recurrent obsession with html vs xhtml.

    In the end I have similar views to you Roger. Is it a strict doctype (if possible) and does it validate? If not then the key is education I guess. Just my 2 cents of course.

  4. In my experience, people who try to defend the goodness of HTML over XHTML have found themselves too lazy to implement it properly.

  5. June 19, 2006 by Roger Johansson (Author comment)

    Anne: Thanks for checking the numbers. I didn’t take the time to do so. I’m also very sorry about the markup problem. I am trying to improve the situation.

    nortypig: Yep, it’s rare to see anyone change their minds about HTML vs. XHTML. I think Sean’s article offers an interesting overview of how many use what though.

    Woolly: That is probably correct in some cases, but I think problematic CMSs that cannot be trusted to output well-formed markup are a huge factor. After all, not every developer can be expected to build their own CMS.

  6. June 19, 2006 by Peter

    This article is really good: http://schneegans.de/web/xhtml/. It’s in German, but maybe the Google translator could help.

  7. To me the HTML vs. XHTML (served as text/html) isn’t what matters - in most cases. I write my XHTML with real XHTML in mind…
    xhtml vs. garbage
    …and have had few problems with that approach so far.

    Future compatibility problems aren’t a real problem for web designers. They may however become a problem for browser-makers, and users who are unable, or unwilling, to upgrade.

  8. June 19, 2006 by Colm

    I think there are much more important issues than whether to use XHTML or HTML - namely to get browser vendors to agree on how things render onscreen.

    Also HTML 5? There isn’t HTML 5 afaik - is someone just making it up to please themselves?

  9. Bah, if you’re writing XHTML, then serve it as XHTML. That way, when you have bad markup in your page, the parser stops and you get a big fat error on the screen. Keeps you honest!

    Though even that doesn’t ensure real XHTML b/c the parser won’t die if you write non XHTML tags, only if you break XML’s rules.
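    The difference between breaking XML’s rules and merely using non-XHTML tags can be shown with a small sketch. It uses the XML parser from Python’s standard library as a stand-in for a browser’s XML parser, which is an assumption made for illustration; the element name blink-o-matic is made up.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False if it aborts."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


XHTML_NS = 'xmlns="http://www.w3.org/1999/xhtml"'

# Broken XML: the <p> element is never closed, so the parser gives up.
broken = f"<html {XHTML_NS}><body><p>Hello</body></html>"

# Well-formed XML containing a made-up element: the parser does not object,
# because checking element names is a validator's job, not the parser's.
made_up = f"<html {XHTML_NS}><body><blink-o-matic>Hello</blink-o-matic></body></html>"

print(is_well_formed(broken))   # False
print(is_well_formed(made_up))  # True
```

    So a clean parse only shows that the document is well-formed XML. Checking that it is valid XHTML still takes a validator.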

    As far as I’m concerned, serving pages in application/xhtml+xml lets people put their money where their mouth is. You say your markup is valid? Prove it! :-)

    (meanwhile, my pages don’t validate b/c my stupid document.write() override script requires IDs in my script tags… gotta fix that soon)

  10. Colm: Web Applications 1.0 is where you can find HTML 5/XHTML 5.

    Anne: The code snippets were taken from the actual source code. When I was collecting data for Why XHTML? I set your meta elements as “None” (which should have been noted as “None declared in the source code because it is a redundant declaration since they are already specified server-side”).

    nortypig: I agree that “the nays and yays are in their trenches already.” Though, The Nays and The Yays are established web developers. Still, a few have gone from XHTML/Strict to HTML/Strict.

    The purpose of that article was to illustrate that even in the Web Standards community there is not a single standard for beginners, i.e., New Professionals, to follow. I chose fifty sites from those whom I admire, respect and/or follow for their expertise. The article was begun in late 2005 when various articles on those fifty sites addressed “Old Professionals and New Professionals”, “HTML or XHTML”, “Strict v. Transitional” &c. I offered - for The New Professionals - “Monkey whomsoever you’d like.”

    My article CSS Reboot as Web Standards Validation Indicator illustrates that, regardless of what Standardistas say, most web developers new to web standards will follow and copy, i.e., Monkey, established elite sites. Sadly, two-thirds of the time they will do it incorrectly.

  11. June 19, 2006 by Scott

    I read an article somewhere a few months ago which said that using HTML 4.01 instead of XHTML was hypocritical. How can a web designer or developer consider themselves to be forward-thinking if they continue to write code in the past?

    IMHO using XHTML forces you to write better code. That’s why I use it and will continue to use it for the foreseeable future. I think it’s a good reason for others to use XHTML too. This isn’t the first time I’ve seen somebody say that XHTML 1.0 and 1.1 won’t be compatible with XHTML 2.0, but I’ve never seen anybody remark on the even greater difference between HTML 4.01 and XHTML 2.0. Writing XHTML 1.0 or 1.1 now will make it easier to upgrade to XHTML 2.0 later. Let’s break those bad habits, that are valid in HTML but not XHTML, now while it’s not as critical.

  12. June 19, 2006 by Carlos Bernal

    Scott: I agree with you and may I add that even today we still have people making the case for not using css for website layouts!

    Saying that we should code HTML strict and not XHTML strict because of IE is like saying we should only use CSS-1 and not CSS-2 for the same reason.

    To move browser support forward we must code ahead. I think this is just another case of early adopters vs laggards…that’s all!

  13. Over on the SitePoint forums there is some great input given by a user called AutisticCuckoo regarding this same debate:

    http://www.sitepoint.com/forums/showthread.php?t=327908

  14. “Sending XHTML as text/html Considered Harmful” by Ian Hickson:

    http://www.hixie.ch/advocacy/xhtml

  15. Personally I think XHTML is better, but as long as your markup validates (without warnings as well!) you are one step ahead of the rest. Also, be sure you actually know what XHTML is: http://www.webdevout.net/articles/bewareofxhtml.php

  16. June 20, 2006 by Tommy Olsson

    I was going to link to my new FAQ at SitePoint, but Darren Hoyt beat me to it. :)

    Whether you use HTML or XHTML markup isn’t all that important. Using a Strict doctype is far more relevant for ‘future-proofing’.

    However, if you use XHTML markup and serve it as text/html, make sure that it still works when served as application/xhtml+xml. If you’re relying on HTML-only features, you definitely should not be using XHTML markup.

    As long as you comply with (the non-normative) Appendix C of the XHTML 1.0 specification, and are aware of the fundamental differences between HTML and X(HT)ML, you shouldn’t run into any problems.

    @Kevin Boyer: validation is good, but the big differences between HTML and XHTML won’t be picked up by any validator. ;)

  17. Summary
    1. An XHTML DOCTYPE doesn’t make browsers process your document using XHTML rules. Only Content-Type can do that and only in browsers which support it (many don’t).
    2. XHTML 1.0 compatible with Appendix C is limited to the elements, attributes and techniques of HTML 4.01.
    3. XHTML 1.1 must not be sent with a Content-Type of text/html because it is not compatible with HTML rules.
    4. HTML 4.01 is always more efficient than the equivalent XHTML.
    5. HTML does everything XHTML 1.0 compatible with Appendix C can do, yet is more efficient.
    6. HTML is the better format for use in text/html documents.

    Content-Type

    The key to this is the Content-Type header being used.

    When a server sends a file to a browser, it first sends a few lines of text explaining what it is about to send. These lines of text are called the HTTP Response Headers.

    The HTTP Response Headers for this page are:

    Transfer-Encoding: chunked
    Date: Tue, 20 Jun 2006 11:02:23 GMT
    Content-Type: text/html; charset=iso-8859-1
    Server: Apache/2.2.0
    X-Powered-By: PHP/5.1.2
    Vary: Accept,User-Agent
    
    200 OK
    

    You can set these up using your server configuration files. For Apache, these include the httpd.conf and .htaccess files. For example, to make the server include a Content-Type header for all HTML files (.htm or .html), you’d do something like this:

    AddType 'text/html; charset=utf-8' .htm .html
    

    This applies to all types of file. To send all Cascading Style Sheet (CSS) files (.css) with the correct Content-Type header, you’d use something like this:

    AddType 'text/css; charset=utf-8' .css
    

    For PNG images (.png) you’d use something like this:

    AddType 'image/png' .png
    

    For XHTML documents (.xhtml) you’d use something like this:

    AddType 'application/xhtml+xml' .xhtml
    

    The Content-Type header tells the browser what format the data they are about to receive is in. The browser decides how to handle the data according to this header.

    When the text/html header is used, the browser processes the document using the rules of HTML.

    When the application/xhtml+xml header is used, the +xml part means browsers which support it will process the document using XML rules. The /xhtml part means they can treat the elements as being part of the XHTML namespace (<p> means ‘paragraph’, etc) as standardised in RFC3023.
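    The two cases above can be sketched as a small function. This is a simplified model, not how any particular browser is implemented: the function name processing_mode and the "other" fallback are assumptions made for illustration, and real browsers apply further rules (such as content sniffing) that are left out here.

```python
def processing_mode(content_type: str) -> str:
    """Return "html", "xml" or "other" for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    # application/xhtml+xml and any other "+xml" type go to the XML parser,
    # as do the generic XML types.
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    return "other"


print(processing_mode("text/html; charset=iso-8859-1"))  # html
print(processing_mode("application/xhtml+xml"))          # xml
```

    Note that the function never looks at the document itself: in this model the markup and its DOCTYPE play no part in the decision.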

    DOCTYPES

    DOCTYPES do not make browsers switch from HTML rules to XHTML rules. Only Content-Type has this effect. If you send a document which uses XHTML markup but uses a Content-Type of text/html, browsers will attempt to process it using HTML rules.

    HTML is here to Stay

    Many devices do not support application/xhtml+xml, so you must provide a text/html version to make sure everyone can access your website. Your website will mainly be processed using HTML rules because most people are using devices which do not support the rules of XHTML.
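    One common way of providing both versions is to look at the Accept header the device sends and fall back to text/html unless it explicitly lists application/xhtml+xml. The sketch below assumes that approach; the function name choose_content_type is hypothetical and the q-value handling is deliberately minimal.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a Content-Type for an (X)HTML page from a request's Accept header."""
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # a media range without a q parameter has quality 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable q value as "not accepted"
        if quality > 0:
            return "application/xhtml+xml"
    # Wildcards such as */* are not taken as a sign of XHTML support.
    return "text/html"


print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_content_type("*/*"))
```

    A server that varies its response like this should also send Vary: Accept, so that caches keep the two versions apart.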

    IE 7.0 will not support the rules of XHTML, so HTML will remain the mainstream for some years. HTML browsers will be using the web indefinitely.

    Furthermore, HTML5 could become a more practical format for commercial use than XHTML 2. This means that XHTML rules may never become the mainstream. Instead, the HTML rules may simply be developed every few years in new versions of HTML, much as they were during the 1990s.

    HTML is more Efficient

    When you write an XHTML 1.0 page compatible with Appendix C and send it as text/html, your markup is processed using HTML rules. This means your pages have a fair amount of needless baggage:

    • Slashes in <img> and <meta> tags are treated as invalid attributes or as garbage characters.
    • Your DOCTYPE is a little longer than the HTML equivalent.
    • You have an xmlns attribute adding filesize. This attribute is not valid HTML, so it is ignored when sent in a text/html document.
    • You have xml:lang as well as lang. All of the xml:lang attributes are redundant since HTML browsers will use the lang attribute.
    • You have lots of tags which are not required (especially end tags).

    In the HTML4 Elements Table, you can see that the Start Tag and End Tag of many elements are Optional. Optional tags have been allowed in HTML since HTML 2.0 and are a fundamental part of the language, so they are safe to use.

    On pages with many paragraphs, tables or lists these add up to be significant. Around 5% of filesize can sometimes be saved by using HTML and removing the optional end tags.
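    To get a rough figure for a given page, the end tags can be stripped with a regular expression and the sizes compared. This is only an estimate and not a safe minifier: removing an end tag such as </p> can change how a page is parsed when text follows it directly. The function names and the choice of tags (a subset of those HTML 4.01 marks as optional) are assumptions made for illustration.

```python
import re

# A subset of the elements whose end tag HTML 4.01 lists as optional.
OPTIONAL_END_TAGS = ("p", "li", "dt", "dd", "tr", "td", "th", "option")

END_TAG_RE = re.compile(
    r"</(?:%s)\s*>" % "|".join(OPTIONAL_END_TAGS), re.IGNORECASE
)


def strip_optional_end_tags(html: str) -> str:
    """Remove every end tag in OPTIONAL_END_TAGS (estimate only, see above)."""
    return END_TAG_RE.sub("", html)


def estimate_saving(html: str) -> float:
    """Return the fraction of bytes saved by dropping optional end tags."""
    original_size = len(html.encode("utf-8"))
    if original_size == 0:
        return 0.0
    stripped_size = len(strip_optional_end_tags(html).encode("utf-8"))
    return 1 - stripped_size / original_size


sample = "<ul><li>a</li><li>b</li></ul>"
print(strip_optional_end_tags(sample))  # <ul><li>a<li>b</ul>
print(f"{estimate_saving(sample):.0%} saved on this tiny sample")
```

    A tiny list like this exaggerates the effect. On a real page, with far more text between the tags, the saving is much smaller.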

  18. I don’t think it is necessary to take so seriously the idea that HTML is the correct choice for markup and XHTML is not.

    I believe that certain quarrels do not make the slightest sense, since XHTML is also designed to work as HTML.

    What is the problem with using XHTML? With a strict doctype, a page can have the same valid markup whether it is HTML, XHTML 1.0 or XHTML 1.1, even though XHTML 1.1 is not appropriate for text/html.

    Each professional uses the doctype they find best. Are all these “celebrities” wrong for using XHTML in their markup? Nope! Using HTML or XHTML makes no difference. XHTML just has some extra features, such as MathML, SMIL, or SVG. Is there any problem with using XHTML even if I do not use these features? Nope!

    I believe that we must judge the quality of the markup and the presentation of our projects, not the choice of doctype. The doctype does not change anything about the goals the project has for its target audience.

    I’m sorry for my bad english.

  19. June 21, 2006 by Wayne Godfrey

    Argggg!! This is where I have to agree with Sean, “where’s the help for beginners?” Two years ago I couldn’t read, write nor did I even have a clue what HTML was, never mind XHTML. When I first made my early attempts, tables kept getting in my way. I figured there had to be a better way and found standards. Thankfully I never learned bad coding habits. That was until I decided to go back to HTML 4.01 Strict and found my original XHTML techniques don’t always validate. This becomes frustrating to us who are trying to learn and learn correctly. I love what I’m doing and I’m grateful for all the help I’ve received from so many willing to take the time to teach, but somewhere along the line here, in order to have web standards, we need some standardization!

  20. June 21, 2006 by Tommy Olsson

    @Ben: Good summary, but there’s one error in what you wrote. The Content-Type header is only used (by browsers) to choose which parser to use (XML or ‘SGML’). That header does not make an XHTML document XHTML, only XML (despite the ‘xhtml’ in the MIME media subtype).

    The thing that says that an XHTML document is really XHTML is the ‘xmlns’ attribute, with the correct value, on the root element. Of course, that’s ignored for non-XML documents, so it takes a combination of Content-Type: application/xhtml+xml and the proper ‘xmlns’ attribute.

    You can use application/xml, or even text/xml, as the Content-Type and still have the document recognised as XHTML, provided that you have the correct ‘xmlns’ attribute.
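    The role of the ‘xmlns’ attribute can be checked with any namespace-aware XML parser. The sketch below uses the one in Python’s standard library as a stand-in for a browser’s XML parser, which is an assumption made for illustration; the function name root_is_xhtml is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"


def root_is_xhtml(xml_source: str) -> bool:
    """Return True if the root element is <html> in the XHTML namespace."""
    root = ET.fromstring(xml_source)
    # ElementTree reports namespaced names as "{namespace-uri}localname".
    return root.tag == f"{{{XHTML_NAMESPACE}}}html"


with_xmlns = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Test</title></head><body/></html>"
)
without_xmlns = "<html><head><title>Test</title></head><body/></html>"

print(root_is_xhtml(with_xmlns))     # True
print(root_is_xhtml(without_xmlns))  # False
```

    Both documents are well-formed XML, but only the first one has a root element that an XML processor would recognise as XHTML.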

  21. Thanks for the clarification, Tommy. :)

    My kingdom for an ‘Edit’ button!

    Actually, I could neaten it up and make an article out of it…

  22. I honestly don’t get the point of bickering about which DOCTYPE is the shit … it is just tearing the community apart. Creating tension where we should be united against the non-standard conforming n00bs.

    I mean, doesn’t it just boil down to a matter of taste?

  23. It doesn’t boil down to taste (if only it were that simple!). Markup gets processed by user agents according to certain rules. Educating developers about what those rules are and how they work in reality is a helpful thing to do. People can then make an informed choice about what markup they use.

  24. Ben ‘Cerbera’ Millard says that “many browsers don’t.” What he really means is that Internet Explorer doesn’t. Personally I don’t think that’s a big issue though. There are more important things to fix and worry about.

  25. XHTML is forward compatible even if XHTML 2.0 is not backwards compatible. XSLT does the trick. You create a transformation document to convert it on the fly. Since HTML is not XML, you can’t do transformations as easily.

  26. HTML is also forward compatible. Almost none of the web uses conforming XHTML which can be processed using XML rules. There simply isn’t enough XHTML in use to browse that and nothing else.

    Now think about how much content is on the web which will never be converted. It would simply cost too much to convert the whole web. No mainstream UA could afford to abandon HTML support because their users would miss all the HTML content (which is nearly all of it at the moment).

    These (and other) realities make HTML a format which is safe in the future. Future versions of HTML can continue to develop its capabilities just as previous versions did.

  27. Since I learnt about standards I have used xhtml but after reading this and some other information I think I may have made the wrong choice. I think most people will continue to use xhtml though because it’s very discouraging to begin designing to web standards only to be told you’re still doing it wrong.

  28. August 13, 2006 by nieruby

    html is the best

  29. Inline SVG (supported by Mozilla/Firefox, Opera and WebKit) and inline MathML seem to me to be pretty good use-cases for XHTML.

    So is being able to embed XHTML content (instead of an opaque CDATA blob) in an Atom feed. (On an unrelated note, that’s one reason not to use feedburner.)

    But, if there’s nothing newfangled about your content, and if you don’t use syndication, then HTML4 will do just as well.

    [Just to forestall the obvious comment. Yes, like Anne van Kesteren, if you have well-formed XHTML on the back-end, you can both embed that XHTML in your Atom feed and convert it to HTML4 for browsers. But there’s no particular benefit to the latter conversion, so why bother?]
