Browsers will treat all versions of HTML as HTML 5

Since joining the W3C HTML Working Group a few weeks ago, I’ve been spending some time every day reading the discussion on the public-html mailing list. I have also been reading the WHATWG HTML 5 Working Draft to get a better understanding of what everybody is talking about on the list. It can be quite hard at times.

One thing that I simply could not understand at first was the often-cited proposed design principle once called “Don’t Break The Web” and later renamed “Support Existing Content”. I found it really confusing. How could anything break just because a new version of HTML is published? Browsers would still treat old content the way they do now, right?

Or so I thought, but that is not the plan. From the Conformance requirements section of the WHATWG HTML 5 specification (which has been proposed as a starting point for the new W3C HTML specification):

Web browsers that support HTML must process documents labelled as text/html as described in this specification, so that users can interact with them.

So, as soon as a Web browser claims to support HTML 5, it is required to treat all content served as text/html (which means all HTML and nearly all XHTML) as HTML 5.
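
To make the consequence concrete, here is a minimal sketch in Python of that rule as I read it. The function name `choose_processing_model` and the labels it returns are made up for illustration; they do not come from the specification or from any browser. The only point is that the media type decides how a document is processed, and the version named in the doctype plays no part in that decision.

```python
from typing import Optional


def choose_processing_model(content_type: str, doctype: Optional[str] = None) -> str:
    """Pick a processing model from the media type alone (hypothetical helper)."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()

    if media_type == "text/html":
        # The doctype argument is deliberately unused: under the quoted
        # requirement, everything labelled text/html is processed as HTML 5.
        # (A doctype may still influence quirks vs. standards rendering.)
        return "html5"

    if media_type in ("application/xhtml+xml", "application/xml", "text/xml"):
        # Real XML media types go to an XML parser instead.
        return "xml"

    return "other"
```

Under this reading, a page served as text/html with an HTML 4.01 or XHTML 1.0 doctype ends up with the same processing model as a page with no doctype at all.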

I’m not sure if I think that is actually a good thing or not, but it does explain what “Don’t Break The Web” means.

Anyway, I hope that helps clear things up if anyone else was confused by this.

Posted on May 2, 2007 in (X)HTML, Browsers, HTML 5

Comments

  1. That’s all well and good, but who is going to do the enforcing? They can make requirements all day long, but it doesn’t mean Microsoft / Firefox / Opera are going to follow them at all.

  2. I don’t expect them to follow this. It sounds like quirks mode. How do they think this new HTML standard will integrate with the existing web? I think it would be much more useful if any website without a DTD rendered in quirks mode, any website with an HTML 4 or XHTML 1 DTD rendered as HTML 4 or XHTML 1, and any website that claims to be HTML 5 had to have an HTML 5 DTD and be valid. Otherwise: fallback.

    Stefan

  3. IE is the only browser which uses more than one HTML parser (two versions of Trident are included: one for quirks, one for standards mode). The rest use a single HTML parser. The aim is to document how to parse HTML as it is in the real world (therefore not referencing SGML). So, by documenting what browsers already do, it is highly probable that it’ll be implemented (the Moz. Foundation, Opera, and Apple have all said they’ll implement it as their sole HTML parser. MS have said that they’ll use it for HTML5-mode documents).

  4. If I remember correctly, HTML5 doesn’t require the DTD line, so browsers have to fall back to something, and they’re saying it should fall back to HTML5. I would think that if a DTD line is present, it would interpret as whatever the DTD line specified. Is this correct?

  5. Actually, I think this makes perfect sense. This is how all incremental additions to the HTML standard have been made, and thanks to HTML’s description of what user agents should do with tags they don’t understand (render the contents), it’s quite easy to design around. Any other approach (requiring two separate rendering modes) would have caused a lot of bloat, gnashing of teeth and might have ended with people having to serve two different sets of content depending on the support of the user agent.

  6. It’s scary until you realize that the goal of HTML 5 is to be compatible with existing, current implementations, and that it goes to great lengths to describe current implementation behavior. That is, HTML 5 is backward compatible. The problem is when two existing browsers implement something differently. A choice has to be made about which implementation to describe. I don’t know how they are handling that at this point. However, on the alt attribute tool-tip issue, they seem to have sided with non-IE implementations.

    @mgroves

    That’s all well and good, but who is going to do the enforcing? They can make requirements all day long, but it doesn’t mean Microsoft / Firefox / Opera are going to follow them at all.

    The goal of an implementation is to implement the spec exactly (and it wouldn’t be an implementation of a spec if it wasn’t written to the spec). Previous versions of HTML were ambiguous on how to implement them. WHATWG’s HTML 5 is pretty explicit on how things should be rendered since they’ve looked into how each browser renders and tried to describe that behavior.

    So, no one will police it, just as no one polices it now. However, it should be known that as staunch as Chris Wilson is about not breaking IE’s existing bad behavior, he seems to genuinely regret having a broken implementation and not being able to fix it without another switch.

    If you think IE is evil, know that Chris Wilson is certainly not.

    @CM Harrington

    If I remember correctly, HTML5 doesn’t require the DTD line

    From the Web-apps 1.0 spec:

    A DOCTYPE is a mostly useless, but required, header.

    So, the header is required for HTML 5 served as text/html (though it is not for the XHTML serialization because there are no quirksmode implementations of real XHTML). The DOCTYPE is required to avoid quirksmode. Browsers will still fall back to quirksmode if it is not present, unless I missed something.

    Unfortunately, browsers don’t care about what version you specify in the DTD except to decide whether to render in quirksmode or not. Mozilla, for example, will render HTML 3 in quirksmode. I’m under the impression that IE merely looks for a properly formed DOCTYPE, ignoring versions completely. The reason I say it is unfortunate is that it gives WHATWG enough reason to leave out versioning in HTML 5 completely. In fact, most of the WHATWG people are staunchly against versioning. This is causing problems for the IE team, who need a new switch, and for people like me who want to say “this was the version I was authoring for; I made my best attempt to conform to this specification, even if it isn’t valid HTML 9.”

    Geoffrey Sneddon is right that we all get one parser except in IE. So, from what I can tell, all standards-mode text/html is already being parsed as HTML 4.01. By agreeing to use only the HTML 5 parsers, we’re doing the same thing we are now, except that the new HTML 5 tags will be included. Ideologically, this is good because HTML 5 does a better job at describing how elements are rendered than HTML 4 did.

  7. I came across this link http://www.w3.org/QA/2007/05/htmlandversion_mechanisms.html yesterday and left it in my delicious account with the comment: “this could get ugly”.

  8. “Browsers will treat all versions of HTML as HTML 5”

    Just like current browsers treat all versions of HTML as HTML 4. So long as the next generation of HTML is an evolution rather than a revolution, it’ll work just fine.

    If they rip up the old spec like they did with XHTML2, it’s gonna end in tears.

  9. May 3, 2007 by zcorpan

    While it is intended that a UA implementing the spec would pretty much be able to work with the real Web, strictly speaking it also allows browsers to keep their current code for documents that don’t start with the new doctype. See the “The initial phase” section in the Parsing section.

    However, AIUI, Mozilla, Opera and Apple are indeed interested in implementing the spec for all text/html content, regardless of doctype.

  10. OK, we just had a conversation about this with a few colleagues, and as it’s 2:30am I’ll keep it short: to me this is rubbish.

    A version is simply a label for a specification in a particular state.

    If you never give a particular version to a spec, or never implement separate versions of a spec, you only end up with a steaming pile of rules to follow, which accumulate over time and get more or less implemented on a random basis. I keep telling myself there must be a good reason behind this idea, but it honestly seems to be defying any form of logic or practicality.

  11. Anyone who thinks existing content is treated as HTML 4.01, and that only HTML5 content should be treated as HTML5, seems to be ignoring the fact that browsers do not treat anything as HTML 4.01. They don’t implement HTML 4.01. They just happen to support a language that is syntactically similar and shares some of the same vocabulary, yet that language is technically undefined.

    That is one of the issues that HTML5 is trying to address. There is no other spec that defines the language and processing requirements to handle existing content, but one is desperately needed to achieve interoperability.

    At present, entering the browser market is an extremely costly and time consuming process that involves reverse engineering existing browsers. It’s been 3 years since the WHATWG started to reverse engineer and document browsers, and we’re not even close to finished!

    There have been many who have argued that HTML4 should basically be left to rot, and that going forward new content should be parsed and rendered in a more standards-compliant mode. But everyone, including those people, would be screaming the minute browser vendors decided it was too costly to maintain support for today’s content, and only support HTML5.

    Ignoring the problems doesn’t make them go away; it only lets them grow and grow until they become unmanageable. It is absolutely vital that we document and standardise the requirements for handling today’s content, so that we can in fact fix the problems and move forward.

  12. Good point Roger!

    [Hey, some good other comments made prior to mine.]

    Well, in some ways it sounds great: bringing everything up to the HTML 5 standard might force a better set of markup standards across the web. That’s an all-round improvement, right?

    On the negative side though: sites that fail to meet the code requirement could really struggle online, with all that previous hard work being undone and clients, customers, web site owners and site visitors left a bit disappointed. Anything that damages public confidence in the web is a bad thing in a short-term view.

    On the plus side: this could keep web designers in work for years to come as we try to re-design the older-markup sites out there!!! :D

    Overall though, I have to agree with some of the other comments made: I just don’t think this is going to have as big an impact on existing (and future) web sites as we might initially expect. So we won’t need to panic too much…yet. I suspect HTML5’s progress on to the web will still be a dominant web-industry topic for the coming year or two… (with more articles to come from you, right?!) :)

  13. The benefit of quirks mode in IE is that you can develop a site in it and be confident that a future bugfix isn’t going to break your site, because they leave most of the bugs in.

    This sounds like a bad thing but it really isn’t.

    For freelancers, changes in browser behaviour can be a good thing as it means more money to make a site work in a new browser, but for companies with a large web presence their developers don’t want to spend their time fixing sites because a new version of IE came out which fixes some but not all bugs.

    I know the call is for IE to just “get it right this time”, but they won’t, and neither will other vendors. All browsers have bugs. The idea of versioning html documents gives some guarantee that if your site is semantically coded and works in version x of current browsers (including any bug-workarounds), it’ll work in version x+1 too.

    Microsoft broke sites when they improved standards compliance in IE7; they’ve learned from that mistake and won’t be making it again. They’ll implement versioning with or without the support of W3.

  14. Html versioning will never ever give you any kind of guarantee. You probably need to address browser version instead (well, for MS at least).

  15. Incidentally, it also explains how the current web works.

  16. Others have said it, but it bears re-iterating: one of the primary purposes of HTML 5 is to document exactly how browsers work now. So the requirement for HTML 5-compliant browsers to treat all documents as HTML 5 is really saying “Treat documents the way they are already treated by the browsers everybody currently uses.”

    Back in the days when Netscape 3 and IE 3 were the browsers to worry about, there was an assortment of cases where a document would be rendered differently in each; Netscape 4 and IE 4 just muddied the waters even further, and the primary reason was that there was no specification for the browser manufacturers to follow in handling cases that were ambiguously specified, or where markup was malformed.

    The rationale behind treating everything as HTML 5 is that all the existing content out there, including the malformed markup that is the vast majority of the web, will be handled consistently by compliant browsers, and in the same way as it is handled today. If HTML 5 causes existing content to break, then that will be evidence of a flaw in the specification. Conversely, if a new browser causes existing content to break, then that will be evidence that the creators of that browser didn’t follow the specification, and they can be exhorted to get into step by reference to a clear description of how they should have made their browser behave.

  17. May 4, 2007 by Court Kizer

    Congratulations on the new position. However, it sounds suspiciously like the people you work with don’t browse the web.

    Could you please check under their desks to make sure the internet is plugged in? And if so can you check to make sure they have web browsers?

    I see early retirement in their future, or people flat out ignoring their work.

  18. May 4, 2007 by Wulf

    There’s an article over at Web Devout covering some more of the “scary stuff… going on in HTML 5 development”: “The whimzical world of HTML 5”.

  19. Thanks Roger, that’s good info. I joined the list a few weeks ago as well and I’ve been struggling to stay on top of the hundreds/thousands of emails that come in. I’ve appreciated your responses on the list.

    What’s most interesting about it all to me is that it’s very political. That probably shouldn’t have been a surprise, and I don’t say it with the attitude (that normally comes with calling something political) of “aw forget it, I’m outta here” but more to make the point that it’s a different kind of game. Tone is just as important as the argument. I hope to get involved more, but like you said it’s hard when there’s so much obscure information being referred to.

    And I keep trying to figure out the underlying factors and arguments behind posts and that’s not always easy with email and response after response after response…

  20. I think Lachlan’s answer is enlightening: it is a pure commercial choice, at the end. None of the vendors involved would pay the cost of developing a double parser just for making developers’ life easier.

    Thanks Roger, I was curious about it, too, but it’s getting tough to extract clear information from the public list.

  21. May 4, 2007 by Roger Johansson (Author comment)

    Andrea:

    None of the vendors involved would pay the cost of developing a double parser just for making developers’ life easier.

    Judging from their comments on the public-html mailing list, some of the browser vendors have no interest whatsoever in doing anything to make the lives of developers easier. They only care about market share, market share, and pandering to people who can’t be bothered to learn how to write HTML properly.

  22. But wouldn’t that actually break the web? We know about the support (or lack thereof) and the browser features used to handle the source (in)correctly.

    That rule can mean that lots of websites would get broken.

  23. It will break the web, but we will survive.

  24. May 5, 2007 by edbm

    Not having followed the HTML5 work at all, to me this sounds simply like a design requirement for the new version of the standard to be fully backwards compatible with 4.01, which among other things means that it has to be - semantically and syntactically - a superset of 4.01.

    One consequence of this is that every document valid under one of the HTML 4.01 DTDs has to be valid also under any of the HTML 5 DTDs (syntax), and also (ideally) render identically in a HTML4 browser and a browser supporting HTML 5 (semantics). The fact that it makes it easier for browser implementors to conform to is a benefit but not necessarily the only reason for this design decision.

  25. Even if things do happen this way, everyone will just conform to the new standards (as search engines and everything else will have to). All new reliable content will have to be put onto the web in the new HTML 5, and archived content will just, well, die. This will leave room for hundreds of thousands of keywords to be overridden by blackhat SEO within the first few days of the new standards. Large profits for the right people!

    Just a thought about what we can expect. Everyone always wants to fix what’s not broken.

  26. Um, what? I’m not even sure what you’re really trying to say sholiz. How is it that black hat SEO will have any more influence than it does over ruining the Web if HTML 5 is implemented and adopted? Why exactly will archiving be endangered?

  27. Um - why am I declaring all these doctypes then if browsers would ignore them anyway? Why do I specify that my document is XHTML 1.0 Strict or HTML 4.01 if the browser should go on and render it as HTML 5 anyway?

    I don’t like the tone with which it is expressed - claiming some sort of superiority to the other HTML standards…

  28. May 8, 2007 by Jacob

    Pelle: that’s what they currently do. So why are you declaring all those doctypes today?

  29. May 9, 2007 by Brian

    Jacob, that is precisely Pelle’s question. I am wondering the same thing. What was the intended purpose of Doctype declarations with versions?

    I’m discussing this with my fellow HTMLers here at work and we have each experienced pages behaving differently when the doctype version changes.

    I guess it’s time to do some googling on this topic. I want to see some examples that prove Doctype versions are ignored.

  30. It would be great if we could have a sneak peek at what exactly is new in HTML 5.0.
