Understanding and extending semantics in HTML

In a series of three long articles (Part I—Traditional HTML Semantics, Part II—Standardizing Vocabularies, and Part III—Directions in HTML Semantics), John Allsopp expands on his thoughts about how the HTML-based Web can be improved to allow for better semantics.

After explaining what “semantics” actually means, John defines three different semantic classifications that an HTML element can belong to:

structural: Defines document structure. Examples: div, span, h1 – h6, ul, ol, dl, p.
content: Defines the type of content it marks up. Examples: abbr, address, code.
rhetorical: Defines rhetoric added by the author. Examples: em, strong.
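To make the three groups a little more concrete, here is a small fragment I made up for illustration (it is not taken from John's articles). The name and email address are placeholders. The comments note which of John's categories each element falls into:

```html
<!-- Structural: h2 and p describe how the document is organised -->
<h2>Contact details</h2>
<p>
  <!-- Content: abbr and code say what kind of content they wrap -->
  Questions about <abbr title="HyperText Markup Language">HTML</abbr>
  elements such as <code>&lt;abbr&gt;</code> are welcome.
  <!-- Rhetorical: em and strong add the author's own emphasis -->
  Please <em>do</em> include a link, and <strong>never</strong> send attachments.
</p>
<!-- Content: address marks up contact information -->
<address>Jane Doe, jane@example.com</address>
```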

The full list, which includes attributes, is available in Classifying the semantics of HTML. I haven’t seen the elements and attributes of HTML classified like this before, but it all makes sense to me.

The conclusion John comes to is (unless I am misunderstanding something) that extending HTML by adding new elements is a bad idea. Adding a new semantic element only works until there is a need for something else, at which point the cycle starts over. Instead, John calls for a way of extending HTML indefinitely, much like microformats do.
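As an example of the microformats approach (my own, not one from John's articles), here is a minimal hCard. The person and company are placeholders. No new elements are needed; the extra meaning is carried by agreed-upon class names on ordinary elements:

```html
<!-- The hCard class names (vcard, url, fn, org) add semantics
     to plain div, a and span elements without changing HTML itself -->
<div class="vcard">
  <a class="url fn" href="http://example.com/">Jane Doe</a>
  works for <span class="org">Example Inc.</span>
</div>
```

A parser that knows the hCard vocabulary can extract a contact from this markup, while browsers that know nothing about it simply render a link and some text.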

He does have a point, though I’m not sure I agree with it. I’m not saying I disagree either, just that I don’t know. What do you think? Should new elements and attributes be added to the HTML specification when there is a need for them? Should there be another way of extending and improving the semantics of HTML without requiring the specification to be updated? Perhaps combining the two approaches would be better?

That’s a lot of questions. Anyone have answers?

Posted on August 31, 2007 in (X)HTML