Typography, semantics and document structure

In Semantic Typography: Bridging the XHTML gap, Mark Boulton explains how to turn a visual design into a semantically marked-up document by taking a real-world example and breaking it down into structural elements. Mark’s article is a good introduction to semantics and document structure for anyone who hasn’t already started thinking structurally when marking up documents.

For some reason, the whole concept of document structure and actually using structural elements like headings and lists seems alien to most content authors I have met. And if authors aren’t semantically structuring documents in their word processors, they sure won’t do it before publishing their documents on the web.

I’ve been thinking a bit about why most people don’t structure their word processing documents properly, and I’m not sure why that feature is so underused. It could be a usability problem with word processors though – it’s often less obvious how to use styles properly than how to change the font and text size.

Posted on November 26, 2005 in Quicklinks, Typography, Writing


  1. November 26, 2005 by Juan Carlos

    Yeah, I wonder if it is because the B, I and U buttons typically visible on a word processor’s toolbar are too easy to use in comparison to style selectors.

    Word 2003 (can’t remember about Word XP) and Open Office 2 allow you to see the list of styles more easily. (In Word you see it on the right hand side of the window. In Open Office, you either have a floating window or you can dock it where you want.)

    If you have a big enough (or wide enough?) screen, then this seems fine — if you know it exists, and if you know how to turn it on. Most users perhaps do not know, and so bold and italics win over other options such as headings, emphasis, citations, terms and so on… Heck, Word even has the HTML Cite and HTML Abbreviation styles… unfortunately, do you know anyone who uses those?

    Someone once had a good suggestion — teach proper usage at high schools!

  2. November 26, 2005 by zcorpan

    The goal is for the content to look right. How they do it is less relevant. Therefore saying that this text should be big and bold is easier than saying that it’s a heading. For printed media, that might be fine, but on the web it is not, because the web is not a visual medium.

  3. November 26, 2005 by Masklinn

    The goal is for the content to look right. How they do it is less relevant. Therefore saying that this text should be big and bold is easier than saying that it’s a heading.

    That’s extremely wrong. Anyone who’s had to create reports of more than 50 pages, and had to change the style of the document while writing it (because it sucked, or for corporate reasons), knows that the “how they do it” is extremely important. And that’s part of the reason why most academics still use TeX/LaTeX: write what you mean first, then deal with the visual style.

  4. November 26, 2005 by gerben

    Maybe it’s because M$ Word just messes around with the style classes. (Note that Word doesn’t have semantics like HTML. It just has a group of predefined classes.) This is especially true of every Word version up to (and including) 2000. I once managed to get three different H1s, which in the end completely messed up my TOC. It took a lot of skill to get Word to work properly.

    I almost think HTML is easier to learn than Word.

  5. November 26, 2005 by Martin Smales

    I’ve had my fair share of marking up documents semantically, with things like list structures for menu items or to-do lists, where I ended up using “dl” and “dt”, or “ol” and “li”, and so on.

    Then there’s the overuse of “blockquote” for indenting, when its intended purpose is to quote something, and don’t get me started on “address”, which is even more confusing…

    I, for one, would prefer an HTML “dictionary” listing each element and what it is for, so that everyone is on the same page in terms of semantics, rather than leaving the whole thing wide open to interpretation, especially among people who call themselves professionals.

  6. November 26, 2005 by Roger Johansson (Author comment)

    Masklinn: I believe zcorpan was referring to the (incorrect) approach many people use.

    Martin: A semantic dictionary is a very good idea that could help us avoid a lot of confusion and arguing.

  7. I think the problem is that MS Word simply makes it way too easy to bypass the document structure features that are available in their software (see comment above re: B I U buttons on the toolbar). Users aren’t forced in any way to even consider using proper headings, etc…

    Having spent time as a technical writer, I can understand the importance of document structure and semantics when you need to publish a document to paper, a Word document and a web-based product like RoboHelp… all from the same source document. I can tell you that most professional technical writers certainly do give a lot of thought to semantics and document structure. Unfortunately, most others that I’ve worked with don’t understand the importance of a well structured document.

    I think this may be the reason why the idea of structured documents in web design just makes sense to me.

  8. Most people don’t know how to write a structured print document with standard Beginning-Middle-End rules. [Or, a research paper structured with Introduction/Abstract-Research/Analysis-Conclusion/Recommendation rules.]

    Composing semantically-structured HTML is no different.

    Most “website builders” don’t care about semantic rules, nor do they comprehend the need for structural markup. The visual design they copy from a semantically structured site looks identical, so why bother?

    Most “Web Professionals” know semantic rules and, consequently, are able to bend them while still remaining within web standards. Professionals know the rules and how they can be bent or broken; amateurs don’t.

    All professions have this dynamic.

    It’s a testament to bad training that structured documents are not the norm. The functionality to make structured documents has been built into MS Word since about Word 95 (perhaps even earlier…). It’s called Outline View (View - Outline).

    Outline View lets you use heading levels to control the structure of the document, then worry about how it looks later. Headings and sections can easily be promoted or demoted within the structure with a few keystrokes. The formatting can then be applied at the style level, which is really quick.

    To this day, I have never met ANYONE who knows about this feature and how it can speed up document creation. However, when I show people how to use it, most of them change the way they work.

    Most word processors have similar functionality, it’s just that people largely don’t think of documents beyond, say, a typed letter. When it gets down to it, most people approach a word processor as a glorified typewriter.

    Seems that old habits die hard.

    Steve D: And you wonder why? Because MS is plain stupid, hiding the outline feature deep in the menus and offering B/I/U/font size instead.

    There shouldn’t be any font size option in the first place. Just offer headings and normal text, and at the end, before saving or printing, ask the user to style them (e.g. just “please choose the font size used for heading 1, subheading 2, etc.”).

  11. it’s often less obvious how to use styles properly than how to change the font and text size.

    I think that’s exactly what the problem is. In the version of Word I’ve got, it’s a pain in the backside to use styles.

    That said, we convert a lot of MS Word documents to PDF format for use on the web. If you don’t make use of heading styles, the PDFMaker toolbar can’t correctly set up bookmarking. So we trained the content authors to use headings properly and now all is well in the world of PDFs.

    So education is probably one way forward here. Or a Word Processor with a “Styles” toolbar instead of the traditional formatting tools…

  12. Most people ignore the “Styles and Formatting” option because they don’t understand templates.

  13. When the GUI came in, computer software was made to act a bit more like objects that people already knew: file folders, filing cabinets, etc.

    Outside of computers, I’m not aware of any document structure tools. It’s a pretty computer-specific thing. So for people who aren’t familiar with computers, it’ll always be another thing to learn.

  14. November 28, 2005 by gerben

    So education is probably one way forward here.

    Good point by Olly. I taught my father to use headings just by pointing him to the style drop-down box and giving him a reason to use it: the TOC function. It took me about 10 minutes, and he’s been using headings ever since and is excited about them.

    It’s however difficult to get someone to want to be educated since they (think they) can already do everything they need with Word.

  15. Martin: Comment taken. I’ll work on it.

  16. November 29, 2005 by Zoe Gillenwater

    I completely agree that content creators not marking their documents up is simply a matter of education. To that end, I intend to put together a short tutorial for my colleagues to show them how to use the Styles and Formatting panel and Outline View in Word. Does anyone know of something like this already in existence that I can start from?

  17. I have to agree with SteveD here - bad training, among other things. I’ve had to write reports, white papers, manuals, and all manner of other documents that required the proper use of document structure.

    I’ve watched countless people create and work on documents and most people simply do not need to be bothered with a word processor’s numerous structural styles. The majority of users don’t create documents that need a table of contents, an index or even something as simple as an outline. When they DO need such a document, they’re mostly clueless as to how to make that happen. I spent enough years in tech support having to deal with that scenario and having to ‘correct’ someone else’s document mess because ‘the TOC isn’t working’. Most people just sit down and start typing with no thought for structure at all.

    Training, from the ground up, is necessary to teach users how a document is properly formatted. I’d be surprised if it’s taught in school at all. Other than in courses for technical writers, journalism majors or similarly focused occupations, I think it’s largely ignored. To then expect that typical web publishers will do anything differently is, at this point, expecting too much. They’ll all just go the font size and bold/underline/italics route.
