Contents

  1. Introduction
  2. History
  3. Web Standards
  4. Structure and presentation
  5. (X)HTML
  6. CSS
  7. Accessibility
  8. URLs
  9. References
  10. Glossary

5. (X)HTML

XHTML 1.0 is a reformulation of HTML 4 in XML 1.0 and was developed to replace HTML. However, there is nothing preventing you from using HTML 4.01 to build modern, standards compliant, and accessible websites. Whether you use HTML 4.01 or XHTML 1.0 doesn’t really matter all that much.

What is more important is to properly separate structure from presentation. Strict doctypes allow less presentational markup and enforce separation of structure from presentation, so I recommend using HTML 4.01 Strict or XHTML 1.0 Strict.

XHTML 1.1, which is the latest version of XHTML, is technically a bit more complicated to use, since the specification states that XHTML 1.1 documents should have the MIME type application/xhtml+xml, and should not be served as text/html. It isn’t strictly forbidden to use text/html, but it is not recommended.

XHTML 1.0 on the other hand, which should use application/xhtml+xml, may also use the MIME type text/html, if it is HTML compatible. The W3C Note XHTML Media Types contains an overview of MIME types that are recommended by the W3C.

Unfortunately some older web browsers, and Internet Explorer, do not recognize the MIME type application/xhtml+xml, and can end up displaying the source code or even refuse to display the document.

If you want to use application/xhtml+xml you should let the server check if the browser requesting a document can handle that MIME type, and in that case use it, and use text/html for other browsers.

This is called “content negotiation”. Instead of going into details about it here I refer you to the following writeups:

Note that when the MIME type is application/xhtml+xml, some browsers, for example Firefox, will not display documents that aren’t well-formed. This can be a good thing during development since it immediately makes you aware of some markup errors. However it may cause problems on a live site that gets updated by people who are not XHTML experts, unless you can ensure that all code stays well-formed. If you cannot guarantee well-formedness you should probably avoid content negotiation and use HTML 4.01 or “HTML compatible” XHTML 1.0 instead.

Differences between XHTML 1.0 Strict and HTML 4.01 Transitional

Here is a list of the things that are most important to consider when using XHTML 1.0 Strict instead of HTML 4.01 Transitional (or no-name, plain old invalid HTML):

Recommended rules for HTML 4.01

I recommend sticking to most of these rules even if you are writing HTML 4.01. Doing so makes the markup much easier to read and maintain, and has become a widely used convention. So when writing HTML 4.01:

Doctype

The doctype, or DTD (Document Type Declaration), used to be more decorative than functional, but for quite a few years now the presence of a doctype has been able to greatly affect the rendering of a document in a web browser.

All HTML and XHTML documents must have a doctype declaration to be valid. The doctype states what version of HTML or XHMTL is being used in the document. The doctype is used by the W3C markup validator when checking your document and by web browsers to determine which rendering mode to use.

If a correct and full doctype is present in a document, most web browsers will switch to standards mode, which among other things means that they will follow the CSS specification closer. This will reduce the difference in rendering between browsers.

The following doctypes will will make the web browsers that have “doctype switching” use their standards mode:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

For more detailed information about Doctypes, see my Opera Web Standards Curriculum article Choosing the right doctype for your HTML documents.

Character encoding

All XHTML documents should specify their character encoding. If you don’t, the browser will have to guess which character encoding to use. If it guesses wrong your visitors may have a hard time reading the text on your website.

The best way of specifying the character encoding is to configure the web server to send an HTTP content-type header with the character encoding. For detailed information on how to do this, check the documentation for the web server software you are using.

If you’re using Apache, you can specify the character encoding by adding one or more rules to your .htaccess file. For example, if all your files use utf-8, add this:

AddDefaultCharset utf-8

To specify a character encoding for files with a certain filename extension, use this:

AddCharset utf-8 .html

If your server lets you run PHP scripts, you can use the following to specify the character encoding:

<?php
    header("Content-Type: text/html; charset=utf-8");
?>

To serve your pages as XHTML, change text/html to application/xhtml+xml. If you, for whatever reason, are unable to configure your web server to specify the character encoding you are using properly, use a meta element as the first child of the document’s head element. It’s a good idea to specify the character encoding this way even if your server is configured correctly.

For example, the following meta element tells the browser that a document uses the ISO-8859-1 character encoding:

<meta http-equiv="content-type" content="text/html;
charset=ISO-8859-1">

Further reading

Comments, questions or suggestions? Please let me know.

© Copyright Roger Johansson