Contents

  1. Introduction
  2. History
  3. Web Standards
  4. Structure and presentation
  5. (X)HTML
  6. CSS
  7. Accessibility
  8. URLs
  9. References
  10. Glossary

5. (X)HTML

XHTML 1.0 is a reformulation of HTML 4 in XML 1.0 and was developed to replace HTML. Note that nothing prevents you from using HTML 4.01 to build modern, structured, standards-compliant websites.

However, to make the transition to clean, semantic markup and to be better prepared for a possible move to XML and other future markup languages, you may want to consider using XHTML 1.0 Strict. The choice is yours.

Whether you use HTML or XHTML matters less than using a Strict doctype and properly separating structure from presentation. Strict doctypes do not allow presentational markup, so they enforce that separation.

XHTML 1.0 Strict is what is used in the examples in this document.

XHTML 1.1, the latest version of XHTML, is technically a bit more complicated to use. The specification states that XHTML 1.1 documents should be served with the MIME type application/xhtml+xml and should not be served as text/html. Using text/html isn’t strictly forbidden, but it is not recommended. XHTML 1.0, on the other hand, should be served as application/xhtml+xml, but it may also be served as text/html if it is HTML compatible. The W3C Note XHTML Media Types gives an overview of the MIME types the W3C recommends.

Unfortunately, some older web browsers, and Internet Explorer, do not recognize the MIME type application/xhtml+xml, and can end up displaying the source code, or even refuse to display the document.

If you want to use application/xhtml+xml, let the server check whether the browser requesting a document can handle that MIME type. If it can, serve the document as application/xhtml+xml. Otherwise, use text/html.

If you’re using PHP for server-side scripting, the following content negotiation script can serve documents with different MIME types to different browsers:

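<?php
// Read the request headers defensively: either one may be missing.
$accept    = isset($_SERVER["HTTP_ACCEPT"]) ? $_SERVER["HTTP_ACCEPT"] : "";
$userAgent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";

if (stristr($accept, "application/xhtml+xml") ||
    stristr($userAgent, "W3C_Validator")) {
    // XML-capable user agents get XHTML plus an XML declaration
    header("Content-Type: application/xhtml+xml; charset=iso-8859-1");
    header("Vary: Accept");
    echo("<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n");
} else {
    // Everything else gets text/html and no XML declaration
    header("Content-Type: text/html; charset=iso-8859-1");
    header("Vary: Accept");
}
?>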

The script checks whether the user agent sends an Accept HTTP header that contains the value “application/xhtml+xml”. It also checks whether the user agent is the W3C HTML Validator, which does not send a proper Accept HTTP header but still handles application/xhtml+xml. If either condition is true, the document is served as application/xhtml+xml and is preceded by an XML declaration. All other browsers, including Internet Explorer, receive the document as text/html. They get no XML declaration, because it would put IE/Win into Quirks mode, which we don’t want.

After the Content-Type header, a Vary header is sent. It tells intermediate caches, such as proxy servers, that the response varies depending on the Accept header sent by the requesting client. Without it, a cache could hand a copy stored as application/xhtml+xml to a browser that only handles text/html.
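To make the decision logic easier to test without a web server, it can be pulled out into a function. The following is a minimal sketch. The function name negotiate_content_type() and its array return format are hypothetical and not part of the script above. It applies the same two checks and leaves sending the headers to the caller.

```php
<?php
// Hypothetical helper (not part of the original script): decides how to
// serve an XHTML document from the raw Accept and User-Agent header values.
function negotiate_content_type($accept, $userAgent, $charset = "iso-8859-1")
{
    // Case-insensitive substring checks, like stristr() in the script above
    $acceptsXhtml = stripos((string) $accept, "application/xhtml+xml") !== false;
    $isValidator  = stripos((string) $userAgent, "W3C_Validator") !== false;

    if ($acceptsXhtml || $isValidator) {
        return array(
            "content_type"    => "application/xhtml+xml; charset=" . $charset,
            "xml_declaration" => true,
        );
    }
    return array(
        "content_type"    => "text/html; charset=" . $charset,
        "xml_declaration" => false,
    );
}
```

In a page, you would pass in $_SERVER["HTTP_ACCEPT"] and $_SERVER["HTTP_USER_AGENT"], checking each with isset() first. You would then send the returned Content-Type together with Vary: Accept, and echo the XML declaration only when xml_declaration is true.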

For a more advanced PHP content negotiation script, visit Serving up XHTML with the correct MIME type. That script takes the requesting user agent’s q-rating (how well it claims to handle a certain MIME type) into account, and converts XHTML to HTML 4 before sending the document as text/html to user agents that don’t handle application/xhtml+xml.
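The following is a rough idea of what taking q-values into account involves. It is a hypothetical sketch, not the script mentioned above, and it leaves out the W3C Validator check and the XHTML-to-HTML conversion. It assumes that application/xhtml+xml should only be chosen when the client lists it explicitly and rates it at least as highly as text/html.

```php
<?php
// Hypothetical sketch (not the script referred to above): parse an Accept
// header into an array of media range => q-value.
function parse_accept($accept)
{
    $ranges = array();
    foreach (explode(",", (string) $accept) as $part) {
        $params = array_map("trim", explode(";", $part));
        $type = strtolower(array_shift($params));
        if ($type === "") {
            continue; // skip empty entries, e.g. from a trailing comma
        }
        $q = 1.0; // q defaults to 1 when it is not given
        foreach ($params as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        $ranges[$type] = $q;
    }
    return $ranges;
}

// q-value for a concrete MIME type: the most specific matching range wins
// (exact type, then "type/*", then "*/*"); 0 means "not acceptable".
function q_for(array $ranges, $mime)
{
    $parts = explode("/", $mime);
    foreach (array($mime, $parts[0] . "/*", "*/*") as $candidate) {
        if (isset($ranges[$candidate])) {
            return $ranges[$candidate];
        }
    }
    return 0.0;
}

// Choose application/xhtml+xml only if the client lists it explicitly and
// rates it at least as highly as text/html. Wildcards are deliberately not
// counted for XHTML, because browsers that cannot handle it may send "*/*".
function preferred_type($accept)
{
    $ranges = parse_accept($accept);
    $xhtml = isset($ranges["application/xhtml+xml"]) ? $ranges["application/xhtml+xml"] : 0.0;
    $html  = q_for($ranges, "text/html");
    return ($xhtml > 0 && $xhtml >= $html) ? "application/xhtml+xml" : "text/html";
}
```

The wildcard rule is the important design choice here. Internet Explorer sends a list of image types followed by */*, so letting */* count for application/xhtml+xml would serve XHTML to a browser that cannot display it.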

Here is a similar script for those who use ASP and VBScript:

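<%
' Headers are set before any output is written
Response.Charset = "iso-8859-1"
Response.AddHeader "Vary", "Accept"

' Serve application/xhtml+xml to user agents that accept it (and to the
' W3C Validator); everything else gets text/html. vbTextCompare makes the
' comparison case-insensitive, like stristr() in PHP.
If InStr(1, Request.ServerVariables("HTTP_ACCEPT"), "application/xhtml+xml", vbTextCompare) > 0 _
    Or InStr(1, Request.ServerVariables("HTTP_USER_AGENT"), "W3C_Validator", vbTextCompare) > 0 Then
    Response.ContentType = "application/xhtml+xml"
    Response.Write("<?xml version=""1.0"" encoding=""iso-8859-1""?>" & vbCrLf)
Else
    Response.ContentType = "text/html"
End If
%>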

Note that when the MIME type is application/xhtml+xml, some browsers, for example Mozilla, will not display documents that are not well-formed. They show an error message instead. This can be a good thing during development. On a live site that is updated by people who are not XHTML experts, however, it may cause problems unless you can ensure that all code stays well-formed. If you cannot, you may want to consider using HTML 4.01 Strict instead.
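One possible safeguard is to check the generated markup before choosing the MIME type. The following is a minimal sketch that assumes PHP’s DOM extension is available. The function name is_well_formed() is hypothetical. It checks well-formedness only, not validity against the XHTML DTD. It also does not load the DTD, so test how your libxml version handles named entities such as &nbsp; before relying on it.

```php
<?php
// Hypothetical helper: returns true if $markup parses as well-formed XML.
// It checks well-formedness only, not validity against the XHTML DTD.
function is_well_formed($markup)
{
    $markup = (string) $markup;
    if ($markup === "") {
        return false; // loadXML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $doc = new DOMDocument();
    $ok = $doc->loadXML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the previous setting
    return (bool) $ok;
}
```

To use it, you could buffer the page with ob_start(). Then serve the page as application/xhtml+xml only if the client accepts that type and is_well_formed() returns true for the buffered output, and fall back to text/html otherwise.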

Here is a list of the things that are most important to consider when using XHTML 1.0 Strict instead of HTML 4.01 Transitional (or no-name, plain old invalid HTML):


Doctype

Currently, very few HTML documents have a correct and full doctype (document type declaration), which names the DTD, or Document Type Definition, that the document follows. The doctype used to be more decorative than functional. A few years ago, however, browsers began using it to choose a rendering mode, and since then its presence can greatly affect how a web browser renders a document.

All HTML and XHTML documents must have a doctype declaration to be valid. The doctype states which version of HTML or XHTML the document uses. Validators use it when validating, and web browsers use it to determine which rendering mode to use. If a correct and full doctype is present, many web browsers switch to standards mode, which means that they follow the CSS specification more closely. The document may also render faster, because the browser doesn’t have to interpret and try to compensate for invalid HTML. Standards mode also reduces the differences in rendering between browsers.

The following doctype declares that the document is XHTML 1.0 Strict. It makes web browsers that use so-called “doctype switching” render the document in standards mode.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Character encoding

All XHTML documents should specify their character encoding.

The best way of specifying the character encoding is to configure the web server to send an HTTP content-type header with the character encoding. For detailed information on how to do this, check the documentation for the web server software you are using.

If you’re using Apache, you can specify the character encoding by adding one or more directives to your .htaccess file. For example, if all your files use UTF-8, add this:

AddDefaultCharset utf-8

To specify a character encoding for files with a certain filename extension, use this:

AddCharset utf-8 .html

If your server lets you run PHP scripts, you can use the following to specify the character encoding:

<?php
    header("Content-Type: application/xhtml+xml; charset=utf-8");
?>

To serve your pages as HTML, change application/xhtml+xml to text/html. If, for whatever reason, you are unable to configure your web server to specify the character encoding properly, use a <meta> element in the document’s <head> section. It’s a good idea to specify the character encoding this way even if your server is configured correctly.

For example, the following <meta> element tells the browser that a document uses the ISO-8859-1 character encoding:

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
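If you declare the encoding both in the HTTP header and in a <meta> element, the two should agree. The following is a small sketch of one way to keep them in sync from a single place. The helper content_type_meta() is hypothetical. It sends the header when headers have not yet been sent and returns a matching <meta> element for the <head> section.

```php
<?php
// Hypothetical helper: sends the Content-Type header (if still possible)
// and returns a matching <meta> element, so both declarations agree.
function content_type_meta($mimeType, $charset)
{
    $value = $mimeType . "; charset=" . $charset;
    if (!headers_sent()) {
        header("Content-Type: " . $value);
    }
    // Escape the value before placing it in an attribute
    return '<meta http-equiv="content-type" content="'
        . htmlspecialchars($value, ENT_QUOTES) . '" />';
}
```

Call it before any output is sent, and echo the returned element inside <head>.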


© Copyright 2004–2006 Roger Johansson