About a year and a half ago I mentioned Nikita the Spider, a bulk validation and link checking tool, as a useful aid for quality assurance. Nikita the Spider has received a lot of fixes since then and was recently taken out of beta. It is no longer completely free, but the first 125 pages it crawls will cost you nothing.

But what may be more interesting is what Nikita finds when it crawls a site. Philip Semanchuk, Nikita’s author, has analysed the statistics Nikita collected during March 2008 and walks you through the results in By The Numbers – March 2008. A few highlights:

  • The most common validation error is neglecting to specify an alt attribute for img elements
  • The second most common error is failing to escape ampersands
  • XHTML doctypes are much more common than HTML doctypes
  • Over sixty percent of the crawled pages use a transitional doctype
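
If you want a rough idea of how your own pages fare on the two most common errors, here is a minimal Python sketch. It is not how Nikita works, and it is no substitute for a real validator. The names `ImgAltChecker`, `BARE_AMPERSAND` and `check_page` are made up for this example, and the ampersand check is a plain regular expression over the raw markup, so it will also count ampersands inside scripts and comments.

```python
import re
from html.parser import HTMLParser

# An ampersand that does not begin a character reference such as &amp; or &#38;
# (hypothetical helper pattern; a crude approximation of what a validator checks)
BARE_AMPERSAND = re.compile(
    r"&(?!#\d+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)"
)


class ImgAltChecker(HTMLParser):
    """Counts img elements that have no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.missing_alt = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; alt="" still counts as present
        if tag == "img" and "alt" not in dict(attrs):
            self.missing_alt += 1


def check_page(markup):
    """Return rough counts of the two most common errors in one page."""
    checker = ImgAltChecker()
    checker.feed(markup)
    checker.close()
    return {
        "img_missing_alt": checker.missing_alt,
        "bare_ampersands": len(BARE_AMPERSAND.findall(markup)),
    }


if __name__ == "__main__":
    sample = (
        '<p>Fish & chips &amp; peas <a href="/?a=1&b=2">menu</a></p>'
        '<img src="a.png"><img src="b.png" alt="">'
    )
    print(check_page(sample))
```

Note that an empty `alt=""` is deliberately not counted: the attribute is present, which is all the validation error is about.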

Of course these statistics come from only a very small sample of the pages that exist on the web. On top of that, those pages live on sites that somebody has actually asked Nikita to crawl, so the people behind them are probably more aware of web standards than the average website owner, author, or developer.

It’s still interesting reading though.

