Google valid and strict

In the comments on my article Ten reasons to learn and use web standards, someone tried to make the point that web standards are largely irrelevant by using Google’s home page as an example:

Take Google’s homepage. Is it inaccessible because it does not validate? Does it minimise the number of potential visitors? How much bandwidth would they save by adding a doctype? Quoting attributes? Using XHTML?

Here’s a quote from my reply:

Since you mention bandwidth I just had a quick look, and I managed to recreate the Google homepage in valid HTML 4.01 Strict and dump all the layout tables. As a result I reduced the file size by a couple of hundred bytes, despite quoting the required attributes and adding a DOCTYPE and a bunch of CSS rules.

For some reason I never got around to writing anything about that Google home page remake, but Philipp Lenssen’s post Google Strict vs Google Deprecated made me remember that I had it lying around on my computer.

I opened the file and changed a couple of things to make it match Google’s home page as it looks now (they made a couple of slight changes since December 2005). Quoting all attributes that require quoting and adding a Doctype plus a bunch of CSS rules to replace the tables, spacer GIFs and font tags actually reduced file size. And I didn’t even move the CSS and JavaScript to external files or get rid of all the inline event handlers.
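
To give a rough idea of the kind of substitution involved (the fragment below is simplified and illustrative, not Google’s actual markup), a table-and-font-tag footer along these lines:

    <table width="100%" cellpadding="0" cellspacing="0"><tr>
    <td align="center"><font size="-1">&copy;2006 Google</font></td>
    </tr></table>

can be replaced by a single element plus one rule in the style sheet in the document head:

    <p class="footer">&copy;2006 Google</p>

    /* in the embedded style sheet */
    p.footer { text-align: center; font-size: smaller; }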

Anyhow, the result is a valid HTML 4.01 Strict file that is 3 902 bytes large. Google’s invalid kinda-HTML 2.something very-loose is 4 944 bytes. The valid and strict version is 1 042 bytes smaller. That’s 21 percent savings on bandwidth costs.

The valid version of Google’s home page looks all but identical to the invalid one in CSS capable browsers, including Internet Explorer. There are a couple of slight, completely insignificant pixel shifts caused by browsers handling text input widths and empty table cells differently. Note that I changed the URLs in the valid version I am linking to here to avoid getting tons of 404 errors, making it 168 bytes larger than with Google’s original URLs.

The claim that Google is using invalid markup to save bandwidth is clearly just a myth.

Whether the reason is backwards compatibility, fear of change, developer ignorance, server platform inflexibility, that they just don’t care, or something else, I can only guess. But it sure isn’t saving them any bandwidth.

Posted on August 14, 2006 in Web Standards

Comments

  1. I also played around a bit with a valid version of Google a while ago, and while the results don’t include the latest tiny updates, it is still quite close. I also took some time to examine the markup of the search results page as well. It still puzzles me as to why Google hasn’t implemented a valid solution.

  2. August 14, 2006 by Jeremy Carson

    i can tell you why they never implemented a valid solution. they’re google, they don’t have to. barely anybody using the site would really care if they had a valid site, and they’re not losing out on it.

    as for the 21 percent bandwidth savings. i don’t really know if they spend so much on bandwidth that they would care about that either.

    overall, we’re the only ones that care.

  3. Wow, great work. This can really be helpful to evangelize xhtml+css, haha. Thanks. :)

  4. Nice work Roger, I always like to read these types of ‘studies.’ Google sometimes strikes me as a company that develops sites in the ‘2 steps forward, 1 step back’ method. I may be totally wrong, but it still strikes me as odd that they haven’t gotten around to at least updating the markup for their homepage. They also don’t seem to be too concerned about bandwidth savings. Perhaps Doug Bowman is (or will be) working on those mindsets and we might see more standards based sites from Google in the near future.

  5. i don’t really know if they spend so much on bandwidth that they would care about that either.

    I don’t mean to sound sarcastic or anything, just curious, but how is this possible? I thought bandwidth was pretty expensive for everybody. Even if it is inexpensive, I figure a 21% savings that can be done in Roger’s spare time (which I’m sure is sparse) is something they can manage.

  6. On the bandwidth issue, this would only be a 21% saving on the bandwidth used serving their home page, not their total bandwidth bill.

    But I’m sure similar work on their search results page, and other pages, would net similar results.

  7. That’s pretty interesting. I never thought to open their hood and check out what’s running the car. But apparently it is rigged. It’s a shame; with such a simple page design, you’d think it would be no big deal to be strict and valid.

    On another note, Roger, with Yahoo hiring so many design celebrities, do you think they’ll become more aligned with web standards?

  8. August 14, 2006 by yotiao

    And what about real backward compatibility? I mean, these guys are not stupid, and even if they are, they now have Doug Bowman on board. But what if they have to support browsers back to the 3.0 era? Maybe it’s just 0.001% of all Google users, but that’s still probably a couple of hundred thousand people.
    I have no idea whether the standards-compliant 456Google page looks OK in those browsers, though.

    If it does, then I think Doug will take care of it and we will soon see the brand new G.

    cheers
    jarek

  9. August 14, 2006 by Alexander S.

    Nice article, Google should read this. One thing though: gzip will make the difference about 16%. Still something, but at least it’s less than 21%.

  10. Great article. Thankfully I’ve found that selling standards compliant code to clients is becoming easier and easier. Highlighting bandwidth savings is a real economic benefit that clients can understand and appreciate. This is another example of bandwidth savings in action.

  11. August 14, 2006 by mattur

    With respect, “web standards are largely irrelevant” is a deeply misleading summary of my comment which clearly stated:

    “I’d like to see a more objective analysis of the web standards credo”

    Leaving aside the other reason for Google’s HTML that you conveniently omit - backwards compatibility - the obvious point to make about your re-design is that you could make it even smaller by removing the doctype and other bits required for validation.

    This design will work in all CSS capable browsers too. It’s smaller, faster and non-standard.

    I realise critical analysis of the “web standards” movement’s simplistic hyperbole is widely regarded as heresy, but hey, what the heck - you started it! ;-)

  12. I’d be curious to see if Google pays for their bandwidth by the byte or by a monthly usage fee like most of us do. This would make a huge difference in the bandwidth-saving argument you have made.

  13. August 14, 2006 by Roger Johansson (Author comment)

    Jon: Wow, you definitely did a much more thorough job of it than I did :-).

    Jeremy: I have no clue how much Google pays for bandwidth, but spending very little work to pay 21% less seems worthwhile to me.

    Small Paul: I think there is even more bandwidth to save on the results pages. Just a hunch.

    John: Isn’t Yahoo all about the scripting celebs? Sure, the people I’ve read about going to Yahoo definitely know HTML and CSS too, but I’m not sure if that’s what they’re hired to do.

    yotiao: If you want to provide a styled fallback for really ancient browsers, that is quite possible. The quick hack I made will work fine but be unstyled in browsers that do not support CSS. But if we’re talking about backwards compatibility, remember that there are browsers that do not support tables either…

    Alexander: 16 % or 21 % doesn’t really matter. My point is that avoiding web standards does not save bandwidth.

    mattur: I suppose you are right about my summary of your comment. It depends a bit on how you read between the lines, and apparently I misinterpreted you.

    I do mention backwards compatibility, though I have not taken it into account in my example. The example is not fully optimised either. Removing the Doctype, however, will trigger Quirks mode in most browsers.

    Critical analysis is perfectly fine. Calling arguments used by some of the brightest minds in the web industry “simplistic hyperbole” on the other hand… ;-)

    Let’s leave it at that and not let the comments get out of hand.

  14. You’re wrong, it isn’t the same.

    With the original version, I don’t see scroll bars until I am under 500 pixels in width. With your version, the window needs to be at least 600 pixels.

    Win XP, IE 6 - all updates, no toolbars

    By the way, did you create the software that runs this blog? I ask because if this is an example of how cool CSS is, then CSS really sucks. Even with my browser window fully maximized I get horizontal scroll bars.

    With “invalid” I still get proper word wrapping and resizing. If you cannot do that with your so-called “valid” html, what’s the point?

  15. Very interesting article. I always knew that standards compliant code was cheaper, but I wasn’t really aware of how much of a saving it gives.

    It does make you wonder why Google doesn’t make the change. Even if it gave only 5% savings on Google overall, that’s still a huge saving if you think about the kind of bandwidth a busy site such as Google has to contend with.

  16. August 14, 2006 by Roger Johansson (Author comment)

    With the original version, I don’t see scroll bars until I am under 500 pixels in width. With your version, the window needs to be at least 600 pixels.

    Whatever. Considering I slapped this demo together in an hour or so I think it’s close enough for a proof-of-concept.

    Even with my browser window fully maximized I get horizontal scroll bars.

    Yep, that is correct. Thanks for making me aware of that bug. I had neglected to fully compensate for Internet Explorer’s lack of competence in the CSS department. Should be fixed now.

  17. The question is, does Google care?

    In Denmark, you can’t use the argument “save money on bandwidth”, because you don’t pay for bandwidth. But it’s always nice to have a website that loads fast!

    Anyway, I’m worried about whether Google even cares. The only search engine that’s even close to Google is Yahoo, in my opinion. I’m just waiting for the day when Google starts to charge people for their “extra service”.

    I’m sure that they will always keep their standard search engine free, but I’m also sure that one day it won’t be free to use Gmail, Froogle, Maps and all of the other services. Imagine if they just charged $5 per user; who would care? What if it was $10 a year; who would care? Try multiplying $5-10 by the number of people using Google.

    “The father of the Internet” should be a role model for everyone else; it’s just hard when only “one person” decides if it’s a yes or no.

    Anyway nice article, hopefully it will have just a little impact on someone who’s important up there :)

  18. Hey Roger: What did you use to find out the exact size of the page?

  19. Great stuff, I too like these experiments. It’s odd really, that Google, with its progressive, modern attitude, doesn’t take a couple of hours to fix this.

    Should make a good example too, for newbies wanting to learn about standards.

  20. It is also worth mentioning that google’s “view page in google cache” feature causes all pages to fall back to quirks mode. That should not be acceptable.

  21. Nice work, I’ll be linking this whenever anyone says anything about standards and bandwidth. :P

  22. Considering how often the Google homepage is accessed by visitors each day… a 21% bandwidth savings is HUGE for them (if they were to use it). Anyone know just how many hits a day it gets? I imagine they’d save a significant number of megabytes of bandwidth.

  23. Does anyone note the irony that Google provides an Accessibility search engine that itself is not accessible? (Also note that Google.com itself does not make the top page of its own engine.)

  24. It’s the backward compatibility issue - for some reason that I have never been able to follow, some people feel that the same graphic experience (rather than the same search experience) is important when you hit the Google home page :(

    Hey Doug! Hit whoever is responsible at Google for that decision with a hammer, eh, mate? ;)

  25. DIV’s don’t work in older browsers. Neither does CSS. Having CSS styles on a separate file introduces latency.

    google homepage could probably be better, but losing support for legacy browsers should not be the price to pay for full standard compliance.

    Richard, why are you confusing accessibility with standards?

  26. August 15, 2006 by hagay

    Google uses gzip compression on page output, so the uncompressed HTML size is irrelevant…

  27. Whilst appreciating that the discussion here is about Google’s use (or non-use) of standards, this is related to Google’s use (or non-use) of practical accessibility on the Accessible Search. The happy-go-lucky approach to coding the Accessible Search homepage is a little weird in its inappropriate mark-up, taking into account the pretty obvious market. This could be remedied with the use of standards, of course. Two very simple pages that could be simpler, one interesting discussion.

    The question is, does a similar (albeit less colourful) conversation happen around the Google mocha table? Good article, Roger.

  28. This is just so cool! But, in a way, isn’t it kinda sad that you still have to do this kind of stuff?

  29. I agree with yotiao - Google’s homepage is probably the most visited page on the entire web (millions if not billions of users), and while we are all happy upgrading our browsers and machines every few years there are people in other parts of the world stuck on version 3 browsers, Win95 or lower.

    Tiny fractions of a percentage of users for us is still hundreds of thousands of users to Google (have they ever released any browser share stats? Would be interesting).

  30. Didn’t they hire a CSS guru two months ago? I think we are likely to see a new version soon.

  31. I think that Yahoo (yahoo.com) and Microsoft (msn.com) have come further in the web standards movement than Google. But maybe this will quickly change now that Doug B. has joined the force. Maybe Doug can even make Google look good, who knows?

  32. @lk, I don’t believe someone stubborn enough to use a 10 year old browser should be able to browse the web in its full glory. Those still would be able to do their searches, hence the site would be accessible to them. I guess you are confusing accessibility with “having to look the same on all browsers”. Or am I missing something here?

  33. @Matthew Pennell—I would love for Google to release some browser stats.

    I’d rather base browser usage knowledge on Google’s logs than some independent research company.

    :)

  34. Been here, done that :) You can save a few more bytes by dropping the closing tag for body.

  35. Roger, I stand corrected. You are correct, they are more the scripting crew. But taking a look at a few of the YUI pieces, the code that they do use seems pretty clean.

    Would be interesting to get feedback from Google instead of all this hearsay. Come on Roger you’re a Web Celeb ;~)

  36. August 15, 2006 by Johan

    Patrick Lauke did a rework of the Firefox/Frugal google search page here:

    link to example

    link to article

  37. Richard, why are you confusing accessibility with standards?

    Because they go hand in hand. A standards compliant page might not equal accessibility, but for the web standardista that knows what he/she is doing, it will.

  38. August 15, 2006 by Struan

    Having recently purchased a Google Mini server at work, I’d not previously appreciated how clunky their mark-up actually is.

    It’s quite soul-destroying to customise the template while having to retain the tables and font tags.

    I only hope that Douglas Bowman will have a positive effect…

  39. On a vaguely related note, the WAI Interest Group Mailing list recently pointed us all to the fact that Google are currently seeking an Accessibility Testing Specialist. Assuming this means that there will be someone within their team with a little more expertise than the current team has, it may mean that Google will be making moves towards strict code, and also some other techniques to improve their overall accessibility and usability.

    Accessibility Testing Specialist - Mountain View

  40. August 15, 2006 by CW Petersen

    Regarding the amount of traffic on Google, though I don’t remember exactly where I heard it (perhaps on a TV special on the founders), the last figure I heard was “over” (a la McDonalds) 1 billion queries a day.

  41. Does anyone note the irony that Google provides an Accessibility search engine that itself is not accessible?

    I don’t know anybody with an accessibility problem who has any trouble with Google in its current form. It’s almost the search engine of choice for users of assistive technology.

    Personally, I think the reason for not coding the leanest possible pages while adhering to standards is the old economic fallacy that money saved isn’t as important as money earned.

  42. Now, think how wonderful the world would be if only Google went over to the lighter CSS-based version and used those saved bytes to give us those frickin’ accesskeys for “Next” and “Previous”!

  43. August 16, 2006 by Matt

    Roger, there’s only one slight problem with your Google redesign:

    WebImagesVideoNew!NewsMapsmore »

    Add some spaces, man ;)

  44. August 16, 2006 by mattur

    Roger, these “brightest minds in the web industry” you refer to are presumably the same people who spent the past few years telling everyone to use XHTML for no apparent reason. Mindlessly parroting sub-optimal advice from the W3C does not qualify someone as bright, only popular.

    Removing the Doctype, however, will trigger Quirks mode in most browsers.

    Yup, no doubt there will be some “slight, completely insignificant pixel shifts.” But it’s still smaller and works just as well in modern browsers despite being non-standard.

  45. August 16, 2006 by Roger Johansson (Author comment)

    Frances: Yes, judging from that job description it looks like Google is going to improve.

    Matt: Hehe :-). Yeah I guess I’ll spend another half hour or so improving the demo.

  46. Yet more proof. Nice.

  47. Everybody here talks about Google not following the W3C standards, resulting in non-optimized bandwidth usage. And what about Microsoft or Yahoo? They’re doing the same. Why should we care about this? Why don’t we just concentrate on our own jobs and not blame other people’s work? Imagine if all designers/coders did their jobs 21% better. Then they would all have helped the Internet more than Google ever could. There are plenty of much worse pages, but nobody is talking about them. Why? I hope my criticism was constructive.

  48. August 17, 2006 by Jan Korbel

    Talking of optimizing Google’s homepage, what about the search results page? That’s a nice table and tag soup (font tags used regularly, etc.). I imagine they could save more here.

  49. By the way, I opened your page ( http://www.456bereastreet.com/lab/google/ ) and tested it in ( http://validator.w3.org/check?verbose=1&uri=http%3A%2F%2Fwww.456bereastreet.com%2Flab%2Fgoogle%2F)

    It is valid.

    And I looked at the source code… it does not begin with an “html” tag and end with an “html” tag. How can that be? What is the purpose of it? How can this page be “valid”?

    Thank you.

    2006-08-15, 4.15 by lk:

    DIV’s don’t work in older browsers. Neither does CSS.

    I think the same way… that’s the reason.

  50. August 17, 2006 by Sami Pekkala

    sunipeyk:
    If you open the HTML 4.01 Strict DTD, you’ll find the following line at the end of the file:

    <!ELEMENT HTML O O (%html.content;) -- document root element -->

    The characters “O O” mean that both the start and end tags for the html element are optional. Therefore, omitting the tags doesn’t make the document invalid.
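
    For example, a tiny document like this (a made-up illustration, nothing to do with Google’s page) passes the validator as HTML 4.01 Strict even though the html, head and body tags never appear in it:

        <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
        <title>Optional tags</title>
        <p>The html, head and body elements are still there; their tags are simply implied.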

    Recommended reading: The Art of Reading a DTD

  51. It gets even worse if you’re using a Google Appliance. Mangled up HTML inside an XSLT. I’ve spent countless hours trying to figure out how everything is laid out. Incredibly hard to override with CSS, and a pain to edit the HTML.

  52. You must love that comment: ;)

    Bravo.

  53. August 18, 2006 by Anonymous

    You can save a few more bytes by using spaces instead of nbsp.

  54. Google probably doesn’t worry about bandwidth costs. Only those of us who have to purchase bandwidth really care. Google doesn’t use another service provider; they ARE the service provider.

    Even though 21% of a billion web pages seems like a large amount, it’s a rounding error when considering the amount of data traveling in the total Google infrastructure.

    So, bandwidth inefficiency probably doesn’t even enter their minds.

    I learned this lesson from the large firm where I work. My contribution to a redesign of our intranet a few years ago reduced bandwidth by more than 30%. Since the intranet gets billed to all of the corporation’s individual divisions, that savings was a very big deal worth millions of dollars. When I tried to promote the same thing for our customer facing internet, the response was, “No savings. Our customers pay for the bandwidth, not us.”

  55. August 19, 2006 by Jacqui

    While I don’t understand why Google won’t clean up its code, the search page as it is doesn’t bother me.

    What does bother me is the tag soup Google produces in its news feeds. Feeds start with a break, go to tables with such gems as width equals valign equals top, contain all kinds of tag soup and code used in ways I have never seen before. One gem went something like: bold San endbold bold Francisco endbold. Check any news report with a sentence that includes several bolded words and you will see each individual word has opening and closing bold tags.

    The RSS and Atom feeds are such a mess as to be unusable in any validating site. If they can’t get that right I don’t expect to see any change to the search pages anytime soon.

  56. August 19, 2006 by Roger Johansson (Author comment)

    You can save a few more bytes by using spaces instead of nbsp.

    Yep.

    Bob: It doesn’t really matter if they’re not paying for bandwidth. The main point of this exercise is to show that using web standards will not increase file size, as some have argued.

  57. This comment is highly speculative, but perhaps informative as well…

    I think Google’s main concern is server load rather than bandwidth. Server load for them probably means trying to reduce the number of hits per page as much as possible and reducing the amount of processing on the server in the same way. However, they already do quite a bit of server side processing depending on one’s browser and cookies, so they could easily send table-based layouts to the old browsers and CSS-based pages to modern ones.

    In order to reduce the server load as much as possible I think they are using metrics based on TCP windowing. The page they have today is about 2K. That easily fits in a 2-segment TCP window. Most TCP connections default to a maximum of 3 segments per window (if memory serves me correctly). If the page could be reduced to fit in a single TCP segment, including the HTTP header, which would require dropping its size by 50 %, then they would get real bandwidth savings on the start page. A 21 % reduction is no reduction; it still takes 2 TCP segments. But the difference in server load between sending one TCP segment and two segments is probably not worth it.

    A Google search results page today is about 5K. That does not fit in a 3-segment TCP window. Reducing it by perhaps 1.5K would make it fit in such a window and would reduce their server load significantly. Gzip is probably enough to accomplish that.

  58. August 28, 2006 by Andrew

    I am also mystified as to why almost no Google web pages are valid HTML, when the pages are so simple and, apparently, they can save huge amounts on bandwidth. Gzip can easily realize a 50% size decrease on flat text even without the optimized CSS, but of course it has to be compressed first (thus increasing server load). I don’t know if the trade off is worth it.

  59. September 14, 2006 by Asbjørn Ulsberg

    Great work!

    However, there are still lots of bytes to be saved on the initial page request by extracting the CSS and JavaScript to external files. Note that I write “initial page request” because on succeeding requests, the CSS and JavaScript files won’t be downloaded from the server (unless they have changed and thus have a new ‘Last-Modified’ date or ‘ETag’ value).

    On top of that, there are lots of bytes to be saved by extracting the script event handler associations from the markup and into the JavaScript itself (onclick, onload, etc.), so after trimming all of this away, I got the HTML down to 1924 bytes. And there are still bytes to be saved. The non-breaking spaces can be replaced with proper margin/padding in CSS, attribute quotes can be removed in a lot of places, etc.

    On top of that, you have gzip, which will reduce the size dramatically by at least 50% again, which gives us 900 bytes or so. If that doesn’t matter to Google, then they’re just ignorant, because it’s obvious that 900 bytes is faster to download and render in any browser under any circumstance than the original 4KiB.
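
    A rough sketch of what the head of such a page could look like (the file names here are invented for the example):

        <!-- Style and behaviour live in external, cacheable files. -->
        <link rel="stylesheet" type="text/css" href="home.css">
        <script type="text/javascript" src="home.js"></script>
        <!-- home.js attaches the onload/onclick handlers itself, so no
             inline event handler attributes are needed in the markup. -->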

  60. Actually, on extremely busy sites, it is far more important to save on the total number of requests than it is to save on bytes. Having recently suffered the consequences of hosting a front page on which I was hoping to save fetches on subsequent requests by the argument you advance, I have learned the hard way that it is better for a front page to be “improperly factored” - recall that browsers may also invoke dependent fetches in separate threads, thus causing a great increase in real load, not just in amortised load.

    Google have obviously learned this lesson admirably in that their page will load in two fetches (ok, 3 if you count the 12x12 image of an “x”).

    There is an exponential rule at work: the largest share of requests to your site will hit it just once and then, perversely, simply never fetch anything again!

  61. So long as Google is using regular old invalid HTML, that means no XHTML-only browsers will ever emerge. And I say that’s a good thing.

  62. March 14, 2007 by egonk

    Gzip can easily realize a 50% size decrease on flat text even without the optimized CSS, but of course it has to be compressed first (thus increasing server load). I don’t know if the trade off is worth it.

    They could simply cache the front page in gzipped form on their servers. I would be surprised if they are not already doing it :)

  63. Quote by jk: “So long as Google is using regular old invalid HTML, that means no XHTML-only browsers will ever emerge. And I say that’s a good thing.”

    My opinion: Do you trust scaffolding put up in a non-standardised way? No, you don’t. Poor scaffolding/building is the result of cowboy builders. I would love to see the whole web standardised, particularly if it is in XHTML, XML and RDF (with external CSS and external JS). It would be good if Google made a proper valid HTML homepage, just so that they can show the way into the future.

    Thank you Roger for developing this little page; hopefully Google will look at it soon. Even their mobile phone (cellphone) webpage is not valid HTML (even though the URL is www.google.com/xhtml ). At some point I think I might have a go at developing an entirely XHTML 1.0 Strict version (with external CSS, and external JavaScript if necessary) which hooks into the search results.

    Just my opinion - but then I am a bit of a Web Standards and Semantic Web advocate.

Comments are disabled for this post (read why), but if you have spotted an error or have additional info that you think should be in this post, feel free to contact me.