Beware of “Web Page, complete” when saving HTML pages with your browser

Every now and then I need to ask a client or another developer to save a copy of a web page and email it to me, usually because the page is behind a firewall and I need to see the markup. Quite often, the HTML they send me is more or less mangled.

This HTML mangling happens when you choose “Web Page, complete” or a similarly named option in your web browser. With this option selected, browsers don’t save just the HTML source of the page. They also save any associated images, stylesheets, JavaScript files and other resources, and change all references to those files to make them point to the locally saved copies.

Changing the references to the included files is necessary, of course. The problem is that many browsers also change some of the other markup while doing this. Some (Firefox and Camino, to name two) just remove closing slashes from empty elements (<br /> becomes <br>), while others (IE, Chrome) also change the case of element names (<br /> becomes <BR>).
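To make the two kinds of rewriting concrete, here is a minimal Python sketch that imitates their visible effect on a snippet of markup. It is only an illustration: browsers serialise the page from their internal document tree rather than running regular expressions, and the function names below are made up for this example.

```python
import re


def strip_closing_slashes(html):
    """Imitate the first kind of rewriting: <br /> becomes <br>."""
    # Drop the optional whitespace and the slash that close an empty element.
    return re.sub(r"\s*/>", ">", html)


def uppercase_element_names(html):
    """Imitate the second kind of rewriting: <br> becomes <BR>."""
    # Upper-case only the element name that follows "<" or "</";
    # attributes and text content are left alone.
    return re.sub(
        r"(</?)([a-zA-Z][a-zA-Z0-9]*)",
        lambda m: m.group(1) + m.group(2).upper(),
        html,
    )


original = '<p class="x">Hi<br /></p>'

print(strip_closing_slashes(original))
# <p class="x">Hi<br></p>

print(uppercase_element_names(strip_closing_slashes(original)))
# <P class="x">Hi<BR></P>
```

Neither result is what the server sent, which is exactly what makes a “complete” save misleading when you are trying to debug someone else's markup.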

This may or may not be a problem depending on why you need to see the markup. Either way it is very important to realise that the markup you are seeing is not necessarily exactly the same markup that gets sent to the browser.
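If you happen to have both the original source and a saved copy at hand, a quick diff shows how far the two have drifted apart. The sketch below uses Python's standard difflib module; the function name markup_changes and the sample strings are just placeholders for this example, and in practice you would read the two versions from files.

```python
import difflib


def markup_changes(original, saved):
    """Return the lines that differ between two versions of a page."""
    diff = difflib.ndiff(original.splitlines(), saved.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ";
    # unchanged lines ("  ") and hint lines ("? ") are skipped here.
    return [line for line in diff if line[:2] in ("- ", "+ ")]


original = "<p>Hi<br /></p>\n<p>Bye</p>\n"
saved = "<p>Hi<br></p>\n<p>Bye</p>\n"

for line in markup_changes(original, saved):
    print(line)
# - <p>Hi<br /></p>
# + <p>Hi<br></p>
```

An empty result means the saved copy matches the original line for line.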

The safe way, as far as I can tell, to save the HTML of a page from a web browser is to choose the option called “Web Page, HTML only” or something similar. With this option chosen, all browsers I have tested seem to save the page with the original markup intact.

This post is a Quick Tip. Background info is available in Quick Tips for web developers and web designers.

Posted on February 1, 2010 in Quick Tips, Browsers