Search engines and canonical URLs

Multiple URLs can point to the same resource on a website, which is especially common for a website's home page. That can cause problems for search engines, since it may not be obvious to them which URL is the best one to use.

In SEO advice: url canonicalization, Google employee Matt Cutts answers several questions related to canonical URLs, redirects, and multiple URLs. There is much useful information here for anyone interested in making sure search engine robots have as little trouble as possible when they visit your site.

One of the topics is whether or not to use the www subdomain in URLs. The key is to be consistent. Pick one and stick with it. However, since you have no control over inbound links and some of them will use www while others won’t, it is a good idea to configure your server to redirect URLs to your preferred format. I do that here with a couple of lines in my .htaccess file:

RewriteCond %{HTTP_HOST} ^456bereastreet\.com [NC]
RewriteRule ^(.*) http://www.456bereastreet.com/$1 [R=301,L]

Any requests that do not use the www subdomain now get redirected to the same URL with “www” added in front of it.
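
As a hedged sketch of the same idea for a generic site, the two lines can be made self-contained by switching the rewrite engine on and anchoring the host pattern at both ends. The domain example.com is a placeholder, not a domain from this article. The sketch assumes the rules live in a .htaccess file, that mod_rewrite is enabled, and that the site is served on the default port so the Host header carries no port number:

```apache
# Sketch only: example.com stands in for your own domain.
# The rewrite engine must be on before any rule takes effect.
RewriteEngine On

# Match requests whose Host header is exactly the bare domain
# (case-insensitive because of the NC flag).
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]

# Issue a permanent (301) redirect to the same path on the www host.
# In .htaccess context the leading slash is stripped from the matched path,
# so it is added back explicitly in the target URL.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Anchoring the condition with $ keeps it from also matching hosts that merely start with the bare domain name.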

Comments

1. January 12, 2006 by Martin

I'm using ...

RewriteCond %{HTTP_HOST} !^example\.org$
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule ^(.*)$ http://example.org/$1 [R=301,L]

... that way, every subdomain-request gets redirected (a good way to catch typos like ww.example.org or wwww.example.org).

The robots.txt is excluded because I think some robots don't handle redirects well.

2. January 12, 2006 by Martin

Oops, forgot to make the linebreaks work ... use the source, Luke ;-)

3. January 12, 2006 by Champ

How does this show up in apache access logs? Specifically, will the 301 be logged?

4. January 12, 2006 by Justin Perkins

This is really important when using services like GMaps, which are extremely picky about consumer URLs. I use the following to make sure the server name matches the one I registered with GMaps...

RewriteCond %{SERVER_NAME} !^www\.whatever\.com$ [NC]
RewriteRule (.)* http://www.whatever.com%{REQUEST_URI} [R,L]

5. January 12, 2006 by Roger Johansson

Martin: I fixed your comment :-).

Redirecting all subdomains seems like a good idea unless you use other subdomains than www for something.

Champ: It depends on how the server is configured. The RewriteLogLevel directive can be used to control how verbose rewrite logging should be.

6. January 12, 2006 by Martin

If you want to use other subdomains, you can just exclude those from the rule. Just add a line like

RewriteCond %{HTTP_HOST} !^foobar\.example\.org$

But that's not always necessary; for example, if you use Confixx to configure subdomains, Apache will only look at the .htaccess for the subdomain and won't find the rules added to the "www." or wildcard subdomain's .htaccess.

7. January 12, 2006 by nate klaiber

Great article, and something I am dealing with today. We have several domains that all serve up the same content - they all mirror each other. I have been tracking some things in Google Analytics, and I want to make sure all starts/searches go to the right place.

Good read! Thanks!

peace, Nate

8. January 12, 2006 by SilentWarrior

That is indeed a very interesting topic. Thanks for the link, Roger!

Regarding www subdomains, I just say http://no-www.org/.

9. January 12, 2006 by Anne van Kesteren

This has been advocated for years under the motto no-www. Well, by me at least :-)

10. January 12, 2006 by Julian Schrader

I use the same approach as Martin (#1) does - I'm redirecting all subdomains to the plain url, I don't like the "www." ;-)

I think it's an important step to get more out of one's site - search engines are very important, and avoiding duplicate content is also nice for your users, 'cause you don't present the same stuff twice.

11. January 12, 2006 by David Benton

I completely see where the no-www folks are coming from, but I continue to use www for this reason: it allows the less technical to quickly recognize a web URL. I know .com should do that, but a) not all sites are .coms, and b) some people need all the clues we can give them.

I don't really like www, but I think it's user-friendly. For a more technical audience, dropping the www makes great sense (see slashdot).

12. January 12, 2006 by Andy Knight

I used your technique on my site in my .htaccess file. Then I tested it by typing in my domain w/out the "www." Instead of defaulting to...

http://www.andyknight.com

...it defaulted to...

http://www.andyknight.com/index.php/

That brought up my error 404 page. I think it's because Textpattern includes this within the .htaccess file by default:

RewriteRule ^(.*) index.php

But since I've not delved into .htaccess syntax before, I'm not sure how to get around this. Any ideas?

13. January 13, 2006 by Steffen

Wouldn't it be better to use Apache's Redirect or RedirectPermanent directives instead of using RewriteRule? Otherwise the search engines will never learn which URL (www or non-www or whatever) you really prefer, because RewriteRules are completely transparent to them.

14. January 13, 2006 by Champ Bennett

Roger: Your link pointed to the RewriteLogLevel directive, which seems to control only the verbosity of the rewrite log, not the access log. I'm not too familiar with rewrite rules, but the intention of my original question was to clarify whether the access logs (and the vast number of statistical apps that parse them) would see this as a redirect.

15. January 13, 2006 by shorty114

In addition to no-www.org, there is also the opposite, www.yes-www.org.

Personally, I like no-www, but I know of some people who are yes-www people.

16. January 13, 2006 by Martin

Steffen: The R=301 causes an HTTP redirect with the status 301 (moved permanently). You can also write [Redirect=permanent]; [R=301] is just a short form. That's almost the same as using Redirect(Permanent).
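
For comparison, the mod_alias form Steffen asks about might look like the sketch below. It is an illustration rather than anything posted in this thread: example.com is a placeholder, and it assumes access to the main server configuration, because Redirect cannot test the Host header and so the bare domain needs a virtual host of its own.

```apache
# Sketch for httpd.conf; example.com is a placeholder domain.
# This virtual host answers only for the bare domain.
<VirtualHost *:80>
    ServerName example.com

    # mod_alias sends a 301 for every path under / and appends the rest
    # of the requested path to the target URL.
    Redirect permanent / http://www.example.com/
</VirtualHost>
```

Either way the client and the search engine see the same thing: a 301 response with a Location header pointing at the www host.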

Champ: Yes, these redirects will appear in the access log.

17. January 13, 2006 by Kimmo

Justin: Google seems to have updated their Maps API; last week I noticed that my keys are working either with or without the www subdomain...

18. January 13, 2006 by Yoeri

Is there a way to acquire stats for each domain that's redirected? I'm interested in how many visitors come through the .com domain and are (301) redirected to my real .be domain (e.g. AWStats doesn't display them).

Rewrite logs? Other options?

19. January 13, 2006 by Martin

Yoeri: If you have access to Apache's httpd.conf, you can change the log format. You can add %{Host}i to get the host name logged. See Custom Log Formats for details.
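
A minimal sketch of such a log format follows. The nickname host_common and the log file path are made up for the example, and the directives belong in httpd.conf, not in .htaccess:

```apache
# Sketch only: "host_common" and the log path are hypothetical names.
# Common Log Format with the requested Host header added as the first field.
# %>s logs the final status, so redirected requests show up as 301.
LogFormat "%{Host}i %h %l %u %t \"%r\" %>s %b" host_common

# Write the access log using the format defined above.
CustomLog logs/access_log host_common
```

With the host in the first field, requests that arrived on the .com domain can be counted separately from those that arrived on the .be domain, though a stats package may need to be told about the changed format.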

20. January 13, 2006 by Ryan

I tried adding

RewriteCond %{HTTP_HOST} ^thevisualprocess\.com [NC]
RewriteRule ^(.*) http://www.thevisualprocess.com/$1 [R=301,L]

to my .htaccess file, but nothing happened afterwards; it would still go to the non-www site if I didn't bother with the www :/

21. January 13, 2006 by gerben

@Ryan:

IE doesn't always change the URL in the address bar when it receives a 301 redirect. Try refreshing (F5) and look at the address bar.

22. January 13, 2006 by Matthew

Ryan, for rewrite rules to work, you must set the following rule prior to any rewrite rules:

RewriteEngine on

You may want to check to be sure this is set. For more info, check the mod_rewrite documentation.

23. January 13, 2006 by gerben

Does anyone know how 301's affect your pagerank? Are links to http://example.com/page.html of any use to the pagerank of http://www.example.com/page.html when 301-ed?

24. January 13, 2006 by Ryan

@gerben #21: I'm using Firefox, but thanks for the pointer.

@Matthew #22: Thanks I'll try that!

25. January 14, 2006 by tomo

@gerben #23: from Matt Cutts's site:

Q: Is there anything else I can do?

A: Yes. Suppose you want your default url to be http://www.example.com/ . You can make your webserver so that if someone requests http://example.com/, it does a 301 (permanent) redirect to http://www.example.com/ . That helps Google know which url you prefer to be canonical. Adding a 301 redirect can be an especially good idea if your site changes often (e.g. dynamic content, a blog, etc.)

26. January 14, 2006 by Dan Kubb

There's no need to hard-code the domain name in the rule. Capture the domain name in the RewriteCond and merge the value (stored in %1) into the URL you redirect to in the RewriteRule. E.g.

RewriteCond %{HTTP_HOST} !^www\.[a-z-]+\.[a-z]{2,6} [NC]
RewriteCond %{HTTP_HOST} ([a-z-]+\.[a-z]{2,6})$     [NC]
RewriteRule ^/(.*)$ http://%1/$1                    [R=permanent,L]

Personally I prefer to strip off the subdomain and redirect to the raw domain:

RewriteCond %{HTTP_HOST} \.([a-z-]+\.[a-z]{2,6})$ [NC]
RewriteRule ^/(.*)$ http://%1/$1                  [R=permanent,L]

27. January 14, 2006 by Martin

Nice idea, Dan. So here's what I'm using now to get rid of the subdomain:

RewriteCond %{HTTP_HOST} \.([^\.]+\.[^\.0-9]+)$  
RewriteCond %{REQUEST_URI} !^/robots\.txt$  
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

The pattern is designed to catch as much as possible while keeping it simple. It even works with IDNs and with TLDs that could appear in the future (ones with only one character, longer than six characters, or containing numeric characters; in the last case the redirect won't work, but the server would still be accessible). The 0-9 ensures that it does not cause a redirect to the second half of the server's IP address if the server was called directly by its IP.

(How do I use an underscore without triggering emphasis? And the backslashes disappeared, in the preview at least :-()

28. January 14, 2006 by mike

Good post. I still can't decide which to use, www or no-www. It's not as if there is a strong, clearly defined decision in the industry either. Gah. Decisions, decisions, decisions. What makes one case, for me, for the www. prefix is CTRL-Enter in a browser. It automatically attaches www. and .com to a string.

29. January 14, 2006 by Roger Johansson

Martin #27: The comment preview is messed up. If I knew how to fix it, I would. Sorry :-(.

mike: CTRL-Enter? Just Enter works for me. Or is that a Mac thing?

30. January 14, 2006 by mike

XP here; just Enter, on my Firefox setup, googles for the string. But thinking about my comment for a bit, I'd bet that many people who want/use the www. prefix don't know about CTRL-Enter on a PC.

31. January 14, 2006 by Roger Johansson

Ok. On the Mac every browser adds www. and .com before and after the string you type in the location field when you hit Enter. I hadn't heard of CTRL+Enter for Windows before.

32. January 14, 2006 by Martin

Okay, whoever's interested in the code I posted in comment #27, base64-decode the following to get what I intended to write :P

UmV3cml0ZUNvbmQgJXtIVFRQX0hPU1R9IFwuKFteXC5dK 1wuW15cLjAtOV0rKSQNClJld3JpdGVDb25kICV7UkVRVU VTVF9VUkl9ICFeL3JvYm90c1wudHh0JA0KUmV3cml0ZVJ 1bGUgXiguKikkIGh0dHA6Ly8lMS8kMSBbUj0zMDEsTF0=

33. January 14, 2006 by Martin

Sorry, I verified it before posting, but it seems the base64 is also broken (not sure why). So here's another attempt: posting it as an image.

34. January 15, 2006 by Roger Johansson

Martin: I think I managed to fix your original comment (#27).

For anyone wanting to post code examples, the easiest way is to insert four spaces in front of each line of code. That tells Markdown to enclose the code in a pre element and not interpret any of it as Markdown syntax.

Sorry for the trouble. Markdown works, but the comment preview does not. Well, the preview works, but it also ruins much of the Markdown syntax - the textarea on the preview page does not contain exactly what you typed into it, so you need to reenter a lot of the Markdown syntax. I've spent days trying to fix the problem, but it seems impossible so I've given up. Once I have a few weeks to spare I'll look into either upgrading Movable Type or switching to something else.

35. January 15, 2006 by Douglas Clifton

<VirtualHost IP.address:80>
   ServerName domain.com
   ServerAlias www.domain.com
   ...
</VirtualHost>

The de facto usage of www as a subdomain is a holdout from the dark ages of the Web, when HTTP was a new protocol and the same domain was often used for mail, archie, ftp, etc. While this is still very true, the vast majority of domains registered these days serve only one purpose. I vote for no-www. In other words, by default your domain in an HTTP URI should point at your web presence, and the www subdomain should also be there for backwards "compatibility."

This makes for shorter URLs, quicker recognition of your domain, and underscores the current domination of the Web in the Internet suite of protocols. That's not to say that subdomains have no value—blog.domain.com being one (obvious) example.

36. January 16, 2006 by Matthew

On the Ctrl+Enter thing (for Windows)...

Ctrl+Enter will turn whatever into www.whatever.com in Internet Explorer. Firefox also allows you to use Shift+Enter to do www.whatever.net and Ctrl+Shift+Enter to do www.whatever.org.

Maybe some of you will find this useful. I use them constantly.

37. July 22, 2006 by Denis

For the life of me I am unable to redirect any and all subdomains (including the www) AND the main domain to an entirely new domain.

I'm using:

RewriteEngine on
RewriteCond %{HTTP_HOST} ^([a-z0-9]+)..+..*$ [NC,OR]
RewriteCond %{HTTP_HOST} !^www.FORMERdomain.com [NC]
RewriteRule ^(.*)$ http://NEWdomain.com/ [L,R=301]

It will redirect www.FORMERdomain.com & FORMERdomain.com to http://NEWdomain.com and that's it - any ideas?

Thanks!

38. July 22, 2006 by Denis

Sorry,

Underscore issues...

39. September 26, 2006 by A2D Suchmaschinenoptimierung

I'm using ...

RewriteCond %{HTTP_HOST} ^example.org$ [NC,OR]
RewriteCond %{HTTP_HOST} ^www.example2.org$ [NC]
RewriteRule ^(.*)$ http://example.org/$1 [R=301,L]

40. September 30, 2006 by Romuald

Thank you for your great article; it's really helpful. If I understood the subject correctly, this is what I used:

RewriteEngine On
RewriteRule ^kategoria-([^-]+).html$ index.php?kat=$1
RewriteRule ^wpis-([^-]+)podkat-([^-]+)strona-([^-]+).html$ index.php?kat=$1&pod=$2&str=$3
RewriteRule ^wpis-([^-]+)podkat-([^-]+).html$ index.php?kat=$1&pod=$2
RewriteRule ^info-([^-]+).html$ index.php?m=3&wpis=$1

I used that to get SEO-friendly links ending in .html.

41. January 8, 2007 by John Thomas

Wow nice article, but I especially dig the comments!

Here's what I use. From: Ultimate htaccess Article

Options +FollowSymLinks
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI} !^/robots\.txt$ [NC]
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

42. April 23, 2007 by Andrew Flusche

Hi everyone!

This is a terrific article, and a bunch of great comments. I'm late to the party, but I'm hoping somebody can help with my question.

I'm using this in .htaccess:

Options +FollowSymlinks
RewriteEngine on
rewritecond %{http_host} ^legalandrew.com [nc]
rewriterule ^(.*)$ http://www.legalandrew.com/$1 [r=301,nc]

That redirects regular requests for "legalandrew.com" to "www.legalandrew.com" just fine. The problem is when any other page is typed in, the redirect goes to the homepage. Try visiting my About page, without typing in "www": About Legal Andrew.

Does anybody know how to fix this? To me, any page request without "www" should redirect to that page's "www" version. I'm at a loss on how to make this happen.

Thanks in advance, Andrew

Sorry, comments are closed for this post.
