Search engines and canonical URLs

Multiple URLs can point to the same resource on a website, which is especially common for a website’s home page. That can cause problems for search engines, since it may not be obvious to them which URL is the best one to use.

In SEO advice: url canonicalization, Google employee Matt Cutts answers several questions related to canonical URLs, redirects, and multiple URLs. There is a lot of useful information here for anyone interested in making sure search engine robots have as little trouble as possible when they visit your site.

One of the topics is whether or not to use the www subdomain in URLs. The key is to be consistent. Pick one and stick with it. However, since you have no control over inbound links and some of them will use www while others won’t, it is a good idea to configure your server to redirect URLs to your preferred format. I do that here with a couple of rows in my .htaccess file:

    RewriteCond %{HTTP_HOST} ^456bereastreet\.com [NC]
    RewriteRule ^(.*) http://www.456bereastreet.com/$1 [R=301,L]

Any requests that do not use the www subdomain now get redirected to the same URL with “www” added in front of it.
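
As a rough sketch of how those two rows might sit in a complete .htaccess file: the version below adds a RewriteEngine On line and comments, and is otherwise the same rule. It assumes mod_rewrite is enabled on the server and that .htaccess overrides are allowed, neither of which is stated above.

```apache
# Sketch only: assumes mod_rewrite is enabled and .htaccess overrides are allowed.
# Rewrite rules are ignored unless the engine has been switched on.
RewriteEngine On

# Apply the rule only when the Host header starts with the bare domain
# (NC makes the comparison case-insensitive).
RewriteCond %{HTTP_HOST} ^456bereastreet\.com [NC]

# Answer with a permanent (301) redirect to the same path on the www host;
# L stops further rule processing for this request.
RewriteRule ^(.*) http://www.456bereastreet.com/$1 [R=301,L]
```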

Posted on January 12, 2006 in Quicklinks, Search Engine Optimisation

Comments

  1. I’m using …

    RewriteCond %{HTTP_HOST} !^example\.org$
    RewriteCond %{REQUEST_URI} !^/robots\.txt$
    RewriteRule ^(.*)$ http://example.org/$1 [R=301,L]
    

    … that way, every subdomain-request gets redirected (a good way to catch typos like ww.example.org or wwww.example.org).

    The robots.txt is excluded because I think some robots don’t handle redirects well.

  2. Oops, forgot to make the linebreaks work … use the source, Luke ;-)

  3. How does this show up in Apache access logs? Specifically, will the 301 be logged?

  4. This is really important when using services like GMaps which are extremely picky about consumer URLs. I use the following to make sure the server name matches the one I registered with GMaps…

    RewriteCond %{SERVER_NAME} !^www\.whatever\.com$ [NC]
    RewriteRule (.)* http://www.whatever.com%{REQUEST_URI} [R,L]
    
  5. January 12, 2006 by Roger Johansson (Author comment)

    Martin: I fixed your comment :-).

    Redirecting all subdomains seems like a good idea unless you use subdomains other than www for something.

    Champ: It depends on how the server is configured. The RewriteLogLevel directive can be used to control how verbose rewrite logging should be.

  6. If you want to use other subdomains, you can just exclude those from the rule. Just add a line like

    RewriteCond %{HTTP_HOST} !^foobar\.example\.org$

    But that’s not always necessary; for example, if you use Confixx to configure subdomains, Apache will only look at the .htaccess for the subdomain and won’t find the rules added to the “www.” or wildcard subdomain’s .htaccess.

  7. Great article, and something I am dealing with today. We have several domains that all serve up the same content - they all mirror each other. I have been tracking some things in Google Analytics, and I want to make sure all starts/searches go to the right place.

    Good read! Thanks!

    peace, Nate

  8. That is indeed a very interesting topic. Thanks for the link, Roger!

    Regarding www subdomains, I just say http://no-www.org/.

  9. This has been advocated for years under the motto no-www. Well, by me at least :-)

  10. I use the same approach as Martin (#1) does - I’m redirecting all subdomains to the plain url, I don’t like the “www.” ;-)

    I think it’s an important step to get more out of one’s site - search engines are very important, and avoiding duplicate content is also nice for your users, ‘cause you don’t present the same stuff twice.

  11. I completely see where the no-www folks are coming from, but I continue to use www for this reason: it allows the less-technical to quickly recognize a web URL. I know .com should do that, but a) not all sites are .coms, and b) some people need all the clues we can give them.

    I don’t really like www, but I think it’s user-friendly. For a more technical audience, dropping the www makes great sense (see slashdot).

  12. I used your technique on my site in my .htaccess file. Then I tested it by typing in my domain w/out the “www.” Instead of defaulting to…

    http://www.andyknight.com

    …it defaulted to…

    http://www.andyknight.com/index.php/

    That brought up my error 404 page. I think it’s because Textpattern includes this within the .htaccess file by default:

    RewriteRule ^(.*) index.php

    But since I’ve not delved into .htaccess syntax before, I’m not sure how to get around this. Any ideas?

  13. January 13, 2006 by Steffen

    Wouldn’t it be better to use Apache’s Redirect or RedirectPermanent directives instead of using RewriteRule? Otherwise the search engines will never learn which URL (www or non-www or whatever) you really prefer, because RewriteRules are completely transparent to them.

  14. Roger: Your link pointed to the RewriteLogLevel directive, which seems to control only the verbosity of the rewrite log, not the access log. I’m not too familiar with rewrite rules, but the intention of my original question was to clarify whether the access logs (and the vast number of statistical apps that parse them) would see this as a redirect.

  15. In addition to no-www.org, there is also the opposite, www.yes-www.org.

    Personally, I like no-www, but I know of some people who are yes-www people.

  16. Steffen: The R=301 causes an HTTP redirect with the status 301 (moved permanently). You can also write [Redirect=permanent]; [R=301] is just a short form. That’s almost the same as using Redirect(Permanent).

    Champ: Yes, these redirects will appear in the access log.

  17. Justin: Google seems to have updated their Maps API; last week I noticed that my keys are working either with or without the www subdomain…

  18. Is there a way to acquire stats for each domain that’s redirected? I’m interested in how many visitors come through the .com domain and are (301) redirected to my real .be domain (e.g. AWStats doesn’t display them).

    Rewrite logs? Other options?

  19. Yoeri: If you have access to Apache’s httpd.conf, you can change the log format. You can add %{Host}i to get the host name logged. See Custom Log Formats for details.

  20. I tried adding

    RewriteCond %{HTTP_HOST} ^thevisualprocess\.com [NC]
    RewriteRule ^(.*) http://www.thevisualprocess.com/$1 [R=301,L]

    to my .htaccess file but nothing happened afterwards, it would still go to the non www site if I didn’t bother with the www :/

  21. @Ryan:

    IE doesn’t always change the URL in the address bar when it receives a 301 redirect. Try refreshing (F5) and look at the address bar.

  22. January 13, 2006 by Matthew

    Ryan, for rewrite rules to work, you must set the following rule prior to any rewrite rules:

    RewriteEngine on

    You may want to check to be sure this is set. For more info, check the mod_rewrite documentation.

  23. Does anyone know how 301s affect your pagerank? Are links to http://example.com/page.html of any use to the pagerank of http://www.example.com/page.html when 301-ed?

  24. @gerben #21: I’m using Firefox, but thanks for the pointer.

    @Matthew #22: Thanks I’ll try that!

  25. @gerben #23:
    From Matt Cutts’s site:

    Q: Is there anything else I can do? A: Yes. Suppose you want your default url to be http://www.example.com/ . You can make your webserver so that if someone requests http://example.com/, it does a 301 (permanent) redirect to http://www.example.com/ . That helps Google know which url you prefer to be canonical. Adding a 301 redirect can be an especially good idea if your site changes often (e.g. dynamic content, a blog, etc.)

  26. There’s no need to hard code the domain name in the rule. Capture the domain name in the RewriteCond and merge the value (stored in %1) into the URL you redirect to in the RewriteRule. E.g.

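    # Note: ^/(.*)$ matches in server or virtual host context; in .htaccess the path has no leading slash.
    RewriteCond %{HTTP_HOST} !^www\.[a-z-]+\.[a-z]{2,6} [NC]
    RewriteCond %{HTTP_HOST} ([a-z-]+\.[a-z]{2,6})$     [NC]
    RewriteRule ^/(.*)$ http://www.%1/$1                [R=permanent,L]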
    

    Personally I prefer to strip off the subdomain and redirect to the raw domain:

    RewriteCond %{HTTP_HOST} \.([a-z-]+\.[a-z]{2,6})$ [NC]
    RewriteRule ^/(.*)$ http://%1/$1                  [R=permanent,L]
    
  27. Nice idea, Dan. So here’s what I’m using now to get rid of the subdomain:

    RewriteCond %{HTTP_HOST} \.([^\.]+\.[^\.0-9]+)$  
    RewriteCond %{REQUEST_URI} !^/robots\.txt$  
    RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
    

    The pattern is designed to catch as much as possible while keeping it simple. It even works with IDNs and with TLDs that could appear in the future (ones with only 1 character, with more than 6 characters, or with numeric characters in them; in the last case the redirect won’t work, but the server will still be accessible). The 0-9 ensures that it does not cause a redirect to the second half of the server’s IP address if it was called directly using the IP.

    (How do I use an underscore without triggering emphasis? And the backslashes disappeared, in the preview at least :-()

  28. Good post. I still can’t decide which to use, www or no-www. It’s not as if there is a strong, clearly defined decision in the industry either. Gah. Decisions, decisions, decisions. What makes one case, for me, for the www. prefix is CTRL-Enter in a browser. It automatically attaches www. and .com to a string.

  29. January 14, 2006 by Roger Johansson (Author comment)

    Martin #27: The comment preview is messed up. If I knew how to fix it, I would. Sorry :-(.

    mike: CTRL-Enter? Just Enter works for me. Or is that a Mac thing?

  30. XP here; just enter, on my firefox setup, googles for the string. But thinking about my comment for a bit, I’d bet that many people who want/use the www. prefix don’t know about CTRL-Enter on a pc.

  31. January 14, 2006 by Roger Johansson (Author comment)

    Ok. On the Mac every browser adds www. and .com before and after the string you type in the location field when you hit Enter. I hadn’t heard of CTRL+Enter for Windows before.

  32. Okay, whoever’s interested in the code I posted in comment #27, base64-decode the following to get what I intended to write :P

    UmV3cml0ZUNvbmQgJXtIVFRQX0hPU1R9IFwuKFteXC5dK 1wuW15cLjAtOV0rKSQNClJld3JpdGVDb25kICV7UkVRVU VTVF9VUkl9ICFeL3JvYm90c1wudHh0JA0KUmV3cml0ZVJ 1bGUgXiguKikkIGh0dHA6Ly8lMS8kMSBbUj0zMDEsTF0=

  33. Sorry, I verified it before posting, but it seems the base64 is also broken (not sure why). So here’s another attempt: posting it as an image.

  34. January 15, 2006 by Roger Johansson (Author comment)

    Martin: I think I managed to fix your original comment (#27).

    For anyone wanting to post code examples, the easiest way is to insert four spaces in front of each line of code. That tells Markdown to enclose the code in a pre element and not interpret any of it as Markdown syntax.

    Sorry for the trouble. Markdown works, but the comment preview does not. Well, the preview works, but it also ruins much of the Markdown syntax - the textarea on the preview page does not contain exactly what you typed into it, so you need to reenter a lot of the Markdown syntax. I’ve spent days trying to fix the problem, but it seems impossible so I’ve given up. Once I have a few weeks to spare I’ll look into either upgrading Movable Type or switching to something else.

  35. <VirtualHost IP.address:80>
       ServerName domain.com
       ServerAlias www.domain.com
       ...
    </VirtualHost>
    

    The de facto usage of www as a subdomain is a holdout from the dark ages of the Web when HTTP was a new protocol and the same domain was often used for mail, archie, ftp, etc. While this is still very true, the vast majority of domains registered these days serve only one purpose. I vote for no-www. In other words, by default your domain in an HTTP URI should point at your web presence, and the www subdomain should also be there for backwards “compatibility.”

    This makes for shorter URLs, quicker recognition of your domain, and underscores the current domination of the Web in the Internet suite of protocols. That’s not to say that subdomains have no value—blog.domain.com being one (obvious) example.

  36. January 16, 2006 by Matthew

    On the Ctrl+Enter thing (for Windows)…

    Ctrl+Enter will turn whatever into www.whatever.com in Internet Explorer. Firefox also allows you to use Shift+Enter to do www.whatever.net and Ctrl+Shift+Enter to do www.whatever.org.

    Maybe some of you will find this useful. I use them constantly.

  37. For the life of me I am unable to redirect any and all subdomains (including the www) AND the main domain to an entirely new domain.

    I’m using:

    RewriteEngine on
    RewriteCond %{HTTP_HOST} ^([a-z0-9]+)\..+\..*$ [NC,OR]
    RewriteCond %{HTTP_HOST} !^www\.FORMERdomain\.com [NC]
    RewriteRule ^(.*)$ http://NEWdomain.com/ [L,R=301]
    

    It will redirect www.FORMERdomain.com & FORMERdomain.com to http://NEWdomain.com and that’s it - any ideas?

    Thanks!

  38. Sorry,

    Underscore issues…

  39. I’m using …

    RewriteCond %{HTTP_HOST} ^example.org$ [NC,OR]
    RewriteCond %{HTTP_HOST} ^www.example2.org$ [NC]
    RewriteRule ^(.*)$ http://example.org/$1 [R=301,L]

  40. Thank you for your great article; it’s really helpful. If I understood the subject correctly, this is what I used:

    RewriteEngine On
    RewriteRule ^kategoria-([^-]+).html$ index.php?kat=$1
    RewriteRule ^wpis-([^-]+)podkat-([^-]+)strona-([^-]+).html$ index.php?kat=$1&pod=$2&str=$3
    RewriteRule ^wpis-([^-]+)podkat-([^-]+).html$ index.php?kat=$1&pod=$2
    RewriteRule ^info-([^-]+).html$ index.php?m=3&wpis=$1

    I used that to get SEO-friendly links ending in .html.

  41. Wow nice article, but I especially dig the comments!

    Here’s what I use, from the Ultimate htaccess Article:

    Options +FollowSymLinks
    RewriteEngine On
    RewriteBase /
    RewriteCond %{REQUEST_URI} !^/robots\.txt$ [NC]
    RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
    
  42. Hi everyone!

    This is a terrific article, and a bunch of great comments. I’m late to the party, but I’m hoping somebody can help with my question.

    I’m using this in .htaccess:

    Options +FollowSymlinks
    RewriteEngine on
    rewritecond %{http_host} ^legalandrew.com [nc]
    rewriterule ^(.*)$ http://www.legalandrew.com/$1 [r=301,nc]
    

    That redirects regular requests for “legalandrew.com” to “www.legalandrew.com” just fine. The problem is when any other page is typed in, the redirect goes to the homepage. Try visiting my About page, without typing in “www”: About Legal Andrew.

    Does anybody know how to fix this? To me, any page request without “www” should redirect to that page’s “www” version. I’m at a loss on how to make this happen.

    Thanks in advance, Andrew
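
One thing that may be worth checking in a setup like the one in comment 42: the RewriteRule there carries no L flag, so mod_rewrite hands the redirect target on to any rules that come later in the file. If a later catch-all rule rewrites everything to a front controller such as index.php (an assumption, since the rest of that file isn’t shown), the visitor can end up redirected to the home page instead of the page that was requested. The same ordering concern could apply to the Textpattern rule mentioned in comment 12. A minimal sketch of the usual arrangement, with the redirect first and marked as the last rule:

```apache
# Sketch only: the rest of the real .htaccess file is not shown in the thread,
# so the front-controller rules mentioned at the bottom are hypothetical.
Options +FollowSymLinks
RewriteEngine On

# Canonical-host redirect comes first. R=301 makes it permanent, and L stops
# rule processing so that no later rule alters the redirect target.
RewriteCond %{HTTP_HOST} ^legalandrew\.com [NC]
RewriteRule ^(.*)$ http://www.legalandrew.com/$1 [R=301,L]

# Any internal rewrites (for example a catch-all to index.php) would follow here.
```

Whether this resolves either case depends on what else is in the file, so it is best treated as a starting point for testing.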
