In 2009, Google announced it would be supporting the rel=canonical tag. It’s a simple bit of markup with a simple enough function, but it’s still a tad confusing at times, especially when it comes to the flow of link equity.
What is a canonical tag?
Rel=canonical is a quick and easy way of informing Google that one URL on your site should be treated as another. If you have duplicate content on two different pages which have different URLs, you can tackle the issue by assigning a rel=canonical to the less important page. It’s also useful for honing the flow of authority or link equity into site directories when you’ve got filtered pages.
You will find canonical tags within the <head> tags of a web page.
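If you want to check what a page declares without reading the source by hand, here is a minimal sketch using only Python's standard library. The class name, function name and sample markup are my own inventions for illustration; it only looks inside the <head>, since that is where the tag belongs.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr = dict(attrs)
            # rel can hold several space-separated values, so split it first.
            rel = (attr.get("rel") or "").lower().split()
            if "canonical" in rel and attr.get("href"):
                self.canonicals.append(attr["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonicals(html):
    """Return a list of canonical hrefs declared in the page's <head>."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical page markup, for illustration only.
sample = """<html><head>
<title>Child page</title>
<link rel="canonical" href="http://www.domain.com/parent/" />
</head><body><p>Hello</p></body></html>"""

print(find_canonicals(sample))  # ['http://www.domain.com/parent/']
```

A page that returns more than one value here is worth a closer look, as conflicting canonicals are unlikely to do what you intended.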
While it is one of the more useful bits of markup in terms of SEO, if it isn’t used correctly it can cause problems.
Here are some of the most common issues I’ve seen on sites so far.
Applying a rel=canonical to child pages
Say you have a parent directory page, and you’ve got a collection of child pages from said directory.
In order to drive the entirety of authority to the parent page, a canonical tag should appear in the <head> tag of /parent/child/ and /parent/child2/:
<link rel="canonical" href="http://www.domain.com/parent/" />
This would mean that any authority pointed towards those child pages will be passed on to the parent page. As such, the child pages would have no chance of ranking.
While this is acceptable practice for product pages within a category parent page (most useful on e-commerce sites), you risk missing the chance to rank for any of the terms which the child pages are targeting.
Using rel=canonical in place of 301 redirects
Although they may behave in the same way in terms of the flow of authority, a canonical and a 301 redirect are two very different things.
When a visitor visits a page on which a canonical tag is pointing to another page, he or she will still see the page at this given URL. However, a 301 redirect pointing to the second page will take the user straight to the second page, without ‘visiting’ the first one.
Visitor Journey with a canonical tag:
The visitor lands on page two, and sees the content on page two.
Visitor Journey with a 301 redirect:
In this instance, the visitor does not see Page Two.
Use canonical tags when you want viewers to be able to see the pages, but you don’t want those pages indexed.
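To make the difference concrete, here is a toy model of the two visitor journeys. It is only a sketch: the site dictionaries, the "redirect" and "canonical" keys and the visit function are all hypothetical stand-ins, not how a real server or browser is implemented.

```python
# "redirect" stands in for a 301; "canonical" stands in for a rel=canonical
# tag in the page's <head>. Both are hypothetical keys for this sketch.
def visit(site, url, max_hops=10):
    """Return (url_the_visitor_ends_on, content_the_visitor_sees)."""
    page = site[url]
    hops = 0
    # A 301 sends the browser on to the target before any content is shown.
    while "redirect" in page and hops < max_hops:
        url = page["redirect"]
        page = site[url]
        hops += 1
    # A canonical is only a hint for search engines, so the visitor stays put.
    return url, page.get("content")


with_canonical = {
    "/page-one/": {"content": "Page One"},
    "/page-two/": {"content": "Page Two", "canonical": "/page-one/"},
}
with_redirect = {
    "/page-one/": {"content": "Page One"},
    "/page-two/": {"redirect": "/page-one/"},
}

print(visit(with_canonical, "/page-two/"))  # ('/page-two/', 'Page Two')
print(visit(with_redirect, "/page-two/"))   # ('/page-one/', 'Page One')
```

In both cases the visitor asks for page two; only the redirect changes where they end up.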
Using rel=canonical for pagination
Paginated content refers to content which spreads across more than one page, often with a ‘click to next page’ type CTA, and is often seen on search results pages or in FAQ sections. The URL will often include a parameter for the page number.
Because paginated pages present only a modest chance of ranking, many will apply rel=canonicals to said paginated pages in order to direct all authority to the main page. However, this is NOT best practice.
When you specify a rel=canonical on paginated pages – pointing to the first page in a series of FAQ pages, for example – the subsequent pages will not be indexed. That means the long-tail targeting potential on any page other than the first is lost.
This is when the rel=prev/next tag is most useful. By adding rel="next" and rel="prev" in the <head> of each page, you can indicate the connection between component URLs.
For example, on page two of your paginated content, this should be placed in the <head> section:
<link rel="prev" href="http://www.domain.com/content-part-1.html">
<link rel="next" href="http://www.domain.com/content-part-3.html">
The final URL in the sequence of pages does not require a rel=”next” tag.
This is the method which Google itself recommends.
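If your templates build these tags automatically, the logic can be sketched roughly like this. The function name and the URL pattern are assumptions of mine, modelled on the example URLs above; I have also assumed the first page in the series carries no rel="prev", just as the last carries no rel="next".

```python
def pagination_links(page_number, total_pages, url_pattern):
    """Build the rel="prev" / rel="next" tags for one page in a series."""
    if not 1 <= page_number <= total_pages:
        raise ValueError("page_number must be between 1 and total_pages")
    tags = []
    if page_number > 1:  # the first page has nothing before it
        href = url_pattern.format(page_number - 1)
        tags.append('<link rel="prev" href="%s">' % href)
    if page_number < total_pages:  # the last page has nothing after it
        href = url_pattern.format(page_number + 1)
        tags.append('<link rel="next" href="%s">' % href)
    return tags


# Hypothetical three-part series, following the example URLs in the text.
pattern = "http://www.domain.com/content-part-{}.html"
for tag in pagination_links(2, 3, pattern):
    print(tag)
```

For page two of three, this prints the same pair of tags shown in the example above.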
Pointing canonicals to 404 pages
Pointing a canonical to a 404, or soft 404 page, is surprisingly common; it’s often a legacy thing which hangs around after site architecture has been changed.
If a canonical points to a 404 page, Google will probably ignore it. If it points to a soft 404 page, that page may well be the one which gets all the authority and everything you wanted to keep.
I’ve seen this done for entire directories, when the parent page’s URL was changed and someone forgot to update the canonicals. Result: the new parent page with the new URL struggled to rank, while the soft 404 page ended up on page 1 for a site search (although obviously there had been another failing, in that misplaced meta robots allowed the soft 404 page to be indexed).
Finding Canonicals on Your Site
It’s easy to find 404 canonicals on a big site if you use Screaming Frog – I know I talk about that a lot, but it’s my favourite crawler. Plus, frogs.
Crawl the entire site, export it, and check out the links specified in the ‘canonicals’ column.
Now dedupe the list by the canonical link element, and chuck them in a .txt file (Notepad is fine).
Once you’ve saved it, click ‘mode’ in the Screaming Frog menu and switch it to ‘list’. Now upload your .txt file, and click OK.
Screaming Frog will then show you which of those canonical URLs return a 404.
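If you would rather not dedupe by hand, the export-to-list step can be sketched in a few lines of Python. This is an assumption-heavy example: the column name "canonicals" is taken from the description above and may not match the header in your export, and the rows are made up.

```python
import csv
import io


def unique_canonicals(csv_file, column="canonicals"):
    """Return the distinct, non-empty values of one column, in first-seen order."""
    seen = set()
    urls = []
    for row in csv.DictReader(csv_file):
        url = (row.get(column) or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Made-up export rows, for illustration only.
export = io.StringIO(
    "address,canonicals\n"
    "http://www.domain.com/parent/child/,http://www.domain.com/parent/\n"
    "http://www.domain.com/parent/child2/,http://www.domain.com/parent/\n"
    "http://www.domain.com/old/,http://www.domain.com/old-parent/\n"
    "http://www.domain.com/about/,\n"
)

urls = unique_canonicals(export)
list_mode_text = "\n".join(urls) + "\n"  # one URL per line, ready to save as .txt
print(list_mode_text, end="")
```

With a real export you would pass in open("your-export.csv", newline="") instead of the in-memory sample, then write list_mode_text to the .txt file you upload in list mode.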
Mixing up Relative URLs in Canonicals
When Google reads a canonical, it can read an absolute URL or a relative URL. However, when you include the domain in your href, the href MUST also include the protocol (the ‘http://’ bit of a URL).
Google will accept the following for canonicals:
<link rel="canonical" href="/funny-sloths.html" />
<link rel="canonical" href="http://slothland.com/funny-sloths.html" />
But it will NOT accept (or will most likely ignore):
<link rel="canonical" href="slothland.com/funny-sloths.html" />
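One way to see why the third version is a problem is to resolve each href the way standard URL resolution does. The sketch below uses Python's urllib.parse; the function name and the page URL are hypothetical, and I am not claiming this is exactly what Google does internally, only that a domain without a protocol looks like a relative path to a standard URL parser.

```python
from urllib.parse import urljoin, urlsplit


def describe_canonical_href(page_url, href):
    """Classify an href and show what URL it resolves to from page_url."""
    parts = urlsplit(href)
    if parts.scheme and parts.netloc:
        kind = "absolute"
    elif parts.netloc:
        kind = "protocol-relative"  # starts with //, not covered in the text
    elif href.startswith("/"):
        kind = "root-relative"
    else:
        kind = "path-relative"
    return kind, urljoin(page_url, href)


# Hypothetical page that carries the canonical tag.
page = "http://slothland.com/sloths.html"

for href in (
    "/funny-sloths.html",
    "http://slothland.com/funny-sloths.html",
    "slothland.com/funny-sloths.html",
):
    print(href, "->", describe_canonical_href(page, href))
```

The first two both resolve to http://slothland.com/funny-sloths.html. The third is read as a path and resolves to http://slothland.com/slothland.com/funny-sloths.html, which is almost certainly not a page you meant to point at.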
Making Canonical Chains
So a canonical points to one page. Which in turn, has a canonical which points to another page. And then that page points to the final page in the chain with another canonical.
Moz reckons that a 301 redirect passes the same amount of link juice as a rel=canonical, although Matt Cutts said in 2011 that both the 301 redirect and rel=canonical pass all but a tiny little bit of authority. I would be fascinated to know if anyone has any more insight into this…tweet me if you do.
Still, there’s no reason to chain canonicals any more than there is to chain 301 redirects. It’s messy, and it’s not clear whether you’ll affect authority – not to mention site speed – by doing it.
I am not sure what happens when you make a canonical loop; by that I mean that I am not sure what Google will do if it sees a canonical from page 2 pointing to page 1, which in turn has a canonical pointing to page 2…but given that Google admits it may ignore mistakenly written relative URLs, I’d hazard a guess that it would ignore the canonicals in a case like this altogether.
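Whatever Google does with them, chains and loops are easy enough to spot yourself once you have a list of pages and the canonicals they declare. Here is a minimal sketch; the function name and the page-to-canonical mappings are hypothetical, and a page that canonicalises to itself is treated as the end of the line rather than a loop.

```python
def follow_canonicals(canonicals, start):
    """Follow canonical hints from start; return (path, loop_detected)."""
    path = [start]
    current = start
    # Stop when a page has no canonical, or points at itself.
    while current in canonicals and canonicals[current] != current:
        current = canonicals[current]
        if current in path:
            # We have come back to a page already visited: a loop.
            return path + [current], True
        path.append(current)
    return path, False


# Hypothetical mapping of page -> the URL its canonical tag points to.
chain = {"/page-1/": "/page-2/", "/page-2/": "/page-3/", "/page-3/": "/page-4/"}
loop = {"/page-1/": "/page-2/", "/page-2/": "/page-1/"}

print(follow_canonicals(chain, "/page-1/"))
# (['/page-1/', '/page-2/', '/page-3/', '/page-4/'], False)
print(follow_canonicals(loop, "/page-2/"))
# (['/page-2/', '/page-1/', '/page-2/'], True)
```

Any path longer than two entries is a chain worth flattening, so that every page points straight at the final URL.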
Have I missed anything? Made some mistakes? Let me know.