Dynamic canonical URLs

It is not often that Google delivers an explicit mandate to help websites rank better. Most of their guidelines are vague and not very constructive, but the recent announcement of support for canonical URLs is as explicit a directive as their implementation of XML sitemaps and the “nofollow” value for the rel attribute.

A complement to sitemaps and redirects, canonical URLs help search engines understand which URL should be used for any given page of content. For static websites, this is mostly irrelevant; an HTML file is an HTML file and there is no alternative. But for dynamically generated sites where multiple database query strings can lead to the same content, Google, Yahoo, Bing and others can keep their indexes clean by focusing on the correct version. While Textpattern maintains a fairly straightforward URL structure, it is still affected. For instance, all three of these links go to the same place:

  • http://graphicpush.com/rdfa-microformats-standards-big-questions
  • http://graphicpush.com/index.php?id=374
  • http://graphicpush.com/374 (using smd_short_url plugin)

Obviously, I want the GOOG to index the first URL — it is the most descriptive and SEO-friendly. To help explicitly control that, I need to implement a canonical URL link element.

The Basic Structure

We’ve all used the link tag for other purposes — importing CSS files, establishing global navigation elements, and more. This follows the same structure, but uses “canonical” as the value for the rel attribute. For example:

<link rel="canonical" href="http://graphicpush.com/rdfa-microformats-standards-big-questions" />

Now, no matter what URL Google actually follows to my page, it knows that http://graphicpush.com/rdfa-microformats-standards-big-questions is the URL that should be indexed. That’s it.
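
For context, here is a minimal sketch of where the element sits in a page. The title text is a placeholder I made up; only the link element comes from the example above.

```html
<head>
   <title>Example article title (placeholder)</title>
   <!-- Tells search engines which URL to index for this page -->
   <link rel="canonical" href="http://graphicpush.com/rdfa-microformats-standards-big-questions" />
</head>
```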

Making It Dynamic

Because we’re dictating what URL should be used, the <txp:permlink /> tag works perfectly for individual articles: it generates a URL based on the Permanent link mode setting in Textpattern’s preferences, not on whatever URL was used to access the page. This would be our code:

<link rel="canonical" href="<txp:permlink />" />

Yes, it’s that simple.

Sections and Categories

The above tag works great for individual articles, but section and category pages require manual URL reconstruction. (Keep in mind the <txp:site_url /> tag automatically creates a trailing slash so there’s no need to add one directly after.)

Sections in Messy Mode

<link rel="canonical" href="<txp:site_url />index.php?s=<txp:section />" />

Sections in Clean URL Mode

<link rel="canonical" href="<txp:site_url /><txp:section />" />

Categories in Messy Mode

<link rel="canonical" href="<txp:site_url />index.php?c=<txp:category />" />

Categories in Clean URL Mode

<link rel="canonical" href="<txp:site_url />category/<txp:category />" />

Wrapping It Together

The canonical URL link tag belongs in the metadata, so I suggest integrating it with whatever model you are using for custom titles and descriptions. This usually involves conditional tags. For instance, if you were running a simple navigation structure with sections and articles, this could be used in a page template:

<txp:if_individual_article>
   <link rel="canonical" href="<txp:permlink />" />
<txp:else />
   <link rel="canonical" href="<txp:site_url /><txp:section />" />
</txp:if_individual_article>
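
If the site also has category listings and a front page, the same idea can be extended with nested conditionals. This is only a sketch under a few assumptions: clean URL mode, the URL patterns shown earlier, and that an empty name in <txp:if_section> matches the default (front page) section in your Textpattern version. Check each branch against your own setup before relying on it.

```html
<txp:if_individual_article>
   <!-- Single article: use the permanent link -->
   <link rel="canonical" href="<txp:permlink />" />
<txp:else />
   <txp:if_category>
      <!-- Category listing, clean URL mode -->
      <link rel="canonical" href="<txp:site_url />category/<txp:category />" />
   <txp:else />
      <txp:if_section name="">
         <!-- Assumption: empty name matches the front page -->
         <link rel="canonical" href="<txp:site_url />" />
      <txp:else />
         <!-- Any other section landing page -->
         <link rel="canonical" href="<txp:site_url /><txp:section />" />
      </txp:if_section>
   </txp:if_category>
</txp:if_individual_article>
```

The article test comes first so that individual articles always get their permanent link, whatever section or category they belong to.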

The conditionals can get complicated quickly for URL scenarios that are not native to Textpattern, especially ones that require manual URL construction. For instance, a site using the tru_tags plugin will require additional work for the actual tag pages:

<txp:if_section name="tag">
   <link rel="canonical" href="<txp:site_url />tag/<txp:tru_tags_tag_parameter />" />
</txp:if_section>

Also, the plugin gbp_permanent_links does not always play nice with <txp:permlink />, so manual URL reconstruction may be necessary. For instance, something like site.com/section/category/article-title is not “out of the box” functionality, so the canonical link tag might look like this:

<link rel="canonical" href="<txp:site_url /><txp:section />/<txp:category />/<txp:article_url_title />" />

Worth the Effort?

Yes. History has routinely shown us that by the time Google publicly unveils a new technological aspect of their ranking algorithm, it’s already been in practice for some time, and web developers are wise to adhere to the recommended best practice. Yes, it’s one more tag to worry about, but its simplicity makes it almost effortless to integrate into any Textpattern environment.
