Duplicate Content Issues and How to Solve Them

Let’s talk about duplicate content. According to a study by Raven, duplicate content makes up about 29% of the web.

You may be thinking, “Well, I know plagiarism is wrong and I don’t copy someone else’s content.” In most cases, however, duplicate content is created unintentionally. The most common scenario is when the same company or organization writes about a topic more than once. Duplicate content also occurs when one webpage can be accessed through multiple URLs. Google treats highly similar content as duplicate content, and this greatly affects SEO ranking. Content that has duplicates is unlikely ever to rank as the top search result, even when searched for with relevant keywords.

Why is Duplicate Content Bad?

While having similar versions of a post may not seem overly offensive to a human, we have to remember that it’s not only humans that visit our site. Humans and search engine bots view and treat content in different ways. As we’ve discussed in our other articles on SEO, how well our content ranks depends on how we cater to both types of visitors.

When our bot visitors come across duplicate content, they try to identify which is the “master copy” among the duplicates. When they become confused, they may choose the wrong post (not the one you wanted to rank) because your similar posts are competing with each other.

What constitutes duplicate content to humans and bots?

Take, for example, the scenario where there are multiple blog posts on the same topic. A human reading two highly similar posts may see both the similarities and the differences between them. To a bot, however, two highly similar posts are equivalent to two copies of the same post.

In the other scenario where a website has different variations on the URL, bots count each variation as a separate entity. Take a look at this example given by Moz in the article Canonicalization:

http://www.marketing.com
https://www.marketing.com
http://marketing.com
http://marketing.com/index.php
http://marketing.com/index.php?r…

In this example, all of the URLs above take us to the same location, and as humans we would likely consider this one destination, one page. A search engine bot, however, would consider these to be five individual pages of duplicate content. Another problematic situation arises with product pages on e-commerce websites, where each variation of a product appears to be duplicate content.
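
To make the difference concrete, here is a minimal Python sketch of the "mental normalization" a human does without thinking. The rules in `normalize` (prefer https, prefer the www host, treat /index.php as the home page, ignore the query string) are our own assumptions for illustration, not rules published by Moz or any search engine. The fifth URL in the list above is truncated in the source, so it is left out here.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Collapse common URL variants onto one preferred form (hypothetical rules)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Assumption: the preferred host includes "www."
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path
    # Assumption: /index.php serves the same content as the site root.
    if path in ("", "/index.php"):
        path = "/"
    # Assumption: https is preferred and the query string does not change the content.
    return urlunsplit(("https", host, path, "", ""))


variants = [
    "http://www.marketing.com",
    "https://www.marketing.com",
    "http://marketing.com",
    "http://marketing.com/index.php",
]

# A bot that compares URLs as plain strings sees four different pages...
pages_seen_by_bot = set(variants)
# ...while the normalized view collapses them into a single destination.
pages_seen_by_human = {normalize(u) for u in variants}

print(len(pages_seen_by_bot), len(pages_seen_by_human))
```

Real search engines do far more than string comparison, so treat this only as an illustration of why each URL variant can count as its own page until we say otherwise.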

When all of this “duplicate” content competes against itself for search engine ranking, it becomes a serious problem.

How Do Duplicate Content Issues Affect SEO?

Search engine bots become confused when there is too much content that appears the same. In the article Canonical Tags, Moz states some of the ways that duplicate content affects SEO:

Dilution of Ranking Ability:

Our visibility and ability to rank could become dispersed across the multiple page versions instead of being focused on the original page.

Reduced Visibility of Unique Content:

If there is too much information for bots to crawl, some of our unique content may get passed over.

Displaying The Wrong URL:

Search engine bots may choose the wrong URL to display on searches when we don’t specify what to do with duplicate content.

How Do I Fix Duplicate Content Issues?

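One way that some have attempted to fix the duplicate content problem is by using the robots.txt file to instruct bots not to crawl a page. Although it might sound reasonable, this is not an effective solution. A bot that is blocked from crawling a page cannot see that the page is a duplicate, and it cannot read any canonical signal placed on that page.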

Canonicalization is the Key to Managing Duplicate Content Issues

What is canonicalization?
“Fixing duplicate content issues all comes down to the same central idea: specifying which of the duplicates is the ‘correct’ one.” – Duplicate Content by Moz

Identifying the canonical version is equivalent to telling search engine bots that a specific page or URL is the one that should be indexed and ranked. There are different approaches to canonicalizing duplicate content. The two methods we are going to cover are the 301 Redirect and the Rel Canonical Tag. Each method has different uses and benefits.

Two Ways to Canonicalize:

Canonicalize by Redirecting Humans and Bots: The 301 Redirect

With the 301 Redirect, we identify the “master copy” by redirecting both humans and bots from a duplicate page to the “master copy”, the canonical page.

This method is best used when the duplicates do not need to stay accessible, such as when multiple URLs lead to the same destination. The duplicates become buried and can no longer be viewed by human or bot visitors. Moz states in the article Redirects that 90-99% of the rank power is transferred from the duplicate to the page it redirects to when using a 301 redirect.

For more information on how to set up a 301 redirect, check out this article from Google: Change page URLs with 301 redirects
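
As a rough illustration of the idea, the Python sketch below decides whether an incoming URL should receive a 301 and where it should point. The host name `www.example.com`, the https preference, and the function name `redirect_for` are hypothetical choices of ours. In practice the redirect is usually configured in the web server or CMS rather than written by hand, so read this as a sketch of the logic only.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical host, chosen only for this example.
CANONICAL_HOST = "www.example.com"


def redirect_for(url: str):
    """Return (301, target) for a duplicate URL, or None if it is already canonical."""
    parts = urlsplit(url)
    if parts.scheme == "https" and parts.netloc.lower() == CANONICAL_HOST:
        return None  # already the "master copy": serve the page normally
    # Keep the path and query, but force the preferred scheme and host.
    target = urlunsplit(("https", CANONICAL_HOST, parts.path or "/", parts.query, ""))
    return 301, target


print(redirect_for("http://example.com/news/seo"))
print(redirect_for("https://www.example.com/news/seo"))
```

A server would answer the first request with status 301 and a Location header set to the returned target, so humans and bots both land on the canonical page.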

Canonicalize By Telling Bots What to Do: The Canonical Tag aka Rel Canonical

As we mentioned above, others have tried unsuccessfully to deal with the duplicate content issue by telling bots not to crawl duplicates in a robots.txt file. However, there is a correct way to give instructions to our bot visitors concerning duplicate content. This method utilizes the canonical tag, or rel canonical.

The canonical tag is an HTML element that we place on each duplicate page (and on the master copy) to designate the canonical, the one we want indexed and ranked.

When to Use Rel Canonical
The Rel Canonical Tag is especially useful in the situation we listed previously where you have multiple blog posts on the same topic. Since you still want the duplicate blog posts to be accessible to visitors, you would use the Rel Canonical tag on each duplicate to direct rank power and indexing to the page you designate as the canonical. Rel Canonical can even be used across different domains. For example, if you own two websites and have a similar blog post on each site, you can canonicalize the blog post from one website to the other.

Additionally, Hubspot lists in the article Canonicalization 101 some other common situations to use Rel Canonical:

  • Product Variation Pages
  • Mobile-Specific URLs & Subdomains
  • Region/Country Specific URLs (if page content is in same language)

How to Use Rel Canonical

  1. Identify which of the duplicate pages is the canonical.
  2. On the duplicate pages, point the canonical tag to the URL of the canonical page.
  3. On the master copy, make the canonical tag self-referential, pointing back to itself with its own URL as the canonical.
  4. The tag should look like this, with the absolute URL** of the canonical page replacing http://www.example.com:

<link rel="canonical" href="http://www.example.com" />

**Absolute URL: an absolute URL specifies the full address, including the scheme (such as http://) and the domain, whereas a relative URL specifies only part of it, such as the path. The absolute and relative URLs of the page SEO within the folder News might look like this:

Absolute: “http://www.example.com/news/seo”

Relative: “/news/seo”

The canonical tag belongs in the <head> section of the page’s HTML. Correct placement of any HTML element, including the canonical tag, is important for a website to function properly, and improper placement can cause serious problems. If you don’t have the HTML knowledge to add the canonical tag manually, Yoast explains how to set the canonical URL with the Yoast SEO plugin in the article rel=canonical: the ultimate guide.
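
If you generate pages from templates or scripts, the tag can be built programmatically. The Python sketch below is one possible helper, not part of any plugin mentioned here; the function name `canonical_tag` is our own. It rejects relative URLs, since the steps above call for an absolute URL, and escapes the value so it is safe inside an HTML attribute.

```python
from html import escape
from urllib.parse import urlsplit


def canonical_tag(url: str) -> str:
    """Build a canonical <link> tag, refusing anything that is not an absolute URL."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"canonical URL must be absolute: {url!r}")
    # Escape &, <, > and quotes so the URL cannot break out of the attribute.
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'


# The same tag goes in the <head> of every duplicate and of the master copy itself.
print(canonical_tag("http://www.example.com/news/seo"))
```

Passing the relative form "/news/seo" raises a ValueError instead of silently producing a tag that points nowhere useful.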

Additional Tips to Canonicalize Duplicate Content:

  • Canonicalize Exact and Very-Near Duplicates: Remember, bots see very near duplicates as exact duplicates
  • Aggregate Unique Content: Consolidate rank power of two similar posts by moving unique content from the duplicate to the canonical post and using the canonical tag accordingly.
  • Don’t Be Confusing: Give Clear Directions
    • Specify no more than one rel=canonical for a page. When more than one is specified, all rel=canonicals will be ignored.
    • Don’t use the canonical tag in conjunction with 301 redirects in the same set of duplicate content. For example, don’t use the canonical tag on post 1 to canonicalize post 2, then use 301 redirects on post 2 to redirect back to post 1. (Very confusing.)
  • Make Sure Your Target Page Exists and Is Not:
    • A 404 Error (Page Not Found)
    • Blocked by robots.txt
    • Set to “noindex” (blocked from search results)
  • Check Auto-Populated Tags: Some CMSs, SEO plugins, and e-commerce systems will auto-populate canonical tags. Be sure to check that these tags point where you actually want them to.
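
Some of these checks can be automated. The Python sketch below uses the standard library's `html.parser` to pull the canonical tags out of a page's HTML and flag three of the problems listed above: a missing tag, more than one tag, or a tag (perhaps auto-populated) that points somewhere unexpected. The names `CanonicalFinder` and `audit_canonicals` are hypothetical. The sketch only inspects markup you hand it, so it does not check whether the target is a 404, blocked by robots.txt, or set to noindex.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            self.canonicals.append(attributes.get("href"))


def audit_canonicals(html_text: str, expected_url: str) -> list:
    """Return a list of problems found; an empty list means the page looks fine."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    found = finder.canonicals
    problems = []
    if not found:
        problems.append("no canonical tag found")
    elif len(found) > 1:
        problems.append("more than one canonical tag found")
    elif found[0] != expected_url:
        problems.append(f"canonical points to {found[0]}, expected {expected_url}")
    return problems


page = '<head><link rel="canonical" href="http://www.example.com/news/seo" /></head>'
print(audit_canonicals(page, "http://www.example.com/news/seo"))
```

Running a check like this across a set of duplicates after a CMS or plugin update is a quick way to catch tags that have drifted.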

There are a variety of methods to manage duplicate content and we’ve covered the two main ones in this post. The best method that D4 uses to combat duplicate content from multiple posts on the same topic is simply to not create multiple posts. In creating a single post with unique content that exhaustively covers a single topic, the rank power is already consolidated in that one post. When there is new information available on a topic that has already been covered, it can be more advantageous to update the existing post rather than create a new post that may compete for rank power.

Duplicate content is a complicated issue that can have a significant impact on SEO. D4 can help with managing duplicate content issues and improving SEO for your website. If you need help creating and managing engaging content, call us at 775-636-9986!