How to Handle Unavoidable Duplicate Content

Duplicate Content. Just uttering the words around webmasters, SEOs, and business owners can draw jagged stares and send chills down their spines. This type of content has long been blamed for a myriad of website problems, including losing large swaths of Google traffic and even revenue.

The way Google handles duplicate content has shifted over the years, but this article is about a very specific type of duplicate content – Unavoidable Duplicate Content.

What is Unavoidable Duplicate Content?

This is a type of content that you are unable to control, edit, or change either due to legal reasons such as a contract or due to the way your CMS operates.

For example, consider that you are using a CMS that automatically imports furniture product images, descriptions, and pricing from a manufacturer you work with. The CMS turns your small local furniture store into a massive online ecommerce store overnight, filling it with hundreds of pieces of furniture for your customers to choose from, more than you could ever fit in your showroom. There is only one problem: the CMS is required by the manufacturer, and the manufacturer only allows you to change the price of the products listed.

This wouldn’t be a problem except that any businessperson would quickly see that our furniture manufacturer is likely doing this for hundreds of stores around the country. Now we have an issue: there are hundreds of stores listing the exact same products with the exact same names, descriptions, and photos – and none of this can be edited, changed, adjusted, or added to.

This is Unavoidable Duplicate Content.

How is it Different From Other Duplicate Content?

Duplicate content is duplicate content. I am doubtful Google or other engines have devised ways to handle certain types of it differently from other types. However, as a business owner (or a fellow marketer / SEO reading this) you must be aware that not all duplicate content is the same on our end of things.

Most often, when talking heads discuss duplicate content in blog posts or videos, they are talking to bloggers, publishers, or others who might be afraid of scrapers who copy content from other sites as a way to make money. Their advice, and that from search engines, is almost always unanimous – something like “Don’t worry about duplicate content, because Google can identify the content originator.”

This is almost always true (and sometimes agonizingly not true).

They might even go so far as to tell you that there is no such thing as a ‘duplicate content penalty’ and that duplicate content doesn’t really hurt your website.

This is also generally true, until of course it is not and the duplicate content on your site does cause issues.

But this is all of little help to a business burdened with Unavoidable Duplicate Content, since the discussions are about how it might be handled by an engine and not how it is likely impacting the business.

How Unavoidable Duplicate Content Can Cause SEO Issues

Google and their employees speak in ways that are often vague and leave things open to interpretation. This is likely not an attempt at harming SEOs or website owners, but rather an attempt at limiting any legal ramifications of their statements and, of course, at keeping their secret recipe from competitors and spammers. However, in recent years they have been quite blunt about several SEO topics, including specific topics within the realm of duplicate content, giving us some extremely clear information.

When we discuss potential negative impacts of duplicate content we always have to keep in mind that we do not have the algorithm in front of us and we are often speculating based on Google’s statements, research, and results. Just because Google has not said something plainly does not mean it is not a true statement, and vice versa.

Below I list three major SEO issues you might notice from having an unavoidable duplicate content problem, identify whether each is real or speculation, and note whether our own internal research has confirmed it as a possible issue caused by this very specific type of duplicate content.

1. The Duplicate Content Indexing Filter

  • Real or Speculation = Real
  • Research Confirmed = Yes

Google discusses this in their official documentation on Canonicals which states: “Google can only index the canonical URL from a set of duplicate pages.”

This was not the original definition; it was updated to this wording in early 2022. Prior to the change, many SEOs believed that duplicates would be indexed but that Google would select one to rank above the others. In the case of Unavoidable Duplicate Content this meant one website out of the multitudes would win.

Essentially, this means that Google may not index your content if the content has zero originality and a better website or originator site has the exact same content indexed. Of course, you must be indexed by a search engine to rank in their results.
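If your CMS exposes the page `<head>`, Google’s documentation on consolidating duplicate URLs (linked in the References) describes signaling which version of a page you prefer with a `rel="canonical"` link element. A minimal sketch, using a hypothetical product URL – note this is a hint to search engines, not a directive they are required to follow:

```html
<!-- Placed in the <head> of the product page. -->
<!-- The href below is a hypothetical example URL. -->
<link rel="canonical" href="https://www.example.com/furniture/oak-dresser" />
```

Of course, in the Unavoidable Duplicate Content scenario this only helps if your CMS lets you add it at all, and it will not decide which of hundreds of competing stores wins the canonical on its own.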

2. Reduced Crawl Rate / Allowance

  • Real or Speculation = Speculation
  • Research Confirmed = Yes

A large volume of duplicate content on a website is thought by some to reduce the amount Google or other engines decide to crawl your website. This makes sense from a productivity and expense standpoint when you understand how expensive running a quality search engine at scale is.

If every time a search engine’s crawler hits your website it brings back mostly content that is duplicated from somewhere else, and their system refuses to index it, then why would they keep spending time and money to crawl your site? The system has to spend computational resources to crawl a page, and more computational resources to determine that its content is duplicate and refuse to index it. Across billions of websites and trillions of documents, even extremely small expenses add up quickly.
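To make the computational cost concrete: search engines do not publish their deduplication pipelines, but one common textbook technique for spotting near-duplicates is comparing word “shingles” between documents. A toy sketch (the product descriptions are invented examples):

```python
# Toy sketch of near-duplicate detection via word shingles and Jaccard
# similarity. Real engines use far more elaborate pipelines (e.g.
# MinHash/SimHash at scale); this only illustrates that flagging a
# duplicate costs compute on every page crawled.

def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles found in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two shingle sets (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# A manufacturer description copied verbatim vs. an original one.
original = "Solid oak dresser with six drawers and a natural finish"
copied   = "Solid oak dresser with six drawers and a natural finish"
unique   = "Hand built walnut desk with a single wide drawer"

print(jaccard(shingles(original), shingles(copied)))  # 1.0 (exact duplicate)
print(jaccard(shingles(original), shingles(unique)))  # 0.0 (no shared shingles)
```

Every comparison like this costs CPU time, which is why a crawler that keeps retrieving duplicates is plausibly throttled.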

A search engine could save money without sacrificing quality by simply cutting the crawl rate in half or more for websites found to be nothing but duplicate content.

3. Reduced Rankings Due to Other SEO Factors

  • Real or Speculation = Speculation
  • Research Confirmed = Yes

Instead of a duplicate content penalty (i.e. a direct algorithmic impact caused by your duplicate content), which Google has clearly and plainly stated they do not use, it is plausible that Google and other search engines might reduce the rankings of a site with a lot of duplicated content through the standard operation of their algorithms – not as a focus, but as a side effect. This isn’t something an engineer sitting at Google determines, or even a negative weighting placed in their algorithm. It is due to the fact that the duplicate content itself is not indexed and is therefore invisible to the ranking engine and not used in determining ranking factors.

Assume for a moment that 50% of all products in one of your categories have Unavoidable Duplicate Content on their pages with nothing that helps them stand out, so a search engine decides not to index these pages. If they were in the index, your site would have the widest selection of these products and would therefore be extremely useful to searchers (i.e. a site the engine really wants to rank); however, with 50% of those pages missing, your site has one of the smallest selections. It is plausible this represents a poor user experience to the engine, and they opt to include at the top of their rankings only sites with a larger selection. Here we see that the Unavoidable Duplicate Content has caused an indirect SEO impact by not allowing the content to be indexed.

There could be other factors as well. Take our example above and assume that our site has SEO metrics identical to its largest competitors, and that its product pages internally link to a hero product with the highest profit margins. This might give the site an edge if those pages were indexed, but since they are not in the index there are no SEO values calculated by the engine, and no values (i.e. PageRank) are distributed via the internal links to the hero product, providing a downward pull on its rankings for the product’s keywords.

How to Handle Unavoidable Duplicate Content

1. Avoid it

The simplest way to handle this problem is also the most obvious – don’t use Unavoidable Duplicate Content if you can’t make edits or changes. Of course, if you’re reading this, you have probably determined that you are unable to avoid this duplicate content for some reason.

While we refer to this type of duplicate content as “Unavoidable,” it is oftentimes quite avoidable. In one case, after an SEO Review we spoke to the primary stakeholders and explained the issue. They spoke to their representatives at the company supplying their products and their website content, and it turned out they had in fact been given a choice between a CMS that allowed editing and one that did not. The client chose the one that did not when they started, because it was slightly less expensive and they perceived no benefit from having the ability to edit. This was a simple fix in theory, but in practice it required a new site launch and a lot of fresh content.

2. Add unique written content

If you’re being forced to use Unavoidable Duplicate Content on parts of your site (or the whole thing), the best option you have available is to add unique text content to the pages with this duplicate content. For example, if this content is coming from a manufacturer and you have to maintain the main product description, see if your CMS will allow you to add a unique description below it.

3. Allow UGC content

Another great way to make your pages more unique is to get UGC on the page. There is no guarantee that an engine will use this to determine the page is unique, but they might. Even if the engine doesn’t determine the page is unique, it could make your duplicate page the most valuable out of all of the other pages on the web being forced to use this Unavoidable Duplicate Content.

When gathering UGC, make sure you have some internal editorial review or oversight capacity – for example, ensuring that ecommerce reviewers actually purchased the product, or that a comment is genuine rather than spam for pharmaceuticals or adult videos.

4. Add unique image or video content

Adding your own image or video content to a page can do wonders to improve the user experience and make engines see your page as more unique. As with UGC content this may not get the duplicate label removed from your page, but it might make your page the best one in the cohort, winning the canonicalization from engines and the traffic / sales that comes with it.

5. Gain more links to your content

Somehow in SEO it always comes back to links. Gaining high-quality inbound links to your pages with Unavoidable Duplicate Content can help engines like Google determine that your page is the best out of all of the others with the same duplicate content.

The key part here is “high-quality”. That does not mean ordering link spam on Fiverr or buying links from a link selling service.

6. Add more internal links to your content

Gaining internal links is an often-overlooked part of this and, in our experience, can help your page get selected as the canonical when all others in the cohort are similar.

For example, in one case we built a ‘recent listings’ page for a real estate agent whose IDX – used by them and all of their competitors – filled their websites with Unavoidable Duplicate Content that they could not edit or add to. By building a page to show the most recent (and most desirable) listings, we were able to increase the indexing of the higher-value properties from around 40% to over 85%, leading to our client winning more traffic and closing more deals.

7. Adjust your title tags to be better than others

When I spoke with Gary Illyes about this specific type of duplicate content, he suggested that it could be as simple as adjusting title tags to be seen as the canonical. A group of pages whose title tags are consistently of higher quality might help Google decide to index more of those pages versus the others with the same content. Honestly, this would be amazing if it worked, and most CMSes allow changes here. Start experimenting with new title tags on pages that are not being indexed and see if that improves things.

8. Adjust your meta descriptions to be better than others

I am speculating here, but having a meta description or a boilerplate that gains more clicks than others in the cohort might help a page or group of pages be selected as the canonical option for the Unavoidable Duplicate Content.

9. Noindex all of it

Instead of trying to fight the uphill battle, if part of your website is Unavoidable Duplicate Content, see if you can noindex those pages to allow the rest of the site to be crawled more frequently and possibly rank better. While this does not solve the issue of getting organic search traffic to those pages, and almost completely eliminates any SEO value coming from them, it also stops any potential harm they are causing. This is our recommended approach for syndicated content when nothing has been added to the article (even something as simple as an editorial statement).
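If your CMS lets you touch the `<head>` of these pages, the standard way to do this is the `robots` meta tag; sites that can modify server responses can send the equivalent `X-Robots-Tag` HTTP header instead. A minimal sketch:

```html
<!-- In the <head> of each page built from Unavoidable Duplicate Content -->
<meta name="robots" content="noindex" />

<!-- Equivalent HTTP response header, for sites that cannot edit the <head>:
     X-Robots-Tag: noindex -->
```

Note that crawlers must be able to fetch the page to see this tag, so the pages should not also be blocked in robots.txt.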

10. Give up and focus on blogging

There is a chance that all of your efforts to get around this issue will be futile. Your website is, after all, jam-packed with Unavoidable Duplicate Content and likely struggling to get pages indexed and ranking. If this is the case, then make blogging the key component of your SEO strategy. Instead of trying to rank your pages filled with duplicate content, put a blog on a subdomain like blog.yourwebsite.com and start blogging about topics related to your products/services or relevant to your target audience.

Start ranking these blog posts instead and place calls to action inside of them to help drive sales/lead generation.

These blog posts also make really nice link sources for your pages with duplicate content and might eventually help you overcome this problem.

Understanding Terms used in this Document

Real or Speculation – Identifies if an impact is known to be real, likely backed by admissions from Google or other major search engines.

Research Confirmed – Identifies if our research here has confirmed this to be true or likely to be true.

Duplicate Content – Content, usually text-based written content, that is an identical match to another piece of content or that has been ‘spun’ from one.

Canonical – The standard or primary version of a piece of content, the authoritative source.

References

https://web.archive.org/web/20220615000000*/https://support.google.com/webmasters/answer/10347851?hl=en
https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls

Joe Youngblood
Joe Youngblood is a top Dallas SEO, Digital Marketer, and Marketing Theorist. When he's not working with clients or writing about marketing he spends time supporting local non-profits and taking his dogs to various parks.
