Duplicate Content: Why does it happen and how to fix issues (2023)

What is duplicate content?

Duplicate content is content that appears on the Internet in more than one place. That “one place” is defined as a location with a unique website address (URL) - so, if the same content appears at more than one web address, you’ve got duplicate content.

While not technically a penalty, duplicate content can still sometimes impact search engine rankings. When there are multiple pieces of, as Google calls it, "appreciably similar" content in more than one location on the Internet, it can be difficult for search engines to decide which version is more relevant to a given search query.

Why does duplicate content matter?

For search engines

Duplicate content can present three main issues for search engines:

  1. They don't know which version(s) to include/exclude from their indices.

  2. They don't know whether to direct the link metrics (trust, authority, anchor text, link equity, etc.) to one page, or keep it separated between multiple versions.

  3. They don't know which version(s) to rank for query results.

For site owners

When duplicate content is present, site owners can suffer rankings and traffic losses. These losses often stem from two main problems:

  1. To provide the best search experience, search engines will rarely show multiple versions of the same content, and thus are forced to choose which version is most likely to be the best result. This dilutes the visibility of each of the duplicates.

  2. Link equity can be further diluted because other sites have to choose between the duplicates as well. Instead of all inbound links pointing to one piece of content, they link to multiple pieces, spreading the link equity among the duplicates. Because inbound links are a ranking factor, this can then impact the search visibility of a piece of content.

The net result? A piece of content doesn't achieve the search visibility it otherwise would.

How do duplicate content issues happen?

In the vast majority of cases, website owners don't intentionally create duplicate content. But that doesn't mean it's not out there. In fact, by some estimates, up to 29% of the web is actually duplicate content!

Let's take a look at some of the most common ways duplicate content is unintentionally created:

1. URL variations

URL parameters, such as click tracking and some analytics code, can cause duplicate content issues. The problem can be caused not only by the parameters themselves, but also by the order in which those parameters appear in the URL.

For example:
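
(The URLs below are hypothetical, for illustration.) Each of the following could return the exact same page, yet each is a distinct URL that a search engine may crawl and index separately; the tracking parameter and the order of the parameters both create variants:

www.example.com/blue-widgets
www.example.com/blue-widgets?color=blue&cat=3
www.example.com/blue-widgets?cat=3&color=blue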

Similarly, session IDs are a common duplicate content creator. This occurs when each user that visits a website is assigned a different session ID that is stored in the URL.
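
For instance (again hypothetical), www.example.com/blue-widgets?sessionid=8472 and www.example.com/blue-widgets?sessionid=9313 are two different URLs serving the same content to two different visitors.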

Printer-friendly versions of content can also cause duplicate content issues when multiple versions of the pages get indexed.
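
For example (hypothetical URLs), www.example.com/article and www.example.com/print/article serving the same article become duplicates if both versions are indexed.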

One lesson here is that when possible, it's often beneficial to avoid adding URL parameters or alternate versions of URLs (the information those contain can usually be passed through scripts).

2. HTTP vs. HTTPS or WWW vs. non-WWW pages

If your site has separate versions at "www.site.com" and "site.com" (with and without the "www" prefix), and the same content lives at both versions, you've effectively created duplicates of each of those pages. The same applies to sites that maintain versions at both http:// and https://. If both versions of a page are live and visible to search engines, you may run into a duplicate content issue.
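
For example, all four of these hypothetical URLs could serve the same page, and each could be treated as a separate document if all of them are live and indexable:

http://example.com/page
http://www.example.com/page
https://example.com/page
https://www.example.com/page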

3. Scraped or copied content

Content includes not only blog posts or editorial content, but also product information pages. Scrapers republishing your blog content on their own sites may be a more familiar source of duplicate content, but there's a common problem for e-commerce sites, as well: product information. If many different websites sell the same items, and they all use the manufacturer's descriptions of those items, identical content winds up in multiple locations across the web.

How to fix duplicate content issues

Fixing duplicate content issues all comes down to the same central idea: specifying which of the duplicates is the "correct" one.

Whenever content on a site can be found at multiple URLs, it should be canonicalized for search engines. Let's go over the three main ways to do this: using a 301 redirect to the correct URL, using the rel=canonical attribute, or using the parameter handling tool in Google Search Console.

301 redirect

In many cases, the best way to combat duplicate content is to set up a 301 redirect from the "duplicate" page to the original content page.

When multiple pages with the potential to rank well are combined into a single page, they not only stop competing with one another; they also create a stronger relevancy and popularity signal overall. This will positively impact the "correct" page's ability to rank well.
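
As a rough sketch (hypothetical URLs), a 301 redirect means the server answers any request for the duplicate URL with a "moved permanently" status code and a Location header pointing at the canonical page, so both visitors and crawlers end up on the correct URL:

Request:   GET /blue-widgets?sessionid=8472 HTTP/1.1
           Host: www.example.com

Response:  HTTP/1.1 301 Moved Permanently
           Location: https://www.example.com/blue-widgets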

Rel="canonical"

Another option for dealing with duplicate content is to use the rel=canonical attribute. This tells search engines that a given page should be treated as though it were a copy of a specified URL, and all of the links, content metrics, and "ranking power" that search engines apply to this page should actually be credited to the specified URL.

The rel="canonical" attribute is part of the HTML head of a web page and looks like this:

General format:

<head>...[other code that might be in your document's HTML head]...<link href="URL OF ORIGINAL PAGE" rel="canonical" />...[other code that might be in your document's HTML head]...</head>

The rel=canonical attribute should be added to the HTML head of each duplicate version of a page, with the "URL OF ORIGINAL PAGE" portion above replaced by a link to the original (canonical) page. (Make sure you keep the quotation marks.) The attribute passes roughly the same amount of link equity (ranking power) as a 301 redirect, and, because it's implemented at the page (instead of server) level, often takes less development time to implement.

Below is an example of what a canonical attribute looks like in action:

Using MozBar to identify canonical attributes.

Here, we can see BuzzFeed is using the rel=canonical attributes to accommodate their use of URL parameters (in this case, click tracking). Although this page is accessible by two URLs, the rel=canonical attribute ensures that all link equity and content metrics are awarded to the original page (/no-one-does-this-anymore).
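
In markup terms (the domain and tracking parameter here are hypothetical stand-ins), a page reached at a click-tracking URL such as example.com/no-one-does-this-anymore?ref=homepage would contain, in its HTML head:

<link href="https://example.com/no-one-does-this-anymore" rel="canonical" />

Both URLs then consolidate their link equity and content metrics to the clean version.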

Meta Robots Noindex

One meta tag that can be particularly useful in dealing with duplicate content is meta robots, when used with the values "noindex, follow." Commonly called Meta Noindex,Follow, and written in the markup as content="noindex,follow", this meta robots tag can be added to the HTML head of each individual page that should be excluded from a search engine's index.

General format:

<head>...[other code that might be in your document's HTML head]...<meta name="robots" content="noindex,follow">...[other code that might be in your document's HTML head]...</head>

The meta robots tag allows search engines to crawl the links on a page but keeps them from including those links in their indices. It's important that the duplicate page can still be crawled, even though you're telling Google not to index it, because Google explicitly cautions against restricting crawl access to duplicate content on your website. (Search engines like to be able to see everything in case you've made an error in your code. It allows them to make a [likely automated] "judgment call" in otherwise ambiguous situations.)

Using meta robots is a particularly good solution for duplicate content issues related to pagination.
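
For instance (hypothetical URLs), page two and onward of a paginated archive such as example.com/blog?page=2 could each carry <meta name="robots" content="noindex,follow"> in their HTML head, so search engines keep crawling through the paginated series without indexing the near-duplicate listing pages.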

Preferred domain and parameter handling in Google Search Console

Google Search Console allows you to set the preferred domain of your site (i.e. http://yoursite.com instead of http://www.yoursite.com) and specify whether Googlebot should crawl various URL parameters differently (parameter handling).

Depending on your URL structure and the cause of your duplicate content issues, setting up either your preferred domain or parameter handling (or both!) may provide a solution.

The main drawback to using parameter handling as your primary method for dealing with duplicate content is that the changes you make only work for Google. Any rules put in place using Google Search Console will not affect how Bing or any other search engine's crawlers interpret your site; you'll need to use the webmaster tools for other search engines in addition to adjusting the settings in Search Console.

Additional methods for dealing with duplicate content

  1. Maintain consistency when linking internally throughout a website. For example, if a webmaster determines that the canonical version of a domain is www.example.com/, then all internal links should go to http://www.example.co... rather than http://example.com/pa... (notice the absence of www).

  2. When syndicating content, make sure the syndicating website adds a link back to the original content and not a variation on the URL. (Check out our Whiteboard Friday episode on dealing with duplicate content for more information.)

  3. To add an extra safeguard against content scrapers stealing SEO credit for your content, it's wise to add a self-referential rel=canonical link to your existing pages. This is a canonical attribute that points to the URL it's already on, with the aim of thwarting the efforts of some scrapers (a brief sketch follows below).

    A self-referential rel=canonical link: The URL specified in the rel=canonical tag is the same as the current page URL.

    While not all scrapers will port over the full HTML code of their source material, some will. For those that do, the self-referential rel=canonical tag will ensure your site's version gets credit as the "original" piece of content.
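
    As a minimal sketch (hypothetical URL), the HTML head of https://www.example.com/blue-widgets would itself contain:

    <link href="https://www.example.com/blue-widgets" rel="canonical" />

    Any scraper that copies the page's full HTML then also copies a canonical pointer back to your original URL.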

Keep learning

Put your skills to work

Moz Pro's site crawl can help identify duplicate content on a website. Try it >>

FAQs

What causes duplicate content? ›

Common causes of duplicate content. Duplicate content is often due to an incorrectly set up web server or website. These occurrences are technical in nature and will likely never result in a Google penalty. They can seriously harm your rankings though, so it's important to make it a priority to fix them.

Why is having duplicate content an issue for SEO? ›

Too much duplicate content within a website (or on the web, in general) can confuse search engines, and the wrong page does occasionally rank over the right one. This can lead to SERP results that aren't as accurate as they should be, which can frustrate users, hurt your traffic, and raise bounce rates.

How do you best describe duplicate content? ›

Duplicate content is the same content that appears on 2 or more unique URLs. "Same content" is defined as blocks of content that are "appreciably similar," which can range from exact copies to content that contains chunks of copied text.

How do you prevent duplicate content on product pages? ›

Product Review Pages

To avoid this duplicate content, these “review pages” should either be canonicalized to the main product page or set to “noindex,follow” via meta robots or X-robots tag. We recommend the first method, just in case a link to a “review page” occurs on an external website.

What are the content issues? ›

These issues relate to your content and how Google views it. Duplicate, thin, and slow content, for example, can negatively impact rankings. Content issues vary in severity, but resolving them may improve your site's ability to rank.

Why should you avoid duplicate content? ›

The Impact of Duplicate Content

  1. Key pages unexpectedly not performing well in SERPs or experiencing indexing problems.
  2. Fluctuations or decreases in core site metrics (traffic, rank positions, or E-A-T criteria).
  3. Other unexpected actions by search engines as a result of confusing prioritization signals.

How do I remove duplicate content from my website? ›

Here are the three ways I recommend.
  1. The rel=canonical tag. In most cases, the rel=canonical tag is the best way to transfer SEO juice from a duplicate page to another page. ...
  2. 301 redirect. Sometimes, you don't want the duplicate of your web page to stick around. ...
  3. Set passive parameters in Google Search Console.

How does SEO handle duplicate content? ›

Restructuring a link format too can create multiple copies of the same content. To reduce the impact of such duplicate content issues, set up 301 redirects. 301 redirects from the non-preferred URLs of a resource to their preferred URLs are a great way to alert the search engines about your preference.

How does Google find duplicate content? ›

Using Google to check for Duplicate Content

One quick way to check if a page may be considered duplicate is by copying around ten words from the start of a sentence and then pasting it with quotes into Google. This is actually Google's recommended way to check.
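
For example, you could take a sentence from this page, such as "duplicate content is content that appears on the Internet in more than one place," paste it into Google inside the quotation marks, and see how many different URLs return that exact phrase.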

What is duplicate content analysis? ›

A duplicate content checker works on the same principle as the Google search engine to identify duplicates on the web. It takes a random passage from the copy of a web page and checks whether this page content or similar text already exists on the web. The Google Index is used for this purpose.

Does duplicate content matter? ›

Duplicate content confuses Google and forces the search engine to choose which of the identical pages it should rank in the top results. Regardless of who produced the content, there is a high possibility that the original page will not be the one chosen for the top search results.

What is an SEO issue? ›

Anything that slows Google's discovery or crawling of the Web is unwelcome. Dynamically created websites that create Web pages on the fly from databases are frequently misconfigured from an SEO point of view. These sites may create lots of pages, or URLs, that contain basically the same content, over and over.

Is all duplicate content bad? ›

What Harm Can Duplicate Content Cause? External duplicate content, if created intentionally, can't cause any harm. Still, you must identify which version of your content is the original, as that will be the version that gets indexed. Internal content duplication can cause Google to link to the wrong page on your site.

What happens if you copy content from another website? ›

At first glance, it may seem as if it's perfectly legal to copy content from a website. But is it? The short answer to this question is "no," unless you've obtained the author's permission. In fact, virtually all digital content enjoys the same copyright protections as non-digital, "offline" content.

How do I find duplicates on a website? ›

Use the free CopyScape and/or PlagSpotter tools – Type your homepage URL (www.website.com) into the search bar and click “Go.” Each of these tools will search the web for any page that contains content identical to the content of any page on your website. The list of results includes links to each duplicate page.

How much of the Internet is duplicate content? ›

“It's important to realize that if you look at content on the web, something like 25% or 30% of all the web's content is duplicate content.”

How much duplicate content is acceptable? ›

How Much Duplicate Content is Acceptable? According to Matt Cutts, 25% to 30% of the web consists of duplicate content. According to him, Google doesn't consider duplicate content as spam, and it doesn't lead your site to be penalized unless it is intended to manipulate the search results.

What is considered duplicate? ›

Duplicate content is material that appears in the same form in two or more different places. It can arise for a variety of reasons. For example, poor site architecture - where a site accidentally creates multiple copies of certain pages - can result in identical content.

How do I stop duplicate content on YouTube? ›

But in the end, what matters most to YouTube is adding value. So, as long as you are contributing value (be it by commentary, educational value or high-quality editing) to the original video, your video will not be flagged as duplicate content and endanger your channel's monetization.

What is the problem of duplication? ›

Problem duplication is when content copies either 1) confuse people using your information or 2) obscure the ideal content. Don't be too narrow in thinking about duplication; exact copies are not as serious as other kinds of copies.

Why should you avoid duplicate content? ›

Duplicate content doesn't make it easy for search engines either. The largest problems search engines face with duplicate content are: Search engines don't know which pages of the website to include and exclude in their indices. Search engines don't know which pages to rank for search query results.

How does Google identify duplicate content? ›

Google uses a predictive method to detect duplicate content based on URL patterns, which could lead to pages being incorrectly identified as duplicates. In order to prevent unnecessary crawling and indexing, Google tries to predict when pages may contain similar or duplicate content based on their URLs.

How do I get rid of duplicate content in SEO? ›

To correct indexed duplicate content, consolidate link authority into a single page and prompt the search engines to remove the duplicate version from their index.

Why should you avoid duplicate content on your website? ›

It turns out duplicates confuse search engines. If they can't tell which copy is the original, all versions will struggle to rank. Or if search engines are forced to choose one duplicate copy over another, the visibility of the other versions diminishes.
