Last Updated on September 20, 2018 by Kuldeep Bisht
Duplicate content is any piece of content on an online platform that has been reused or copied from another website. Google's algorithm compares and evaluates the information provided on a website, and if the content does not carry due credit or reference, or closely matches information provided by other websites, it is attributed as duplicate content.
Google defines Duplicate Content as:
“Duplicate content generally refers to substantive blocks of content within or across domains that either completely matches other content or is appreciably similar.”
Duplicate content can occur not only across domains but also within a single domain. It can be classified as:
Non-malicious Duplicate Content
Non-malicious duplicate content refers to content that has been adapted for different platforms to fit the needs of users, for example by serving the same page in plain HTML and in a smartphone-optimized version, or by presenting the same product description to visitors under different URLs.
Malicious Duplicate Content
Malicious duplicate content refers to content that has been copied or manipulated to game search engine rankings and rank above your competitors.
Be it blogs or website content, malicious duplicate content can appear at many levels. Content can be tagged as duplicate if:
- Other users lift your content without giving due reference or credit, in order to rank higher in search engine results.
- You provide different sets of URLs pointing to the same piece of information.
In blogs, content theft can be restricted with WordPress plugins such as WP Content Copy Protection or Tynt Insight for WordPress. So, the next time you start a blog with a customized WordPress theme, make sure to employ the appropriate plugins.
You can address the issue of duplicate content by following the procedures below, which let you monitor as well as choose the content you want your visitors to access.
How to address the issue?
1- Use 301s and be consistent
Online visitors can access your website through different URL formats, such as http://domain-name.com/home, http://home.domain-name.com, or http://www.domain-name.com. If different URLs point to the same website, Google's crawlers might deem the information duplicate content.
Likewise, http://domain-name.com/, https://www.domain-name.com, and https://domain-name.com all appear to be, and are, the same site. The issue crops up when Google's crawlers take note of them: they treat these as different URLs carrying the same set of information.
In both cases, pick one URL and employ a 301 redirect to send traffic to your preferred domain URL.
The first step in setting up a 301 redirect is to open the text-based .htaccess file, which lets you alter the configuration of the Apache web server software. Once you open the .htaccess file:
- Enable the Apache mod_rewrite module.
- Enable the RewriteEngine in the mod_rewrite module.
Add the following directives to it:
- Options +FollowSymlinks
- RewriteEngine on
To redirect a single page URL to another, add the following line:
- Redirect 301 /redirectpage.html http://www.xyz.com/newpage.html
To redirect one domain's URLs to another:
- RedirectMatch 301 ^(.*)$ http://www.xyz.com$1
Make sure to place the .htaccess file in the same directory as your index file.
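Putting the steps above together, a minimal .htaccess sketch (using the same placeholder domain www.xyz.com as the examples above) might look like this. Note that Redirect and RedirectMatch are actually provided by Apache's mod_alias module, so they work even without mod_rewrite; the rewrite lines matter only if you later add RewriteRule directives:

```apache
# Enable symbolic links and the rewrite engine (needed only for rewrite rules)
Options +FollowSymlinks
RewriteEngine on

# 301-redirect a single retired page to its replacement
Redirect 301 /redirectpage.html http://www.xyz.com/newpage.html

# 301-redirect every request to the new domain, preserving the requested path
# (this rule belongs in the OLD domain's .htaccess, or it would loop)
RedirectMatch 301 ^(.*)$ http://www.xyz.com$1
```

Because the status code is 301 (permanent) rather than 302 (temporary), search engines transfer the old URL's ranking signals to the new one.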
Redirects have a direct impact on your SEO strategy, so consider tweaking your existing strategy to get the best results from the new redirects.
2- Use precise domains
You cannot avoid addressing duplicate content on a multilingual website. For country-specific content, use a country-code domain such as http://www.domain.de to cater to an audience based in Germany. Google's crawlers can then rank your page for localized search queries.
To avoid overlapping content, translate your content for each niche audience. If you cater to audiences in different countries, the search engine should be able to differentiate between the two otherwise-identical pages by URL. For example, if you market your services in Germany as well as the UK, the apt URLs would be http://www.domain.de for Germany and http://www.domain.co.uk for the UK.
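One widely used way to make that differentiation explicit, sketched here with placeholder domains, is the hreflang link annotation, added to the head of each regional version of the page:

```html
<!-- On both regional pages: point search engines at each language/country version -->
<link rel="alternate" hreflang="de-DE" href="http://www.domain.de/">
<link rel="alternate" hreflang="en-GB" href="http://www.domain.co.uk/">
```

Each regional page lists all of its alternates, including itself, so crawlers can serve the right version for a localized query instead of treating the pages as duplicates.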
3- Syndicate carefully
For an e-commerce website, it is necessary to know how a search engine determines what your website is about, and content plays a major role in helping the algorithm get a handle on it. With the influx of e-commerce websites, content syndication has become a fruitful marketing tactic: according to data accumulated by Curata, roughly 10% of content available online today is syndicated.
However, you do not have any say in which version Google opts to display in search results. What you can do is:
- Include a backlink to the original article.
- Make it a point that you get credit for the content: people who syndicate your material should use the noindex meta tag on their copies, an example of which is given below.
<html>
<head>
<meta name="robots" content="noindex">
<title>Don't index this page</title>
</head>
</html>
Possible values for the content attribute: "none", "all", "index", "noindex", "nofollow", and "follow".
When syndicated copies carry the noindex tag, it becomes easy for the search engine to surface your original content in search results instead of indexing the syndicators' version of the content.
4- Set a preferred site to be indexed
Choose your preferred URL from http://www.domain.com and http://domain.com as the version of your website to be indexed in search results. Since both URLs point to the same location, you can specify your preferred URL by:
- Opening the Google Search Console home page and clicking the site you want
- Clicking the gear icon and selecting "Site Settings"
- In the Preferred domain section, selecting the version you want
What this does is ensure that whenever an online visitor searches with the URL you have not selected as preferred, they are automatically directed to the preferred domain name.
Here, too, you need a 301 redirect to direct traffic to the preferred URL.
5- Boilerplate repetition
Google does not mind websites using boilerplate content, but its Webmaster Guidelines state:
“Minimize boilerplate repetition: For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details.”
Boilerplate content is often reachable under different URLs of the same webpage; there are different routes to the same information:
- http://www.domain.com/products/men/footwear/black.htm
- http://www.domain.com/products/men?category=footwear&color=black
- http://domain.com/shop/index.php?product_id=32&highlight=black+footwear&cat_id=1&sessionid=123&affid=431
The above URLs may differ, but they share the same destination: "Black Footwear." Among these URLs, Google's algorithm by itself chooses the most appropriate one to represent the cluster in search results. These are regarded as non-malicious duplicate content and do not have an impact on your search engine results.
6- Avoid publishing stubs
If you have a page that has no content, or one you do not want indexed, use the noindex meta tag to let Google's crawlers know that you do not want that specific page to appear in search results.
<meta name="robots" content="noindex, nofollow">
You can use the above code to instruct Google's crawlers not to index the web page or follow any of its links. An irrelevant or empty page only creates a nuisance for the search engine and, in the process, can affect your search engine ranking.
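The check that crawlers perform can be sketched in a short script. This is a minimal illustration, assuming the page's HTML is already in hand (the `stub_page` string and function names below are invented for the example): it parses the document and reports whether a robots noindex directive is present.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives declared in <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() == "robots":
                # content is a comma-separated list, e.g. "noindex, nofollow"
                self.directives.extend(
                    d.strip().lower() for d in (attrs.get("content") or "").split(",")
                )

def has_noindex(html):
    """Return True if the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

# A hypothetical empty stub page carrying the tag from the article
stub_page = (
    '<html><head>'
    '<meta name="robots" content="noindex, nofollow">'
    '<title>Stub</title>'
    '</head><body></body></html>'
)
print(has_noindex(stub_page))  # True: crawlers are told to skip this page
```

A crawler honoring the tag would drop such a page from its index, which is exactly the behavior you want for empty stubs.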
7- Minimize similar content
The emphasis for website content should be on quality, not quantity. Do not pad your web pages with unnecessary content. If you have pages that carry somewhat similar content and provide similar information, it is better to consolidate the information into one single page.
For example, if you have an automobile website with separate pages describing the same features of two automobile brands, it is a better idea to consolidate both pages into one and then link to the two brands you are talking about.
What does Google not recommend?
It is not wise to employ a robots.txt file or other means to block Google's crawlers from accessing web pages with duplicate content. Instead, let the crawlers assess the pages, but make sure to mark the duplicates with:
- rel=”canonical” link element
- URL parameter handling tool
- 301 redirects
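As a sketch of the first option, each parameterized variant of the black-footwear page from the earlier example (the domain is a placeholder) would declare the clean product URL as its canonical in the page's head:

```html
<!-- Placed in the <head> of every duplicate variant of the page -->
<link rel="canonical" href="http://www.domain.com/products/men/footwear/black.htm">
```

Crawlers that honor the element then consolidate ranking signals onto the single canonical URL instead of splitting them across the variants.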
Google does not invoke any sort of penalty simply for having duplicate content; rather, it evaluates the best piece of information to provide to the user. Only if the means are deceptive and manipulative, with the intention of affecting search engine rankings, might the website's rank suffer or the site even be removed from search results.
The success of any online platform is proportional to the quality of the user experience. Many attributes define a seamless user experience: besides the technical aspects of website design and ease of navigation, a user looks for new information that sets you apart from your rivals. If another website is using your content without legitimate permission or a backlink, you can contact the website's owners for suitable action, or contact Google to get the respective pages removed under the Digital Millennium Copyright Act.
From its very inception, Google's ideology has been to provide a user-friendly experience to its audience. SEO professionals who optimize search engine rankings have been given a roadmap of what is expected of them, and SEO combined with user-friendly tactics can lead to wondrous results. Contrary to the buzz, Google does not penalize websites for duplicate content per se but lays down clear rules for clean practices. Malicious duplicate content, however, can certainly have an impact on your search engine rankings.