I am sure the headline caught your attention. So often you hear the echoes of “the fear of duplicate content” making its rounds throughout the web. My first thought is to look at the main contributing factors that lead anyone to intentionally or unintentionally trip a duplicate content filter.
One thing comes to mind for those of you who are unfamiliar with the jargon referencing this phenomenon. Duplicate content filters are inherent in search engine algorithms and determine how similar content A is to content B; whether both happen to be on the same server or the content is spread across multiple servers across the web is of little significance.
With the vast computing power of data centers synchronizing their content (every 5 minutes in some cases), if a webmaster is trying to be sneaky and pass off old content that was scraped (stolen or copied) as fresh new content, the right thing to do is at least make reference to the source, though even that will not stop the scrutiny from evolving.
But in the case of duplicate content existing on your own pages (such as the non-www vs. the www version of your URLs), you should consider doing a 301 redirect to remedy the canonicalization (you have got to love words that take two glances to pronounce). But let’s break it down in layman’s terms for everybody else.
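In practice the 301 is usually a line or two of server configuration, but the logic can be sketched in a few lines of Python. This is a minimal sketch, not a production setup, and `example.com` / `www.example.com` are hypothetical hosts standing in for your own domain:

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # hypothetical; swap in your own preferred host

def canonical_redirect(url: str):
    """Return (301, canonical_location) if the URL needs a permanent
    redirect, or None if it is already on the canonical host.

    The idea: every request for the bare (non-www) host gets a 301 to
    the www host, so search engines only ever index one version of
    each page instead of splitting link juice between two.
    """
    parts = urlsplit(url)
    if parts.netloc != CANONICAL_HOST:
        return 301, urlunsplit((parts.scheme, CANONICAL_HOST,
                                parts.path, parts.query, parts.fragment))
    return None  # already canonical, serve the page normally

print(canonical_redirect("http://example.com/widgets?page=2"))
# → (301, 'http://www.example.com/widgets?page=2')
```

The same rule, expressed in your web server’s rewrite configuration, accomplishes the redirect without any application code.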
If you were a fan of the TV show Highlander, you know that “there can be only one.” Much to the dismay of many who have tripped the duplicate content filter and seen their pages go south and experience a dip in rankings, the search engine agents have also adopted this philosophy.
If you have never heard of Highlander, or quite frankly have no idea what I’m talking about, then just realize this: if you use the same navigation on every page (which most do) and you have essentially the same base keywords on your pages, chances are you will trip a duplicate content filter and see a large percentage of your site not getting indexed in search engines. This is a big deal because, if your keywords and on-page content are part of the equation, then the more themed pages that are indexed, the more relevance your site carries, and the more new keyword combinations could return search results through the long tail of search.
The Good News:
You can strategically use duplicate content to establish a new page across multiple search engines. If used tactfully, it is a spidering tool (through the use of social media). Make the mistake of putting the wrong information in the title, however, and you may bury your website by accident; more on this below.
Just know that duplicate content filters are a safeguard for your pages employed by search engine spiders that essentially determine which website was first cited as publishing the originating content in question. After determining who was indexed first (which may or may not be you, even if the content is yours), the filter then makes its rounds and K.O.s the rest of the exact-match references from the search engine index.
It seems that “there can be only one” is the motto; even though multiple pages may still be displayed, only the first gets the link juice.
The Bad News:
If you copy or replicate someone’s work without a reference, such as “originally posted by Seo Design Solutions” (for all those who will scrape and redisplay this post as their own), then you can clearly see that the pages attempting to get a boost are thwarted, and rightly so. This is why, even though the duplicate content filter is feared by most as a way to get banned from the index, its very function is to provide balance to the search engine result pages (SERPs).
There are times and ways in which duplicate content can be used to your advantage, such as in article marketing. If you intend to republish a blog post as an article, make sure that you let the original post get indexed in search engines first. Then, when you have 30, 40 or 50 references leading back to that one article, at least when the dust settles you get the credit.
How can you check to see if someone is skimming your content?
Put “quotes around the phrase in question” in a search engine query box and hit return. The quotes (although used for many purposes in SEO) clearly indicate who else is using the exact form of the words and when the content was indexed (by looking at the cache).
Conversely, you can also use this technique to see how many places your new article or blog post has been featured: search for “insert title here”. To take it one step further, go to Google Blog Search and set up an alert for “the term in quotes”; every time Google finds another reference or copy of your content, you get notified via email. One last method is, you can also use this website to see if your site has been scraped.
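The quoted-search check above can be scripted if you want to run it regularly. A minimal sketch: build the exact-match query URL yourself, with the phrase wrapped in double quotes so the engine looks for it verbatim (the Google search URL format here is a common assumption, not an official API):

```python
from urllib.parse import urlencode

def exact_match_query(phrase: str) -> str:
    """Build a search URL that looks for the phrase in its exact form.

    Wrapping the phrase in double quotes asks the engine for a
    verbatim match, which surfaces every indexed copy of the wording.
    """
    return "https://www.google.com/search?" + urlencode({"q": f'"{phrase}"'})

print(exact_match_query("there can be only one"))
# → https://www.google.com/search?q=%22there+can+be+only+one%22
```

Paste a distinctive sentence from your post into a function like this and you can bookmark (or schedule) the resulting URL to watch for scrapers.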
Using duplicate content on purpose (to boost the original source) is one thing, but unintentionally hitting the mark with over 80% of your content appearing as a close match to another page will probably get your rankings suppressed until other variables come into play (website authority, the contextual information from the rest of your site, where the link is, etc.).
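The 80% figure is a rule of thumb, and search engines do not publish their similarity math, but you can get a rough feel for how close two blocks of text are with Python’s standard-library `difflib`. This is a sketch for self-auditing, not a reproduction of any engine’s actual filter:

```python
import difflib

def similarity(text_a: str, text_b: str) -> float:
    """Rough word-level similarity ratio between two texts (0.0 to 1.0)."""
    return difflib.SequenceMatcher(None, text_a.split(), text_b.split()).ratio()

original = "duplicate content filters determine how similar page A is to page B"
scraped  = "duplicate content filters decide how similar page A is to page B"

# A near-verbatim copy scores well above the ~0.8 danger zone
print(round(similarity(original, scraped), 2))
```

Running a check like this between your own pages (template text stripped out) is a quick way to spot which ones a filter might see as near-duplicates.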
Piggyback SEO, Parasite SEO and Duplicate Content:
There is another type of phenomenon that involves intentionally using other sites with a known ability to climb through the SERPs and rise to the top to promote your content, with a helpful link or links back to your pages.
This has been called piggyback SEO, parasite SEO or, in some ways, social media marketing: the original page that needs the attention of the search engines is loaded up with some link bait or compelling content, then the page is tagged or copied, in its entirety or partially, into a WYSIWYG editor (such as on craigslist, MySpace, Squidoo, etc.), where it passes authority and link juice back to the original page.
As long as the original page is indexed, you would think that no harm can be done, right? I, however, have seen instances of this tactic go awry and leave the original site nowhere to be found (in search engines), with the sites promoting it donning the profile of that site in the search engine result pages. So, once again, be careful where you promote your links, as the site with the most authority will, 9 times out of 10, rub out the little guy and leave them feeling like a place mat.
The moral behind the story is: if you have duplicate content on your pages (check StuntDubl’s page on duplicate content for a more detailed explanation), the severity of the penalty varies with the conditions that evoked it.
Otherwise, avoid it entirely or at least use:
1. the nofollow tag on anything that is duplicated (like excessive sub-navigation links built from anchor text).
2. the robots.txt file to prevent those pages from getting crawled.
or you can
3. leave them there for now, let them percolate for a few months, then go back later and change the content, add some fresh links and wait for the site to get deeply crawled again to take note of your changes. So, instead of having a page with a big fat zero on the rank-o-meter, the next time it manages to get crawled you can hit the SERPs running with domain age, page rank and aged links, provided you keep the pages but upgrade the content, tags and titles. This is not a bad tactic for breathing new life into sub-folders with fresh links and fresh content.
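For option 2 above, it is worth sanity-checking that your robots.txt actually blocks what you think it blocks before waiting on a recrawl. Python ships a robots.txt parser you can test against; the `/printer-friendly/` folder and example.com host here are hypothetical stand-ins for whatever near-duplicate section you want kept out of the index:

```python
import urllib.robotparser

# Hypothetical robots.txt blocking a folder of near-duplicate pages
# (the single-link equivalent is rel="nofollow" on the anchor itself)
robots_txt = """\
User-agent: *
Disallow: /printer-friendly/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "http://www.example.com/printer-friendly/page1.html"))  # False
print(parser.can_fetch("*", "http://www.example.com/articles/page1.html"))          # True
```

A check like this catches typos in the Disallow path before the spiders do, which matters because a botched rule either leaks the duplicates or blocks the originals.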
But if you wish to cut down on the potential canonicalization issues between pages, use the nofollow tag (to at least say “ignore me, please” to search engines); there is no need to have 100 pages of nearly identical information on your website, unless you only want a few pages to rise to the top.
But if you are the one that happens to get duped and you have a problem with that, you can check the offending site’s whois records using domain tools, drop them an email or a phone call to let them know to cut it out, or ignore it completely.
So long as the site in question doesn’t eclipse your page rank and website authority, give the bots a chance to go to work and they will just give your site another bump, as you have been referenced yet another time for your content.