Duplicate Content and Shingle Analysis for SEO

Search engines use shingles (groups of content or clusters of words in “exact match” form) and shingle analysis to extract the block-level contextual references and assemble the content of a web page. You hear the warnings about duplicate content with regard to SEO, but what does it really mean?
Duplicate Content and Shingle Analysis for SEO, by SEO Design Solutions.
Search engine etiquette mirrors human behavior in many respects. For example, link popularity emulates cliques and third-party referrals (a.k.a. endorsed editorial links), while authority, expertise and reputation are emulated by trust rank. Variables such as link popularity and trust shape why, where and how billions of websites rank in a search engine’s index.

Just like you would be penalized if a teacher caught you cheating or copying from others, search engines are algorithmically inclined to seek out the source of a shingle (to eliminate duplicates from their index).
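To make the idea concrete, here is a minimal Python sketch of word-level shingling. It is an illustration only: the three-word shingle size, the helper names `shingles` and `jaccard`, and the use of Jaccard overlap as the similarity score are assumptions made for this example, not a description of how any particular search engine works.

```python
import re

def shingles(text, size=3):
    """Return the set of word-level shingles: runs of `size` consecutive words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Shared shingles divided by all distinct shingles (0.0 = disjoint, 1.0 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical sample sentences, invented for this example.
original = "search engines use shingles to detect duplicate content"
scraped = "search engines use shingles to detect copied content"
unrelated = "barnacles slow a ship down as it moves through water"

print(jaccard(shingles(original), shingles(scraped)))    # 0.5
print(jaccard(shingles(original), shingles(unrelated)))  # 0.0
```

Swapping a single word in the scraped copy still leaves half of the three-word shingles intact, which is why lightly "spun" text remains easy to trace back to its source.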

There are two types of duplicate content for a website: (1) server-level duplication, where your server is not configured to prefer one hostname, so each page is available from both the http:// prefix and the http://www. prefix and each page is a potential replica of the other (remedied by an .htaccess rule); and (2) duplicate shingles, meaning “exact match” segments of words spread across (a) your own pages or (b) multiple websites.
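The .htaccess fix itself is server configuration, but the first type of duplication can be spotted with a few lines of Python, for instance when auditing a list of crawled URLs. The sketch below is a hedged illustration: `canonical_url` is a hypothetical helper, and `www.example.com` is a placeholder for whichever hostname you prefer.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url, preferred_host="www.example.com"):
    """Map the www and non-www variants of a URL onto one preferred hostname."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Derive the bare domain so both variants can be recognised.
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host
    # Drop the fragment and treat an empty path as the site root.
    return urlunsplit((parts.scheme, host, parts.path or "/", parts.query, ""))

# Both variants collapse to the same address, so they count as one page.
print(canonical_url("http://example.com/page"))      # http://www.example.com/page
print(canonical_url("http://www.example.com/page"))  # http://www.example.com/page
```

If two crawled URLs produce the same canonical form but both return a page instead of a redirect, the site is serving the same content twice.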

Regardless of whether the shingles are scattered across your own website or across multiple sites online, your pages can incur a penalty and potentially cancel each other out. From the standpoint of storage (which costs search engines money), let’s face it, a copy is really not that original.

From their perspective (if spiders have one), they are keeping the original sources intact as the authority. As for the parrots that scrape or attempt to spin articles, they ultimately leave a trail unless the segments are sufficiently scrambled and rewritten. Often, simply writing unique content would have been less effort.

Scraping content for reproduction from topical sources via RSS feeds is a very common tactic for building automated MFA (Made for AdSense) sites. There is nothing more annoying than seeing a post or article that took the author hours to create end up on some “no name site” with a “fictitious writer” who is now competing with your website for the very same keywords, tags and titles you just crafted with care.

Webmasters copy or scrape content for several reasons: (1) they are just plain lazy; (2) to build up topical sites so they can implement a 301 redirect to their “real money maker website” after the site gains PageRank; (3) for AdSense or affiliate revenues; or (4) for search engine rankings based on the labors and content of others. Whatever the reason, the impact of duplicate content ripples across the web, keeping spiders busy suppressing blatant plagiarism.

The more sophisticated and savvy search engines like Google can smell a scraped shingle a mile away and make adjustments to ensure that the original source gets the credit and the duplicate simply recedes into the penalty zone where its semantic currency is capped and quarantined.

When it comes down to it, you are better off avoiding penalties (large or small) to streamline the velocity of your website as it moves through the evaluation process. Just as barnacles impede a ship’s trajectory, duplicate content impedes your relevance score unless your site is the original source.

What this means in layman’s terms is, if you are using article marketing as a tactic for building links or driving traffic to your pages, make sure the content is original. Believe me, search engines know, so in closing, just write your own.

If you are seeking solutions to quell plagiarism, services such as Copyscape will alert you if your content is being cannibalized or blatantly scraped. Or, at any time, just grab a segment of your own content, put “quotes around it” in a Google search box and hit return to see what data is retrieved. If your site is the only one using the exact match formation of the snippet, then your content is cited as the original.
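The quoted-search check is easy to script if you want to test several snippets. The following is a small sketch under stated assumptions: `exact_match_query` is a hypothetical helper that only builds the search URL for you to open in a browser, and it does not fetch or parse any results.

```python
from urllib.parse import urlencode

def exact_match_query(snippet):
    """Build a Google search URL that looks for `snippet` as an exact phrase."""
    # Collapse stray whitespace, then wrap the phrase in double quotes.
    phrase = " ".join(snippet.split())
    return "https://www.google.com/search?" + urlencode({"q": f'"{phrase}"'})

print(exact_match_query("duplicate content impedes your relevance score"))
```

Snippets of roughly a sentence in length tend to work best: long enough to be distinctive, short enough to fit in a single query.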

You can also use Google Blog Search to set alerts for titles or specific content “in quotes” and get email alerts as matches appear, and use Domain Tools to find the phone number of the offending webmaster (to give them a ring when they least expect it).

So now, if someone asks you about block-level shingles or shingle analysis, you know that in the context of search engines, they are not referring to a roof.



6 Comments

  1. Posted October 23, 2008 at 5:35 pm | Permalink

    nice analogy

  2. Posted February 13, 2009 at 2:12 am | Permalink

    While this makes sense for web content, it does not work for shopping sites. For example, our site sells the same game as 20 other shopping sites. Being the same game means the name and description are the same. We are currently penalised while other sites with the same content stay in front.

  3. Posted February 13, 2009 at 3:47 am | Permalink

    On the contrary, Michael, it does apply to your site. The cause is duplicate content, which essentially means that you need to add additional content to shift the keyword density of your shingles.

    If possible, edit the order to keep the context and meaning, but shift the actual words (rewrite it)…

    Then, use static pages (such as a sitemap) or a blog post with deep links to link to the page with preferred anchor text.

    That should get the page reindexed and if you add at least 5-10 links per page (if possible), with unique content, you will see a complete correction of how your site ranks.

  4. Posted July 13, 2009 at 9:44 am | Permalink

    I used to publish my articles, but now I wonder whether I should stop doing this because of the risk of a duplicate content penalty. Should I stop publishing my articles on article directories?

  5. Posted November 28, 2009 at 11:24 pm | Permalink

    Sometimes I wonder about going too far with emphasizing my keywords…a lot of my external links will use 2 key terms, where term A is always the same, and term B is different. Do you think that model would be flagged?

  6. Posted November 29, 2009 at 11:24 am | Permalink

    It can be; better to add a modifier or plural variation to term B just to be safe…

    Thanks for visiting

