In SEO, you often hear about duplicate content and the issues it creates. These "issues" are real: left uncorrected, duplicate content splits your pages' authority and can cost a website a large percentage of its potential ranking ability.
What are some of the types of duplicate content? And what can be done to manage, prevent or minimize their impact on rankings or site performance?
- Canonical variations of the same page, such as http://yoursite.com versus http://www.yoursite.com.
- Shopping carts using hand-me-down product descriptions from manufacturers.
- Pages that use the same template with very little distinction from each other.
- Block-level shingles (content copied from page to page or from other websites).
As you can see from the brief list above, there are multiple species of duplicate content, and each should be eliminated or managed at the level of your content management system, your .htaccess file, or through the uniqueness of the content on each page.
Today we are only addressing the type created within your own website (typically the result of thin content or the indexing of near-identical pages) and ways to avoid or eliminate it.
We often emphasize the importance of uniqueness on a page-by-page level within a website. Having 1,000 pages where 800 of them differ by only 12% (a different title, model number, photo and brief description) doesn't qualify as significant distinction.
Common Duplicate Content Solutions
There are a few different things you can do to eliminate duplicate content, resolve canonical issues, or control which pages are indexed and which are not.
1) Write enough content to shift the collective focus of the page.
2) Personalize data feeds or commonly used databases of manufacturer makes and model numbers for ecommerce products.
3) Add a custom field or segment on your ecommerce pages to create their own digital footprint.
4) Consider using meta robots tags or robots.txt to block or prevent indexing of extremely similar pages (which alleviates the penalty).
5) Use a canonical tag so that pages that are linked like a daisy chain (or breadcrumbs) all link back to the leader as the primary index page. This is very useful for deep categories where getting visitors to the main category page is sufficient from an SEO perspective.
6) Personalize footer links to avoid diffusing each page's topic sitewide by making every page similar.
From the perspective of search engines: if a page has nothing to contribute, and your website already has 20 other pages with the same look, feel, code and context, you can't expect them to waste space in their primary index on cookie-cutter pages.
To briefly address the points above:
For page-to-page canonical issues, use a 301 redirect by adjusting your .htaccess file (provided you are running a Linux/Apache server). More information on 301 redirects and how to implement them is available from Taming the Beast.
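As a minimal sketch of that .htaccess approach (assuming Apache with mod_rewrite enabled, and that www.yoursite.com stands in for your preferred hostname):

```apache
# Permanently (301) redirect all non-www requests to the www version
RewriteEngine On
RewriteCond %{HTTP_HOST} ^yoursite\.com$ [NC]
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]
```

If you prefer the non-www version instead, simply invert the condition and target; the important thing is picking one version and redirecting the other to it.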
Once fixed, depending on the type of site-level duplication, your link flow will consolidate within the site, and you can then use internal linking to shore up segments of your website that may be depleted or starved for link flow.
Also, if you have multiple versions of your homepage, make sure to consolidate them using the same page-level redirect method to one specific version: http://www.yoursite.com/ rather than http://www.yoursite.com/index.html or http://www.yoursite.com/default.htm (or .aspx, .shtml, .cfm, etc.).
Also, when linking to those pages (both within your own navigation and from other sites), make sure you use one consistent page / naming convention / URL.
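A hedged sketch of the homepage consolidation, again assuming Apache with mod_rewrite (the file extensions shown are examples; match whatever your server actually uses):

```apache
# Redirect explicit requests for index files back to the root URL.
# Matching against THE_REQUEST (the raw request line) avoids redirect
# loops caused by DirectoryIndex serving index.html internally for "/".
RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.(html?|php)\ HTTP [NC]
RewriteRule ^index\.(html?|php)$ http://www.yoursite.com/ [R=301,L]
```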
Solutions for Multiple Pages with Similar Content
There are a few ways to remedy this. You can use a canonical tag to refer back to the original page or manage what gets indexed to avoid duplication.
If you have access to personalize the headers of the page, then you can implement a simple meta tag.
NOINDEX, FOLLOW – invites spiders / user agents to crawl your page but prevents that page from being indexed. Search engines will still follow the links, count them as ranking signals, and discover new pages through them.
This would be the preferred setting for ecommerce pages if you wanted to emphasize the link back to the main category page and apply your SEO efforts to make the category your preferred landing page for rankings and traffic from natural search.
INDEX, NOFOLLOW – tells search engine spiders / user agents to index the page, but not to follow the links on the page or count them in their algorithms.
NOINDEX, NOFOLLOW – informs them that you gave at the office and would rather not be bothered. In other words, the page will not be included in the search engine index, and its links are neither followed nor treated with any significance.
The most common meta tag setting is INDEX, FOLLOW, which is like leaving a plate of cookies out for Santa: it grants full permission and welcomes spiders in to crawl, index, sort and earmark pages for information retrieval. The same can be done at the server level with a robots.txt file. For more information on robots.txt, follow the link.
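For illustration, the meta tag version goes in the head of each near-duplicate page (the NOINDEX, FOLLOW variant from above):

```html
<!-- Crawl this page and pass link value through, but keep it out of the index -->
<meta name="robots" content="noindex, follow">
```

The server-level equivalent blocks crawling of a whole section in robots.txt (the /printer-friendly/ path here is a hypothetical example of a directory full of near-duplicate pages):

```text
User-agent: *
Disallow: /printer-friendly/
```

Note the difference: the meta tag lets spiders crawl the page and follow its links, while robots.txt prevents crawling altogether.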
If you have dynamically loading pages that insert tracking cookies, session IDs, or other unique strings into your URLs, then using the canonical tag is the best way to ensure your pages are not creating duplication across your entire content management system.
A canonical tag is a tag that the three major search engines (Google, Yahoo and Bing) recently started to support. It eliminates clutter in their indexes by preventing multiple slightly different variations of a URL from appearing as more than one page.
By adding the canonical tag, you ensure that, much like Highlander, there can be only one: the main page gets the credit, rankings and authority. For more information on canonical tags and how to implement them, follow the link.
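As a sketch, the tag goes in the head of every URL variation and points at the one preferred version (the product URL below is a hypothetical example):

```html
<!-- On /blue-widget.html?sessionid=XYZ and any other variation,
     point back to the single preferred URL -->
<link rel="canonical" href="http://www.yoursite.com/widgets/blue-widget.html">
```

Session-ID and tracking-parameter variations then consolidate their credit onto that one canonical URL.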
Other Methods to Eliminate Duplicate Content
Add unique content to pages that require more emphasis. This includes rewriting the descriptions of your top-selling products if you are using a shopping cart, to avoid duplicating shingles from other sites selling the same products.
At first you think: who has the time to rewrite hundreds or thousands of product descriptions? The answer: the companies that know the value of unique content and authority, and what it means to rank head and shoulders above the competition with a fraction of the backlinks or SEO effort required to get there.
Every word you add to a document changes the landscape of that page. The more unique the content, and the more frequently it references the primary keywords the page targets, the easier it is to separate that page from the other pages built on the same template.
Navigation is another area of concern for nested pages: it can cripple your near-duplicate pages by cluttering them with the same numerous "me too" links as every other page, completely diffusing the UNIQUENESS of each page.
You can tuck navigation behind an IFRAME, use nofollow, or route navigation links through a /CGI-BIN/ script where all of the redirects happen on a page not followed by robots or blocked by robots.txt.
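The nofollow option, for example, is just an attribute on each navigation link (the category URL here is hypothetical):

```html
<!-- Navigation link hinting that no link value should flow through it -->
<a href="/category/widgets/" rel="nofollow">Widgets</a>
```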
However, whether they are followed or not, the presence of all of those links creates a page impression that, for lack of a better term, skews your SKUs (stock keeping units) by slightly obscuring their focus.
The advantage of the CGI-BIN method is that you can then insert includes or use contextual links (links in the body copy of the document) to connect relevant pages with keyword-rich anchor text, increasing relevance and ranking position.
Footer links or breadcrumbs would then carry more weight in the algorithm, since not every link is WIDE OPEN on your pages. Although this falls into the category of page sculpting, it is still important to understand the premise behind the topic.
Another Advanced SEO Tactic
1) Create a custom text field somewhere on the page through the content management system.
2) Populate that content through an XML feed or pull data from a blog. The blog could be set to noindex, follow and link to the new destination landing page, and the landing page showcases the feed data as its own.
3) Or for a low tech solution, simply write 250-400 words on the topic and insert that into the custom text field.
As you can see, where there is a will, there is a way. The blog method above lets you easily manage content through a CMS, keep that content from showing up in search engines on the blog itself (by setting it to archive or noindex), and then use the aggregated data you created to rank the page of your choice.
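A minimal sketch of step 2 above, assuming a simple RSS 2.0 feed (in practice you would fetch the feed from your blog over HTTP and write the result into your CMS's custom field; the function and field names here are illustrative, not from any particular CMS):

```python
# Sketch: turn blog feed items into an HTML snippet for a custom text field.
import xml.etree.ElementTree as ET

def feed_to_custom_field(rss_xml: str, max_items: int = 3) -> str:
    """Extract item titles and descriptions from an RSS 2.0 feed string
    and format them as an HTML snippet for a custom text field."""
    root = ET.fromstring(rss_xml)
    blocks = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        desc = item.findtext("description", default="").strip()
        blocks.append(f"<h3>{title}</h3>\n<p>{desc}</p>")
        if len(blocks) >= max_items:
            break
    return "\n".join(blocks)

# Hypothetical feed content standing in for your blog's RSS output
sample_feed = """<rss version="2.0"><channel>
  <item><title>Widget care tips</title>
        <description>How to keep your widget running.</description></item>
</channel></rss>"""

print(feed_to_custom_field(sample_feed))
```

The landing page then showcases this generated snippet as its own unique content, while the blog posts themselves stay out of the index.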
You could also tag the content, and aggregate it through additional means like Google Base and other RSS based feeds for additional leverage.
The real takeaway here is: if you want to distinguish your website in search engines, be ready to take an extra step and implement solutions that support granular changes to URL naming conventions, on-page content, link structure and CMS architecture. All of these changes are secondary to thoughtful planning and can be avoided entirely by building the site properly in the first place.
However, since that is not always an option, having the ability to implement distinct modifications that produce optimal ranking factors means creating custom programs, hacks and workarounds to adapt, modify and optimize your CMS: adding content on the fly, changing or modifying header information, rewriting titles and more.
In the event that a redesign is not applicable, there are plugins that can help distinguish your money pages using some of the tactics discussed above.
Also, did I mention it is free? If you use the WordPress platform, we invite you to download our SEO Ultimate plugin, which can rewrite titles and meta data, manage tag and archive pages, adjust headers, track 404 errors, edit the robots.txt or .htaccess file, toggle which modules are active, and provide numerous other SEO features.
As stated above, it’s better to build it right the first time, but just in case you can’t, there are always solutions for common SEO problems.