First, the question: why is eliminating duplicate content important to SEO? The answer is predicated on the conditions that are promoted if duplicate content is allowed in place of a consolidated preference. This tutorial discusses the function of the .htaccess file and ways you can harness it for search engine optimization.
Why Eliminating Duplicate Content is Important
Pagination and duplicate content issues create dead ends in search engines. Essentially, it's like hemorrhaging relevance: every algorithmic signal that gets funneled into a dead end pools there, and unless rescued or redirected it sits like potential energy all bundled up with nowhere to go.
This kind of waste is to be avoided at all costs; you need every iota of link flow you can muster. You may be up against a cunning competitor who understands that a simple consolidation, such as eliminating duplicates on a site-wide level, is more than enough to tip the scales when someone's page must leave the search engine results page for a specific query and another must rise.
Search engines by their very design embrace and promote competitiveness based upon the scarcity of attention. That scarcity in turn fuels the commodity of ranking higher, which in turn captures a larger percentage of the resource of “consumer attention”, which is the gatekeeper of market share.
That attention can in turn be monetized (based on search engine result page position) and converted into profits through elevator pitches, value propositions, tactful and relevant landing pages and ecommerce. That is, granted a business can get to a position for a particular keyword in the first place, which is why SEO is a game changer for small business.
By expanding your online reach through keywords and their respective rankings, you create security and can technically eliminate chance by using an existing dynamic (exposure and click-through rates) to monetize time, energy and effort with a tangible monetary gain. However, with duplicate content, you're only achieving 50% of each page's potential in search engines, so it is an economic as well as a technical concern.
Creating a Protocol for 301 Redirects
Before you can feed the search engine spiders with content, you need to clean up a few potential SEO errors that could take your website out of the game before even getting started.
One culprit capable of this is duplicate content caused by search engine spiders crawling both the http version and the www version of your website and attempting to determine which came first or which is preferentially more important.
Fortunately, this default server behavior can be remedied by using 301 redirects for the canonical and non-canonical versions of pages or the domain, via the mod_rewrite module and rules added to the .htaccess file.
Stopping the Loop of Producing the Dupe
In all honesty, duplicate content undervalues the proposition of why your content is important and makes it harder to rise above other copies of snippets and context consolidated on other websites. In other words, duplicate content (something struggling for independence) within a website is frankly not that interesting to search engines.
A clear example: you can reach the same home page by typing any of the following:
- http://website.com
- http://website.com/
- http://www.website.com
- http://www.website.com/
- http://www.website.com/index.html
- http://website.com/index.html
All of these addresses can be returned because no particular convention or default preference has been clearly established at the server level.
This means that unless they are consolidated or unified by a form of governance at the server level, your website is producing multiple variations by default, which equate to crawl errors or duplication (like content overpopulation based on false positives).
Instead of sitting back passively like an onlooker and just observing this phenomenon, the first thing a webmaster can do is put a rule in place at the server level that redirects all unnecessary secondary or redundant conventions to one unified format (most SEOs use the http://www.website.com/ convention) to prune duplicate content.
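For instance, the index-file variation from the list above can be collapsed with a rule like the following (a minimal sketch, assuming an Apache server with mod_rewrite enabled; website.com is a placeholder for your own domain):

```apache
RewriteEngine on
# Only act on the browser's original request line, not Apache's internal
# DirectoryIndex rewrite (otherwise this would loop)
RewriteCond %{THE_REQUEST} ^GET\ /index\.html
# Send /index.html requests back to the root URL with a permanent redirect
RewriteRule ^index\.html$ http://www.website.com/ [R=301,L]
```

With this in place, http://www.website.com/index.html and http://www.website.com/ no longer compete as two separate pages.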
The .htaccess file is a file capable of overriding global server settings during hypertext requests, governing how pages are handled and how that content is served to browsers.
Since crawlers also follow these rules, you can aid them in cataloging your website with elements that are more conducive to a unified or SEO-friendly structure. Wikipedia defines the .htaccess file as follows:
“In several web servers (most commonly Apache), .htaccess (hypertext access) is the default name of a directory-level configuration file that allows for decentralized management of web server configuration.
The .htaccess file is placed inside the web tree, and is able to override a subset of the server’s global configuration; the extent of this subset is defined by the web server administrator. The original purpose of .htaccess was to allow per-directory access control (e.g. requiring a password to access the content), hence the name. Nowadays .htaccess can override many other configuration settings, mostly related to content control, e.g. content type and character set, CGI handlers, etc”.
Source: http://en.wikipedia.org/wiki/.htaccess
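To illustrate that original purpose of per-directory access control, a minimal .htaccess for password-protecting a folder might look like this (a sketch; /path/to/.htpasswd is a placeholder for the actual location of your password file):

```apache
# Prompt for a username and password before serving this directory
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /path/to/.htpasswd
Require valid-user
```

The .htpasswd file itself is created with Apache's htpasswd utility and should live outside the public web tree.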
So, now that you know what it is, you should have some nifty tips and tricks to code it. Instead of writing them all here, we have found an excellent resource to share that is far more extensive.
Let's start with the typical redirect in a Linux-based server environment. For example, to redirect the http version of a website to the www format, create an .htaccess file with the following code; this will ensure that all requests to domain.com are redirected to www.domain.com.
The file needs to be placed in the root directory of your website (i.e. the same directory where your index file is placed):
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^domain\.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
Please REPLACE domain.com and www.domain.com with your actual domain name.
Note: this .htaccess method of redirection works ONLY on Linux servers with the Apache mod_rewrite module enabled.
If you do have a Linux server with the rewrite module enabled, follow these steps:
- Open a text editor (such as Notepad) and save a file with the name htaccess.txt
- Copy and paste the rewrite rules and conditions (above) and amend your domain information
- Save the file.
- Rename the file (using Mozilla, or any other ftp program) from htaccess.txt to .htaccess
- Upload the file via FTP (file transfer protocol) to the root folder or subfolder where the rule is enforced, then test it to see how it works.
Our new all-in-one WordPress SEO plugin, SEO Ultimate, allows you to modify the .htaccess file as well as the robots.txt file (another important file that sets domain-level crawling permissions). By using SEO Ultimate, you can skip the five-step process above and insert the code of choice directly.
Additional .htaccess Tips and Tricks
There is a masterful resource located at http://www.askapache.com/htaccess/mod_rewrite-tips-and-tricks.html which covers dozens of .htaccess rules, conditions and tactics.
The .htaccess rewrite examples covered there include:
- Require the www
- Require no www
- Check for a key in QUERY_STRING
- Remove the QUERY_STRING from the URL
- Fix for infinite loops
- Redirect .php files to .html files (SEO friendly)
- Redirect .html files to actual .php files (SEO friendly)
- Block access to files during certain hours of the day
- Rewrite underscores to hyphens for SEO URLs
- Require the www without hardcoding
- Require no subdomain
- Redirecting WordPress feeds to FeedBurner
- Only allow GET and PUT request methods
- Prevent image/file hotlinking and bandwidth stealing
- Stop browser prefetching
- Make a prefetching hint for Firefox
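As a taste of what's on that list, a common hotlink-prevention rule looks roughly like this (a sketch assuming Apache with mod_rewrite enabled; website.com is a placeholder for your own domain):

```apache
RewriteEngine on
# Allow requests with an empty referer (direct visits, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# Allow requests coming from your own pages
RewriteCond %{HTTP_REFERER} !^http://(www\.)?website\.com/ [NC]
# Everyone else gets a 403 Forbidden for image requests
RewriteRule \.(gif|jpe?g|png)$ - [F]
```

The effect is that other sites embedding your images serve a 403 instead of stealing your bandwidth, while your own pages and direct visitors are unaffected.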
As a word of wisdom, be careful when making modifications to this file, as one error in syntax can take a site down until the original or proper syntax replaces it. So, if you are copying and modifying rules, just make sure you use all of the characters suggested (so you don't break it). And make sure you have a backup, so that, worst case, you can FTP it back to restore the file and get your site serving again.