SEO and Search Engine Algorithms
by Jeffrey Smith

To understand SEO, you need to understand two things: (1) the search behavior of motivated buyers and (2) what search engines deem relevant. Common sense helps, but so does insight into the concepts that govern how search results are parsed, weighted and rendered, because search engine algorithms are rooted in information retrieval.

How do Term Frequency and Inverse Document Frequency Determine Relevance?

Studying the premise of information retrieval (IR) can increase your understanding of why pages rank the way they do. Metrics such as probabilistic term reweighting, local and global context analysis, and pattern matching through phrases and proximity all leave their indelible mark on search results every time a query is executed.

By familiarizing yourself with these metrics, you can build consistency across several of them at once (which is what optimization really is), draw more stable conclusions from your own tests, and gain a tactical advantage: documents and pages that achieve a high relevance score, which translates directly into rankings, visibility and conversion.

To gain more insight into what happens behind every search, start with two prominent metrics: term frequency (tf) and inverse document frequency (idf), which used in tandem become tf-idf. They serve as building blocks for measuring the relevance score of documents against queries to the index (a Google search), retrieved through a data structure known as an inverted list or inverted index.

Term frequency measures how many times a term appears in a document; in context, it can also be used to gauge relevance across multiple documents (almost like a metric of authority). If you want to build your own search engine to test variables, as a friend of mine, information architect Tim Nash (whom I nickname the professor), has done, here is an introduction by Thanh Dao to a term frequency / inverse document frequency implementation in C#.
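If you would rather experiment in Python, here is a minimal sketch of the same idea. The three-document toy corpus and the naive tokenizer are my own assumptions for illustration, and the raw tf-idf score below is a textbook formulation, not the formula any commercial search engine actually uses.

```python
import math
import re
from collections import Counter, defaultdict

# Toy corpus; in practice these would be pages you crawled yourself.
docs = {
    "page-a": "business consulting services for small business owners",
    "page-b": "consulting case studies and business strategy articles",
    "page-c": "contact our sales team for a free quote",
}

def tokenize(text):
    # Naive tokenizer: lowercase words only, no stemming or stop words.
    return re.findall(r"[a-z]+", text.lower())

# Inverted index: term -> {doc_id: how many times the term appears in that doc}
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term, freq in Counter(tokenize(text)).items():
        index[term][doc_id] = freq

def tf_idf(term, doc_id):
    """Raw term frequency times inverse document frequency."""
    postings = index.get(term.lower(), {})
    tf = postings.get(doc_id, 0)
    if tf == 0:
        return 0.0
    idf = math.log(len(docs) / len(postings))  # rarer terms weigh more
    return tf * idf

for doc_id in docs:
    print(doc_id, round(tf_idf("consulting", doc_id), 3))
```

Even at this toy scale you can see the behavior the article describes: a term that appears on every page contributes little, while a term concentrated on a few pages pulls those pages up the list.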

If you're busy and just don't have the time, you can opt for a command that queries Google and reveals the relative volume and occurrence of a keyword or key phrase. When applied alongside other optimization metrics (such as deep link percentage (the number of links to that page from other sites), internal link percentage, and domain authority / relevance), this command provides rather revealing insight into why a web page or site ranks or is weighted the way it is in the index.

This advanced Google search operator queries the index specifically about the relationship a keyword has to the entire indexed website: its volume of saturation, or occurrence. In rudimentary form, this is tf-idf extraction.

The search operator is…

site:yourdomain.com keyword (simply replace yourdomain.com with the site you're analyzing and keyword with the phrase you're checking).

For example, to determine how many times the key phrase “business consulting” appears on the cnn.com website, I would simply open a Google search bar, type in site:cnn.com business consulting and execute the search. A list of relevant documents will then populate that have the shingle/keyword or a stemmed semantic variant (singular, plural or synonymous keyword) in the title, in the meta description or on the page. Occurrences are displayed in bold.
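If you have a whole list of key phrases to check, you can assemble the same query strings programmatically and open them for manual review. This is only a small sketch with a placeholder domain and keyword list of my own choosing; Google throttles automated querying, so treat it as a shortcut for building the URLs rather than a scraper.

```python
from urllib.parse import quote_plus

def site_query_url(domain, phrase):
    """Build the URL for a Google 'site:' query, e.g. site:cnn.com business consulting."""
    return "https://www.google.com/search?q=" + quote_plus(f"site:{domain} {phrase}")

# Placeholder domain and key phrases; swap in the site you are analyzing.
for phrase in ["business consulting", "management consulting"]:
    print(site_query_url("cnn.com", phrase))
```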

Pages that are weighted heavily through internal or external links will have the highest degree of prominence and authority and more than likely exhibit more exact match occurrences of the phrase you are querying.

They are thereby considered more relevant because of the co-occurrence of the keyword, in tandem with multiple algorithms running concurrently within the search engine that check, cross-check and sift or extract data based on the weighting formulas used to determine which pages have the highest correlation.

This alone can be used to find the best page to target with internal links, or to identify an ideal preferred landing page to consolidate page weight through link volume. So, instead of just linking to the homepage, I would use this command to shift ranking factor to a page that is specifically about the topic and stands a better chance of being the relevant result for someone searching.

By optimizing your content you are helping search engines do their job. Relevant linking not only increases the deep link percentage of that page; because search engines rank web pages, not websites, it also allows multiple pages to be returned for the same query if they share a similar semantic thread, or it can signal that a particular page is the preferred landing page for a specific query. You can use a virtual theme (just use links) or employ a themed and siloed site architecture to reinforce relevance within your own website.

If you understand that term weighting, co-occurrence and deep link percentages alone can make any page in a website behave like a PPC landing page (you type in a query and it organically ranks for it), then you may want to delve into this fundamental formula with more verve and start running your own tests to draw your own conclusions.

Co-occurrence equates to ranking power when consolidated. If you have 100 pages of solid content on Topic A and your competitor has 3 pages of relevant content, which result do you think search engines would reward?

It’s not just about volume: spam detection has matured, and dated tactics like keyword stuffing no longer prevail in search engine algorithms. Yet if you invest the time to produce unique, structured content and assign it a place in the hierarchy of your website (preferably themed), it is only a matter of time before those pages transform into nodes of stability to use as anchor points for current and future rankings.

Having more pages in a search engine's index translates into ranking insurance, a form of SEO defense that can be used to insulate vital rankings. Each new page can attract its own links and grow into a landing page capable of ranking for dozens of keywords, or it can pass along its ranking factor to a preferred landing page (housing a more compelling value proposition).

Search engines use the vector space model and inverse document frequency to score your pages. One example is the Glasgow model, which implements a process of normalization that looks at (a) the frequency of a term in a document, (b) the maximum frequency of any term in the document, (c) the number of unique terms in the document, (d) the number of documents in the collection and (e) the number of documents the term occurs in. As a result, a website can gain a higher degree of trust, which offsets less trusted or less robust websites where the keyword or key phrase has less saturation.
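To see how some of those factors interact, here is a rough vector space sketch in Python that scores a query against the same kind of toy corpus using tf-idf weights and cosine normalization. This is a textbook-style formulation of my own choosing, not the exact Glasgow formula or anything Google has published, and it uses raw term frequency rather than the maximum-frequency normalization mentioned in (b).

```python
import math
import re
from collections import Counter

# Toy collection; factor (d), the number of documents, is len(docs).
docs = {
    "page-a": "business consulting services for small business owners",
    "page-b": "consulting case studies and business strategy articles",
    "page-c": "contact our sales team for a free quote",
}

def tokenize(text):
    # Naive tokenizer: lowercase words only, no stemming or stop words.
    return re.findall(r"[a-z]+", text.lower())

doc_tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}

# Factor (e): the number of documents each term occurs in.
df = Counter()
for tokens in doc_tokens.values():
    df.update(set(tokens))

def weight_vector(tokens):
    """tf-idf vector for a document or a query."""
    vec = {}
    for term, tf in Counter(tokens).items():            # factor (a): term frequency
        idf = math.log(1 + len(docs) / (1 + df[term]))  # dampened idf, never zero
        vec[term] = tf * idf
    return vec

def cosine(a, b):
    """Cosine normalization so longer documents do not dominate the score."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = weight_vector(tokenize("business consulting"))
for doc_id, tokens in sorted(doc_tokens.items()):
    print(doc_id, round(cosine(query, weight_vector(tokens)), 3))
```

Run it and the page that repeats the query terms (relative to its length and to how common those terms are across the collection) scores highest, which is the behavior the normalization factors above are meant to produce.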

The real takeaway here is to realize that (1) each keyword has a tipping point and (2) if the number of occurrences for that keyword is competitive, each page will require a collection of supporting pages to cross the tipping point and bump other web pages aside to create visibility for your pages.

Wikipedia is a prime example of this model: (a) topical continuity, (b) site architecture, (c) a degree of term frequency and co-occurrence between documents, (d) deep links from other pages to the keyword's specific landing page and (e) the fact that it is scalable and capable of devouring virtually any topic, keyword or key phrase.

For example, searching Google for the word internet returns Wikipedia in the top position with a double listing. That specific broad match keyword has over 1.7 million occurrences in Google's index, but why Wikipedia would be the most relevant falls back on the algorithms mentioned above.

The main landing page for the word internet has 35,000 links from other websites pointing to it. That alone makes the page a contender, as the volume of deep links has a tremendous impact on rankings, but it must also be mirrored within the website's internal linking structure to be fully effective. In addition to keyword prominence, the landing page ranks as a result of the synergy of off page and on page SEO factors consolidating the position.

In this instance, there are another 50,000 internal links consolidating ranking factor to that page, all with the shingle / anchor text “internet”, funneling link flow and relevance to the target page.

Domain authority and topical relevance also play a big role in the equation, and we know Wikipedia has no shortage of either. So, in order to rank for the word internet, you would have to scale a site capable of contending with that kind of on page and off page symmetry, as well as allow the content to reach a natural plateau (by not pushing content or link velocity outside of natural bounds).

Applying this on a smaller scale lets you see how favorable keywords can be selected and promoted to acquire a higher degree of relevance within a website, and how, through link building or viral promotion, authoritative links can elevate a page into the top ranking positions of search engines.

We know that search results displayed at the top of the rankings get clicked; it's just a matter of getting there for the keywords that matter to your business. At least now you have some solid research to fall back on, so that you can approach each optimization campaign not only as a quest for a keyword, but as an experiment in finding the appropriate balance of the metrics that produce the weighting of terms in the index.
