SEO and Search Engine Algorithms
by Jeffrey Smith

To understand SEO, you need to understand two things: (1) the search behavior of motivated buyers and (2) what search engines deem relevant. Common sense helps, but so does insight into how search results are parsed, weighted and rendered; these processes are governed by search engine algorithms rooted in information retrieval.

How do Term Frequency and Inverse Document Frequency Determine Relevance?

Studying the premise of information retrieval (IR) can increase your understanding of why pages rank the way they do. Techniques such as probabilistic term reweighting, local and global context analysis, and pattern matching through phrases and proximity all leave their indelible mark on search results every time a query is executed.

By familiarizing yourself with these signals, you can replicate consistency across multiple metrics (optimization), stabilize your conclusions, and gain the tactical advantage of producing documents/pages that achieve a high relevance score, which directly equates to rankings, visibility and conversion through commerce.

To gain more insight into what happens behind every search, consider two prominent metrics: term frequency (tf) and inverse document frequency (idf), which, when used in tandem, become tf-idf. They are used to compute a probability/relevance score for documents against queries to the index (a Google search), drawing on a data structure known as an inverted list or inverted index.
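
Conceptually, an inverted index maps each term to the documents that contain it, so a query can be resolved by intersecting posting lists instead of scanning the whole collection. Here is a minimal sketch in Python; the documents and terms are invented purely for illustration:

```python
# Minimal inverted-index sketch: map each term to the set of
# document IDs that contain it, so a query can be resolved by
# intersecting posting lists instead of scanning every document.
from collections import defaultdict

docs = {
    1: "business consulting services for small business",
    2: "seo consulting and link building",
    3: "small business marketing guide",
}

inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted[term].add(doc_id)

def search(query):
    """Return the IDs of documents containing every query term."""
    postings = [inverted.get(t, set()) for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()

print(sorted(search("business consulting")))  # documents with both terms
```

Real indexes also store term positions and frequencies in each posting, which is what makes the phrase and proximity matching mentioned above possible.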

Term frequency measures how many times a term appears in a document; it can also be used in context to gauge relevance across multiple documents (like a metric of authority). If you want to build your own search engine to test these variables, as a friend of mine, information architect Tim Nash (whom I nickname "professor"), has done, there is an introduction by Thanh Dao on implementing term frequency/inverse document frequency in C#.
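
To make tf and idf concrete, here is a toy scorer in Python. It uses one common smoothed variant of the formula; production engines use far more elaborate weighting, and the corpus below is invented for illustration:

```python
# Toy tf-idf: tf rewards a term appearing often in one document,
# while idf discounts terms that appear in many documents.
import math

docs = [
    "business consulting for growing companies",
    "business plans and business consulting",
    "seo tips for bloggers",
]

def tf(term, doc):
    """Fraction of the document's words that are this term."""
    words = doc.lower().split()
    return words.count(term) / len(words)

def idf(term, corpus):
    """Smoothed inverse document frequency across the corpus."""
    n_containing = sum(1 for d in corpus if term in d.lower().split())
    # the 1 + n term avoids division by zero for unseen terms
    return math.log(len(corpus) / (1 + n_containing)) + 1

def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

for d in docs:
    print(round(tf_idf("business", d, docs), 3))
```

Notice that a term appearing in most documents earns a low idf, so a rare term contributes more to the relevance score than a ubiquitous one.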

If you're busy and just don't have the time, you can instead use a Google search operator that reveals the relative volume and occurrence of a keyword or key phrase. Combined with other optimization metrics, such as deep link percentage (the number of links to that page from other sites), internal link percentage, and domain authority/relevance, this command provides rather revealing insight into why a web page or site ranks or is weighted the way it is in the index.

This advanced Google search operator queries the index specifically about the volume of saturation, or occurrence, a keyword has relative to an entire indexed website. In rudimentary form, this is tf-idf extraction.

The search operator is: site:yourdomain.com keyword (simply replace yourdomain.com with the site you're analyzing and keyword with the phrase you're researching).

For example, to determine how many times the key phrase "business consulting" appears in a given website, I would open Google and run the search with that site and phrase. A list of relevant documents will populate that contain the shingle/keyword or a stemmed semantic variant (singular, plural or synonymous keyword) in the title, in the meta description or on the page. Occurrences are displayed in bold in the results.

Pages that are weighted heavily through internal or external links will have the highest degree of prominence and authority and more than likely exhibit more exact match occurrences of the phrase you are querying.

They are thereby considered more relevant because of the co-occurrence of the keyword, in tandem with multiple algorithms running concurrently within the search engine that check, cross-check and sift or extract data based on weighting formulas to determine which pages have the highest correlation.

This alone can be used for finding the best page for internal links or finding an ideal preferred landing page to consolidate page weight through link volume. So, instead of just linking to the homepage, I would use this command to shift ranking factor to a page that is specifically about that topic and would have a higher degree of being a relevant result for someone searching.

By optimizing your content you are helping search engines do their job. Relevant internal linking increases the deep link percentage of a page, and since search engines rank web pages, not websites, it also allows you to have multiple pages returned for the same query if they share a similar semantic thread, or to designate a particular page as the preferred landing page for a specific query. You can use a virtual theme (just use links) or employ a themed and siloed site architecture to reinforce relevance within your own website.

If you understand that term weighting, co-occurrence and deep link percentages alone can make any page in a website behave like a PPC landing page (you type in a query and it organically ranks for it), then you may want to delve into this fundamental formula with more verve and start running your own tests to conclude your own results.

Co-occurrence equates to ranking power when consolidated. If you have 100 pages of solid content on Topic A and your competitor has 3 pages of relevant content, which result do you think search engines would reward?

It’s not just about volume: spam detection has made dated tactics such as keyword stuffing ineffective in modern search engine algorithms. Yet if you invest the time to produce unique, structured content and assign it a place in the hierarchy of your website (preferably themed), then it is only a matter of time before those pages become nodes of stability to use as anchor points for current and future rankings.

Having more pages in a search engine's index translates into ranking insurance, an SEO defense that can be utilized to insulate vital rankings. Each new page can attract its own links and grow into a landing page capable of ranking for dozens of keywords, or it can pass along its ranking factor to a preferred landing page (housing a more compelling value proposition).

Search engines use the vector space model and inverse document frequency to score your pages. One example is the Glasgow model, which implements a process of normalization; this process looks at (a) the frequency of a term in a document, (b) the maximum frequency of any term in the document, (c) the number of unique terms in the document, (d) the number of documents in the collection and (e) the number of documents in which the term occurs. As a result, a website can gain a higher degree of trust, which offsets less trusted or less robust websites where the keyword or key phrase has less saturation.
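
The exact normalization formula varies between descriptions of the Glasgow model, so the following Python sketch should be read only as an illustration of how the five quantities above could combine into a single term weight, not as the model's actual formula; the corpus is invented:

```python
# Sketch of a normalized term weight built from the five quantities
# listed above. Illustrative only: tf is normalized by the document's
# maximum term frequency, idf uses document counts, and the number
# of unique terms dampens long documents.
import math
from collections import Counter

def term_weight(term, doc, corpus):
    counts = Counter(doc.lower().split())
    f = counts[term]              # (a) frequency of the term in this document
    max_f = max(counts.values())  # (b) maximum term frequency in the document
    n_unique = len(counts)        # (c) number of unique terms in the document
    n_docs = len(corpus)          # (d) number of documents in the collection
    df = sum(1 for d in corpus if term in d.lower().split())  # (e) docs containing the term
    if f == 0 or df == 0:
        return 0.0
    norm_tf = f / max_f
    idf = math.log(1 + n_docs / df)
    return norm_tf * idf / math.log(1 + n_unique)

corpus = [
    "internet history and internet protocols",
    "gardening tips for spring",
    "the internet explained",
]
print(round(term_weight("internet", corpus[0], corpus), 3))
```

Running this, the repeated term in the first document outweighs a term that appears only once there, and a term absent from a document scores zero.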

The real takeaway here is to (1) realize that each keyword has a tipping point and (2) understand that if the number of occurrences for a keyword is competitive, a page will require a collection of supporting pages to cross that tipping point and bump other web pages aside to create visibility for your pages.

Wikipedia is a prime example of the model: (a) topical continuity, (b) site architecture, (c) a degree of term frequency and co-occurrence between documents, (d) deep links from other pages to the keyword's specific landing page and (e) the fact that it is scalable and capable of devouring virtually any topic, keyword or key phrase.

For example, searching Google for the word internet returns Wikipedia in the top position with a double listing. That broad match keyword has over 1.7 million occurrences in Google's index, but why Wikipedia is the most relevant falls back on the algorithms mentioned above.

The main landing page for the word internet has 35,000 links from other websites pointing to it. This alone makes the page a contender, as the volume of deep links has a tremendous impact on rankings, but that must also be mirrored within the website's internal linking structure to be most effective. In addition to keyword prominence, the landing page ranks as a result of the synergy of off-page and on-page SEO factors that consolidate the position.

In this instance, another 50,000 internal links consolidate ranking factor to that page, all with the shingle/anchor text "internet", funneling link flow and relevance to the target page.

Domain authority and topical relevance also play a big role in the equation, and we know Wikipedia has no shortage of either. So, in order to rank for the word internet, you would have to scale a site capable of contending with that kind of on-page and off-page symmetry, as well as allow the content to reach a natural plateau (by not pushing content or link velocity outside of natural bounds).

Applying this on a smaller scale lets you see how favorable keywords can be elected and promoted to acquire a higher degree of relevance within a website, and how, through link building or viral promotion, authoritative links can elevate a page into the top ranking positions of search engines.

We know that search results displayed at the top of rankings get clicked, but it’s just a matter of getting there for optimal keywords that are relevant to your business. At least now, you have some solid research to fall back on so that you can engage each optimization campaign not only as a quest for a keyword, but more of an experiment in finding the appropriate percentages of the metrics that produce the weighting of terms in the index. 

Read More Related Posts
Help Search Engines and Visitors Get on the Same Page
Search engine optimization also known by the acronym SEO is comprised of multiple facets. SEO is not a linear process, but rather a holistic evolution involving intricate layers, steps and ...
Do You Buy, Sell or Earn Website Traffic?
The battle between contextual editorial traffic and blatant sponsored or promotional content is a struggle of polarities between businesses and search engines. The question is, do you buy, earn or ...
The Phases of SEO Part I
According to Wikipedia - A phase is one part or portion in recurring or serial activities or occurrences logically connected within a greater process, often resulting in an output or ...
Google Updates PageRank on the Heels of Panda
On the heels of Panda 2.2, a document classifier (digital quality rater) machine learning algorithm, Google unleashed a long-overdue PageRank update to shake up the web as we know ...
Mapping Keywords and Landing Pages
One of the most valuable components of an SEO campaign is the correlation of aligning keywords and landing pages. During the early stages of optimization, it is not uncommon to depend ...
What is Website Silo Site Architecture?
You may have heard about website silo architecture or theming your website, but what does it mean? For those unfamiliar with the jargon or techie ...
SEO Ultimate Version 2.0 from SEO Design Solutions
Wouldn’t it be nice to create streamlined SEO protocols to optimize content naturally or have the ability to go back to your legacy content and revise any element of the ...
Ignore the Long Tail at Your Own Risk
We were contacted by a company touting a top level domain from the 90's, the kind of domain name you can only dream about acquiring for less than $50K-70K in ...
PPC Landing Pages or SEO Landing Pages?
If you ever wanted to make SEO landing pages rank like Pay Per Click (PPC) landing pages, then this tutorial is for you. If you understand the implication of this ...
The Golden Rule: SEO or Internet Marketing Should Pay for Itself
The opening statement of this title is simple and direct, and having laid the groundwork, the next paradigm to shift is that SEO in addition to any form of internet ...

About Jeffrey_Smith

In 2006, Jeffrey Smith founded SEO Design Solutions (an SEO provider that now develops SEO software for WordPress).

Jeffrey has actively been involved in internet marketing since 1995 and brings a wealth of collective experiences and marketing strategies to increase rankings, revenue and reach.

11 thoughts on “SEO and Search Engine Algorithms”

Comments are closed.