in SEO by Jeffrey_Smith

To understand SEO, you need to understand two things: (1) the search behavior of motivated buyers and (2) what search engines deem relevant. Common sense helps, but so does insight into the concepts that structure how search results are parsed, weighted and rendered. Those steps are carried out by search engine algorithms, which are based on information retrieval.

How do Term Frequency and Inverse Document Frequency Determine Relevance?

Studying the premise of information retrieval (IR) can increase your understanding of why things rank the way they do. Metrics such as Probabilistic Term Reweighting, Local and Global Context Analysis, and Pattern Matching through Phrases and Proximity all leave their indelible mark on search results each and every time a query is executed.

By familiarizing yourself with these metrics, you can apply them consistently across pages (optimization), stabilize your heuristic conclusions and gain a tactical advantage: documents/pages that achieve a high relevance score, which directly equates to rankings, visibility and conversion through commerce.

To gain more insight into what happens behind every search, consider two prominent metrics: term frequency (tf) and inverse document frequency (idf), which are known as tf-idf when used in tandem. They are used as building blocks for measuring the probability/relevance score of documents against queries to the index (a Google search), using a lookup structure known as an inverted list or inverted index.
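To make the idea of an inverted index concrete, here is a minimal Python sketch. It is only an illustration of the general technique, not how Google or any other engine actually builds its index, and the three-page collection and the function names are hypothetical.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


# Hypothetical three-page collection, used only for illustration.
documents = {
    "home": "Business consulting for small business owners",
    "services": "Consulting services and business planning",
    "blog": "Notes on search engines and relevance",
}

index = build_inverted_index(documents)

# Looking up a term returns the pages that contain it, with per-page counts.
print(index["business"])    # {'home': 2, 'services': 1}
print(index["consulting"])  # {'home': 1, 'services': 1}
```

Instead of scanning every page for every query, the engine looks up each query term once and gets back only the pages that contain it, along with the counts needed for term frequency.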

Term frequency can be used to measure how many times a term appears in a document, or it can be used in context to determine relevance across multiple documents (like a metric of authority). In case you want to build your own search engine to test variables, as my friend Mr. Tim Nash (an information architect whom I nickname the professor) has done, here is an introduction by Thanh Dao on term frequency/inverse document frequency implementation in C#.
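If you would rather experiment in a few lines than build a full engine, the sketch below computes a basic tf-idf score in Python. It assumes the simplest textbook form (raw term count multiplied by the natural log of total documents over documents containing the term); real search engines use more elaborate weighting, and the sample pages and the `tf_idf` function name are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(term, doc_id, documents):
    """Raw term count in one document times log(N / document frequency)."""
    counts = {d: Counter(tokenize(t)) for d, t in documents.items()}
    tf = counts[doc_id][term]                             # occurrences in this page
    df = sum(1 for c in counts.values() if c[term] > 0)   # pages containing the term
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(documents) / df)


# Hypothetical pages, used only for illustration.
documents = {
    "home": "seo business consulting and business planning",
    "services": "seo consulting services",
    "blog": "seo news and notes",
}

print(round(tf_idf("business", "home", documents), 3))    # 2.197
print(round(tf_idf("consulting", "home", documents), 3))  # 0.405
print(tf_idf("seo", "home", documents))                   # 0.0
```

Notice that "seo" scores zero because it appears on every page: a term that occurs everywhere tells the engine nothing about which page is most relevant, while "business" scores highest because it is both frequent on the page and rare across the collection.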

In case you're busy and just don’t have the time, you could opt for a command to query Google that reveals the relative volume and occurrence of a keyword or key phrase. This command, when applied with other optimization metrics (such as deep link percentage (the number of links to that page from other sites), internal link percentage, and domain authority / relevance), provides rather revealing insight as to why a web page or site ranks or is weighted the way it is in the index.

This advanced Google search operator is most commonly used to query the index about the volume of saturation or occurrence a keyword has in context to the entire indexed website. This, in rudimentary form, is tf-idf extraction.

The search operator is… keyword [simply replace and keyword] with the site you're analyzing.

For example, to determine how many times the key phrase “business consulting” appears in a website, I would simply open a Google search bar, type in business consulting and execute the search. A list of relevant documents will then populate that have the shingle/keyword or a stemmed semantic variant (singular, plural or synonymous keyword) in the title, in the meta description or on the page. Matches are displayed in bold to show occurrence.

Pages that are weighted heavily through internal or external links will have the highest degree of prominence and authority and more than likely exhibit more exact match occurrences of the phrase you are querying.

They are thereby considered more relevant because of the co-occurrence of the keyword in tandem with multiple algorithms running concurrently within the search engines, which check, cross-check and sift or extract data based on the weighting formulas used to determine which pages have the highest correlation.

This alone can be used for finding the best page for internal links or finding an ideal preferred landing page to consolidate page weight through link volume. So, instead of just linking to the homepage, I would use this command to shift ranking factor to a page that is specifically about that topic and would have a higher likelihood of being a relevant result for someone searching.

By optimizing your content you are helping search engines do their job. Relevant linking increases the deep link percentage of that page and, since search engines rank web pages, not websites, it also allows you to have multiple pages returned for the same query if they all share a similar semantic thread, or to designate a particular page as the preferred landing page for a specific query. You can use a virtual theme (just use links) or employ a themed and siloed site architecture to reinforce relevance within your own website.

If you understand that term weighting, co-occurrence and deep link percentages alone can make any page in a website behave like a PPC landing page (you type in a query and it organically ranks for it), then you may want to delve into this fundamental formula with more verve and start running your own tests to conclude your own results.

Co-occurrence equates to ranking power when consolidated. If you have 100 pages of solid content on Topic A and your competitor has 3 pages of relevant content, which result do you think search engines would reward?

It’s not just about volume, as spam detection means that dated concepts of keyword stuffing no longer prevail in search engine algorithms. Yet, if you invest the time to produce unique, structured content and assign it a place in the hierarchy of your website (preferably themed), then it is only a matter of time before those pages transform into nodes of stability to use as anchor points for current and future rankings.

Having more pages in a search engine’s index translates into ranking insurance as SEO defense and can be utilized to insulate vital rankings. Each new page has the ability to attract its own links and stem into a landing page capable of ranking for dozens of keywords, or it can be used to pass along its ranking factor to a preferred landing page (housing a more compelling value proposition).

Search engines use the vector space model and inverse document frequency to score your pages. One example is the Glasgow Model, which implements the process of normalization (determining validity). This process looks at (a) the frequency of a term in a document, (b) the maximum frequency of any term in the document, (c) the number of unique terms in the document, (d) the number of documents in the collection and (e) the number of documents in which the term occurs. As a result, a website could gain a higher degree of trust, which offsets less trusted or less robust websites where the keyword or key phrase has less saturation.
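To see how those five quantities fit together, here is a small Python sketch that collects (a) through (e) for one term and combines some of them into a normalized weight. The weight shown divides term frequency by the maximum frequency in the document and multiplies by log(N / n); this is a common textbook normalization chosen as an assumption for illustration, not the exact Glasgow formula, and it does not use (c). The sample pages and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_statistics(term, doc_id, documents):
    """Collect the five quantities (a)-(e) for one term in one document."""
    counts = {d: Counter(tokenize(t)) for d, t in documents.items()}
    doc = counts[doc_id]
    return {
        "term_freq": doc[term],                      # (a) frequency of the term
        "max_freq": max(doc.values(), default=0),    # (b) highest frequency of any term
        "unique_terms": len(doc),                    # (c) number of unique terms
        "num_docs": len(documents),                  # (d) documents in the collection
        "docs_with_term": sum(1 for c in counts.values() if c[term] > 0),  # (e)
    }


def normalized_weight(stats):
    """Assumed weighting: (term_freq / max_freq) * log(num_docs / docs_with_term)."""
    if stats["term_freq"] == 0 or stats["docs_with_term"] == 0:
        return 0.0
    tf = stats["term_freq"] / stats["max_freq"]
    idf = math.log(stats["num_docs"] / stats["docs_with_term"])
    return tf * idf


# Hypothetical pages, used only for illustration.
documents = {
    "short": "business consulting",
    "long": "business consulting and business planning for business owners and managers",
    "other": "search engine news",
}

for doc_id in ("short", "long"):
    stats = term_statistics("consulting", doc_id, documents)
    print(doc_id, round(normalized_weight(stats), 3))
# short 0.405
# long 0.135
```

Both pages mention "consulting" exactly once, yet the short page scores higher because the term is not diluted by other, more frequent terms. That is the point of normalization: raw counts alone would favor long pages regardless of focus.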

The real takeaway here is (1) each keyword has a tipping point and (2) if the number of occurrences for that keyword is competitive, each page will require a collection of supporting pages to cross the tipping point and bump other web pages aside to create visibility for your pages.

Wikipedia is a prime example of the model: (a) topical continuity, (b) site architecture, (c) a degree of term frequency and co-occurrence between documents, (d) deep links from other pages to the keyword’s specific landing page and (e) the fact that it is scalable and capable of devouring virtually any topic, keyword or key phrase.

For example, a Google search for the word internet returns Wikipedia at the top position with a double listing. That specific broad match keyword has over 1.7 million occurrences in Google’s index, but why Wikipedia would be the most relevant falls back on the algorithms mentioned above.

The main landing page for the word internet has 35,000 links from other websites to that page. This alone makes this page a contender, as the number of deep links does have a tremendous impact on rankings, but that must also be mirrored within the website’s internal linking structure in order to be more effective. In addition to keyword prominence, the landing page ranks as a result of the synergy of off page and on page SEO factors to consolidate the position.

In this instance, there are another 50,000 internal links consolidating the ranking factor to that page, all with the shingle/anchor text “internet”, that funnel link flow and relevance to the target page.
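If you want to audit this kind of internal linking on your own pages, a rough Python sketch using the standard library's `html.parser` can tally how often each target page is linked with each anchor text. The HTML snippet and the `AnchorTextCounter` class are hypothetical, and a real audit would need to crawl every page of the site rather than a single string.

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorTextCounter(HTMLParser):
    """Count links per (href, anchor text) pair in one HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments collected inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace and lowercase so "Internet" and "internet" match.
            anchor = " ".join("".join(self._text).split()).lower()
            self.counts[(self._href, anchor)] += 1
            self._href = None


# Hypothetical page fragment, used only for illustration.
html = """
<p>The <a href="/wiki/Internet">Internet</a> is a network.</p>
<p>See also the <a href="/wiki/Internet">internet</a> and the
<a href="/wiki/World_Wide_Web">World Wide Web</a>.</p>
"""

parser = AnchorTextCounter()
parser.feed(html)
print(parser.counts[("/wiki/Internet", "internet")])  # 2
```

Summing these counts across a whole site shows which page your own internal links are nominating as the landing page for a given anchor text.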

Domain authority and topical relevance also play a big role in the equation, and we know Wikipedia has no shortage of those either. So, in order to rank for the word internet, you would have to scale a site capable of contending with that kind of on page and off page symmetry, and allow the content to reach a natural plateau (by not pushing content or link velocity outside of natural bounds).

Applying this on a smaller scale allows you to see how favorable keywords can be elected and promoted to acquire a higher degree of relevance within a website, and how authoritative links gained through link building or viral promotion can elevate a page into the top ranking positions of search engines.

We know that search results displayed at the top of rankings get clicked, but it’s just a matter of getting there for optimal keywords that are relevant to your business. At least now, you have some solid research to fall back on so that you can engage each optimization campaign not only as a quest for a keyword, but more of an experiment in finding the appropriate percentages of the metrics that produce the weighting of terms in the index. 

About Jeffrey_Smith

In 2006, Jeffrey Smith founded SEO Design Solutions (An SEO Provider who now develops SEO Software for WordPress).

Jeffrey has actively been involved in internet marketing since 1995 and brings a wealth of collective experiences and marketing strategies to increase rankings, revenue and reach.

11 thoughts on “SEO and Search Engine Algorithms”
  1. thank u for ur information on term frequency

  2. Nice work people. Good work. Term frequency had been really confusing for me always but reading your blog got all clear Thanks
