by Jeffrey_Smith

To understand SEO, you need to understand two things: (1) the search behavior of motivated buyers and (2) what search engines deem relevant. Common sense helps, but so does insight into how search results are parsed, weighted and rendered by search engine algorithms, which are grounded in information retrieval.

How do Term Frequency and Inverse Document Frequency Determine Relevance?


Studying the premise of information retrieval (IR) can increase your understanding of why things rank the way they do. Techniques such as probabilistic term reweighting, local and global context analysis, and pattern matching through phrases and proximity all leave their indelible mark on search results each and every time a query is executed.

By familiarizing yourself with these metrics, you can optimize consistently across them, stabilize your conclusions, and gain a tactical advantage: producing documents/pages that achieve a high relevance score, which directly equates to rankings, visibility and conversion through commerce.

To gain more insight into what happens behind every search, consider two prominent metrics: term frequency (tf) and inverse document frequency (idf), which when used in tandem become tf-idf. They are used to compute a probability/relevance score for documents against queries to the index (a Google search), typically via a data structure known as an inverted list or inverted index.
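To make the inverted index idea concrete, here is a minimal sketch in Python (an illustration of the data structure, not how Google actually stores its index): each term maps to the set of documents that contain it, so a query can be answered without scanning every page.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical three-page corpus for illustration
docs = {
    1: "business consulting services",
    2: "consulting for small business",
    3: "gardening tips",
}
index = build_inverted_index(docs)
print(sorted(index["consulting"]))  # → [1, 2]
```

Answering the query "consulting" is then a single dictionary lookup rather than a scan of every document, which is why this structure underpins large-scale retrieval.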

Term frequency can be used to measure how many times a term appears in a document, or in context to determine relevance across multiple documents (like a metric of authority). In case you want to build your own search engine to test variables, as did a friend of mine whom I nickname "professor", Mr. Tim Nash (information architect), here is an introduction by Thanh Dao on term frequency / inverse document frequency implementation in C#.
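The linked article implements this in C#; as a quick illustration, the classic tf-idf weighting fits in a few lines of Python (a sketch with hypothetical example documents, not the exact formula any particular engine uses):

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """Raw count of the term in the document, scaled by log(N / df),
    where df is the number of documents containing the term."""
    tf = Counter(doc.lower().split())[term]
    df = sum(1 for d in corpus if term in d.lower().split())
    return tf * math.log(len(corpus) / df) if df else 0.0

corpus = [
    "business consulting services",
    "consulting for small business",
    "gardening tips",
]
score = tf_idf("consulting", corpus[0], corpus)  # tf=1, df=2 of 3 docs
```

Note how the idf factor rewards rarity: a term appearing in every document scores log(N/N) = 0, while a term concentrated in few documents is weighted up.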

In case you're busy and just don't have the time, you could opt for a command to query Google that reveals the relative volume and occurrence of a keyword or key phrase. This command, when applied with other optimization metrics (such as deep link percentage (the number of links to that page from other sites), internal link percentage, and domain authority / relevance), provides rather revealing insight into why a web page or site ranks or is weighted the way it is in the index.

This advanced Google search operator queries the index specifically about the relationship / volume of saturation or occurrence a keyword has in context to an entire indexed website. In rudimentary form, this is tf-idf extraction.

The search operator is: site:yourdomain.com keyword (simply replace yourdomain.com with the site you're analyzing and keyword with the phrase you're researching).

For example, to determine how many times the key phrase "business consulting" appears in a given website, I would simply open a Google search bar, type in the operator with "business consulting" and execute the search. Then, a list of relevant documents will populate that have the shingle/keyword or a stemmed semantic variant (singular, plural or synonymous keyword) in the title, in the meta description or on the page. Occurrences are displayed in bold.

Pages that are weighted heavily through internal or external links will have the highest degree of prominence and authority and more than likely exhibit more exact match occurrences of the phrase you are querying.

They are thereby considered more relevant because of the co-occurrence of the keyword, in tandem with the multiple algorithms running concurrently within search engines that check, cross-check and sift or extract data based on weighting formulas to determine which pages have the highest correlation.

This alone can be used to find the best page for internal links, or an ideal preferred landing page for consolidating page weight through link volume. So, instead of just linking to the homepage, I would use this command to shift ranking factor to a page that is specifically about that topic and has a higher chance of being a relevant result for someone searching.

By optimizing your content you are helping search engines do their job. Relevant linking increases the deep link percentage of that page; and since search engines rank web pages, not websites, it also allows you to have multiple pages returned from the same query if they all share a similar semantic thread, or to designate a particular page as the preferred landing page for a specific query. You can use a virtual theme (just use links) or employ a themed and siloed site architecture to reinforce relevance within your own website.

If you understand that term weighting, co-occurrence and deep link percentages alone can make any page in a website behave like a PPC landing page (you type in a query and it organically ranks for it), then you may want to delve into this fundamental formula with more verve and start running your own tests to draw your own conclusions.

Co-occurrence equates to ranking power when consolidated. If you have 100 pages of solid content on Topic A and your competitor has 3 pages of relevant content, which result do you think search engines would reward?

It’s not just about volume: spam detection means dated concepts such as keyword stuffing no longer prevail in search engine algorithms. Yet, if you invest the time to produce unique, structured content and assign it a place in the hierarchy of your website (preferably themed), then it is only a matter of time before those pages transform into nodes of stability to use as anchor points for current and future rankings.

Having more pages in a search engine's index translates into ranking insurance, an SEO defense that can be utilized to insulate vital rankings. Each new page has the ability to attract its own links and grow into a landing page capable of ranking for dozens of keywords, or it can be used to pass along its ranking factor to a preferred landing page (housing a more compelling value proposition).

Search engines use the vector space model and inverse document frequency to score your pages. One example is the Glasgow Model, which implements a process of normalization; this process looks at (a) the frequency of a term in a document, (b) the maximum frequency of any term in the document, (c) the number of unique terms in the document, (d) the number of documents in the collection and (e) the number of documents in which the term occurs. As a result, a website could gain a higher degree of trust, which offsets less trusted or less robust websites where the keyword or key phrase has less saturation.
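The exact Glasgow weighting is beyond this post, but the normalization idea can be sketched: divide a term's frequency (a) by the document's maximum term frequency (b) so long pages cannot win by sheer repetition, then scale by idf computed from (d) and (e). The following Python sketch uses that common tf/max_tf normalization as an assumption for illustration, not the precise Glasgow formula:

```python
import math
from collections import Counter

def normalized_tf_idf(term, doc_tokens, all_docs):
    """tf normalized by the document's most frequent term, times idf."""
    counts = Counter(doc_tokens)
    if not counts or counts[term] == 0:
        return 0.0
    tf = counts[term] / max(counts.values())      # (a) divided by (b)
    df = sum(1 for d in all_docs if term in d)    # (e)
    idf = math.log(len(all_docs) / df)            # uses (d) and (e)
    return tf * idf

# Hypothetical pre-tokenized three-document collection
docs = [
    ["internet", "history", "internet"],
    ["internet", "protocols"],
    ["gardening"],
]
score = normalized_tf_idf("internet", docs[0], docs)
```

With this normalization, a document's dominant term always gets tf = 1.0 regardless of document length, which is the "validity" leveling effect the normalization step is after.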

The real takeaway here is to (1) realize that each keyword has a tipping point and (2) understand that, if the number of occurrences for that keyword is competitive, each page will require a collection of supporting pages to cross the tipping point and bump other web pages aside to create visibility for your pages.

Wikipedia is a prime example of the model: (a) topical continuity, (b) site architecture, (c) a degree of term frequency and co-occurrence between documents, (d) deep links from other pages to the keyword's specific landing page and (e) the fact that it is scalable and capable of devouring virtually any topic, keyword or key phrase.

For example, searching Google for the word internet returns Wikipedia at the top position with a double listing. That specific broad match keyword has over 1.7 million occurrences in Google's index, but why Wikipedia is the most relevant falls back on the algorithms mentioned above.

The main landing page for the word internet has 35,000 links from other websites. This alone makes the page a contender, as the amount of deep links required does have a tremendous impact on rankings; but that must also be mirrored within the website's internal linking structure to be more effective. In addition to keyword prominence, the landing page ranks as a result of the synergy of off-page and on-page SEO factors consolidating the position.

In this instance, there are another 50,000 internal links consolidating ranking factor to that page, all with the shingle / anchor text "internet", funneling link flow and relevance to the target page.

Domain authority and topical relevance also play a big role in the equation, and we know Wikipedia has no shortage of either. So, in order to rank for the word internet, you would have to scale a site capable of contending with that kind of on-page and off-page symmetry, as well as allow the content to reach a natural plateau (by not pushing content or link velocity outside of natural bounds).

Applying this on a smaller scale allows you to see how favorable keywords can be selected and promoted to acquire a higher degree of relevance within a website, and how, through link building or viral promotion, authoritative links can elevate a page into the top ranking positions of search engines.

We know that search results displayed at the top of rankings get clicked, but it’s just a matter of getting there for optimal keywords that are relevant to your business. At least now, you have some solid research to fall back on so that you can engage each optimization campaign not only as a quest for a keyword, but more of an experiment in finding the appropriate percentages of the metrics that produce the weighting of terms in the index. 


About Jeffrey_Smith

In 2006, Jeffrey Smith founded SEO Design Solutions (An SEO Provider who now develops SEO Software for WordPress).

Jeffrey has actively been involved in internet marketing since 1995 and brings a wealth of collective experiences and marketing strategies to increase rankings, revenue and reach.

11 thoughts on “SEO and Search Engine Algorithms”