To understand SEO, you need to understand two things: (1) the search behavior of motivated buyers and (2) what search engines deem relevant. Common sense goes a long way, but it needs to be coupled with insight into the concepts that structure how search results are parsed, weighted and rendered. Those concepts are driven by search engine algorithms, which in turn are based on information retrieval.
Studying the premise of information retrieval (IR) can increase your understanding of why things rank the way they do. Metrics such as Probabilistic Term Reweighting, Local and Global Context Analysis, and Pattern Matching through Phrases and Proximity all leave their indelible mark on search results each and every time a query is executed.
By familiarizing yourself with these metrics, you can apply them consistently (optimization) to stabilize heuristic conclusions and gain a tactical advantage: documents/pages that achieve a high relevance score, which directly equates to rankings, visibility and conversion through commerce.
To gain more insight into what happens behind every search, there are two prominent metrics called term frequency (tf) and inverse document frequency (idf), which when used in tandem become (tf-idf). They are used as building blocks for measuring the probability/relevance score of documents against queries to the index (a Google search), using a lookup structure known as an inverted list or inverted index.
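To make the idea of an inverted index and a tf-idf score concrete, here is a minimal sketch in Python. The tokenizer, the function names and the three tiny documents are all assumptions made up for illustration; real search engines use far more elaborate tokenization, normalization and weighting than this.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or punctuation handling)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to {doc_id: raw count of the term in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def tf_idf_scores(query, index, num_docs):
    """Score each document by summing tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term not in the collection, contributes nothing
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return dict(scores)


# Hypothetical three-page "website" used only for this example.
docs = {
    "a": "business consulting for small business owners",
    "b": "consulting services and business strategy",
    "c": "travel news and weather",
}
index = build_inverted_index(docs)
scores = tf_idf_scores("business consulting", index, len(docs))
```

In this toy collection, page "a" scores higher than page "b" because it repeats "business", and page "c" receives no score at all because it contains neither query term.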
Term frequency measures how many times a term appears in a document; it can also be used in context to determine relevance across multiple documents (like a metric of authority). In case you want to build your own search engine to test these variables, as a friend of mine, Mr. Tim Nash (an information architect whom I nickname "professor"), has done, here is an introduction by Thanh Dao on Term frequency/Inverse document frequency implementation in C#.
In case you're busy and just don't have the time, you could opt for a command to query Google which reveals the relative volume and occurrence of a keyword or key phrase. This command, when applied with other optimization metrics (such as deep link percentage (the number of links to that page from other sites), internal link percentage, and domain authority / relevance), provides rather revealing insight as to why a web page or site ranks or is weighted the way it is in the index.
This advanced Google search operator is most commonly used to query the index specifically about the volume of saturation or occurrence a keyword has in context to the entire indexed website. This, in rudimentary form, is (tf-idf) extraction.
The search operator is…
site:yourdomain.com keyword [simply replace yourdomain.com with the site you're analyzing and keyword with the term you're interested in].
For example, to determine how many times the key phrase “business consulting” appears on the cnn.com website, I would simply open a Google search bar, type in site:cnn.com business consulting and execute the search. A list of relevant documents will then populate that have the shingle/keyword or a stemmed semantic variant (singular, plural or synonymous keyword) in the title, in the meta description or on the page. Matching terms are displayed in bold in the results to show occurrence.
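If you run this kind of query often, a small helper can build the search URL for you. The following is only a sketch: the function name and the `exact` parameter are hypothetical, and it simply assembles a URL to open in a browser rather than fetching or parsing any results.

```python
from urllib.parse import urlencode


def site_query_url(domain, phrase, exact=False):
    """Build a Google search URL for the query `site:domain phrase`."""
    if exact:
        # Wrap the phrase in double quotes to ask for an exact match.
        phrase = f'"{phrase}"'
    query = f"site:{domain} {phrase}"
    return "https://www.google.com/search?" + urlencode({"q": query})


url = site_query_url("cnn.com", "business consulting")
exact_url = site_query_url("cnn.com", "business consulting", exact=True)
```

Pasting `url` into a browser is equivalent to typing site:cnn.com business consulting into the search bar.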
Pages that are weighted heavily through internal or external links will have the highest degree of prominence and authority and more than likely exhibit more exact match occurrences of the phrase you are querying.
They are thereby considered more relevant because of the co-occurrence of the keyword, in tandem with multiple algorithms running concurrently within the search engines which check, cross-check and sift or extract data based on the weighting formulas used to determine which pages have the highest correlation.
This alone can be used for finding the best page for internal links or finding an ideal preferred landing page to consolidate page weight through link volume. So, instead of just linking to the homepage, I would use this command to shift ranking factor to a page that is specifically about that topic and is more likely to be a relevant result for someone searching.
By optimizing your content you are helping search engines do their job. Relevant linking increases the deep link percentage of that page and, since search engines rank web pages, not websites, it also allows you to have multiple pages returned for the same query if they all share a similar semantic thread, or to have a particular page deemed the preferred landing page for a specific query. You can use a virtual theme (just use links) or employ a themed and siloed site architecture to reinforce relevance within your own website.
If you understand that term weighting, co-occurrence and deep link percentages alone can make any page in a website behave like a PPC landing page (you type in a query and it organically ranks for it), then you may want to delve into this fundamental formula with more verve and start running your own tests to conclude your own results.
Co-occurrence equates to ranking power when consolidated. If you have 100 pages of solid content on Topic A and your competitor has 3 pages of relevant content, which result do you think search engines would reward?
It’s not just about volume: with spam detection in place, dated tactics such as keyword stuffing no longer prevail in search engine algorithms. Yet, if you invest the time to produce unique, structured content and assign it a place in the hierarchy of your website (preferably themed), then it is only a matter of time before those pages transform into nodes of stability to use as anchor points for current and future rankings.
Having more pages in a search engine’s index translates into ranking insurance as SEO defense and can be utilized to insulate vital rankings. Each new page has the ability to attract its own links and stem into a landing page capable of ranking for dozens of keywords, or it can be used to pass along its ranking factor to a preferred landing page (housing a more compelling value proposition).
Search engines use the vector space model and inverse document frequency to score your pages. One example is the Glasgow Model, which implements the process of normalization (determining validity). This process looks at (a) the frequency of a term in a document, (b) the maximum frequency of any term in the document, (c) the number of unique terms in the document, (d) the number of documents in the collection and (e) the number of documents in which the term occurs. As a result, a website could gain a higher degree of trust, which offsets less trusted or less robust websites where the keyword or key phrase has less saturation.
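The five quantities listed above are easy to collect for a small set of documents. The sketch below gathers them and then combines some of them into a normalized weight. Note the assumption: the weight used here is plain max-tf normalization multiplied by idf, a common textbook scheme, and not the published Glasgow formula. The function names and sample documents are invented for illustration.

```python
import math
from collections import Counter


def term_statistics(term, doc_id, docs):
    """Collect quantities (a) through (e) for one term in one document."""
    counts = {d: Counter(text.lower().split()) for d, text in docs.items()}
    doc_counts = counts[doc_id]
    return {
        "term_freq": doc_counts[term],                      # (a)
        "max_freq": max(doc_counts.values(), default=0),    # (b)
        "unique_terms": len(doc_counts),                    # (c)
        "num_docs": len(docs),                              # (d)
        "docs_with_term": sum(1 for c in counts.values() if term in c),  # (e)
    }


def normalized_weight(stats):
    """Max-tf normalized tf times idf (an assumed scheme, not Glasgow's own)."""
    if stats["term_freq"] == 0 or stats["docs_with_term"] == 0:
        return 0.0
    tf = stats["term_freq"] / stats["max_freq"]
    idf = math.log(stats["num_docs"] / stats["docs_with_term"])
    return tf * idf


# Hypothetical documents: one long and repetitive, one short, one off-topic.
docs = {
    "long": "internet internet history of the internet network protocol",
    "short": "internet cafe",
    "other": "cooking recipes",
}
long_stats = term_statistics("internet", "long", docs)
w_long = normalized_weight(long_stats)
w_short = normalized_weight(term_statistics("internet", "short", docs))
```

Under this particular scheme the long page and the short page end up with the same weight for "internet", even though the long page repeats the word three times. That is the point of normalization: raw repetition alone does not decide the score.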
The real takeaway here is to realize that (1) each keyword has a tipping point and (2) if the keyword is competitive in its number of occurrences, each page will require a collection of supporting pages to cross that tipping point and bump other web pages aside to create visibility for your pages.
Wikipedia is a prime example of the model: (a) topical continuity, (b) site architecture, (c) a degree of term frequency and co-occurrence between documents, (d) deep links from other pages to the keyword’s specific landing page and (e) the fact that it is scalable and capable of devouring virtually any topic, keyword or key phrase.
For example, a Google search for the word internet returns Wikipedia at the top position with a double listing. That specific broad match keyword has over 1.7 million occurrences in Google’s index, but why Wikipedia would be the most relevant falls back on the algorithms mentioned above.
The main landing page for the word internet has 35,000 links from other websites to that page. This alone makes the page a contender, as the number of deep links does have a tremendous impact on rankings, but that must also be mirrored within the website’s internal linking structure in order to be more effective. In addition to keyword prominence, the landing page ranks as a result of the synergy of off page and on page SEO factors that consolidate the position.
In this instance, there are another 50,000 internal links consolidating the ranking factor to that page, all with the shingle/anchor text “internet”, that funnel link flow and relevance to the target page.
Domain authority and topical relevance also play a big role in the equation, and we know Wikipedia has no shortage of either. So, in order to rank for the word internet, you would have to scale a site capable of contending with that kind of on page and off page symmetry, as well as allow the content to reach a natural plateau (by not pushing content or link velocity outside of natural bounds).
Applying this on a smaller scale allows you to see how favorable keywords can be elected and promoted to acquire a higher degree of relevance within a website, and how authoritative links gained through link building or viral promotion can elevate a page into the top ranking positions of search engines.
We know that search results displayed at the top of rankings get clicked, but it’s just a matter of getting there for optimal keywords that are relevant to your business. At least now, you have some solid research to fall back on so that you can engage each optimization campaign not only as a quest for a keyword, but more of an experiment in finding the appropriate percentages of the metrics that produce the weighting of terms in the index.