Thinking from the perspective of a computer science engineer, what is the fastest way to combat search engine spam and low-quality websites? Simply de-index them by raising the bar.
Everything leaves a trail, or rather a digital footprint: the relationship of link clusters (when links were built, created and indexed), whether pages share a high percentage of the same shingles (overlapping groups of consecutive words, i.e. duplicated content), whether there are any supporting links or hub pages with enough authority to add sufficient weight to the pages, and so on.
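To make the shingle idea concrete, here is a minimal sketch of how the overlap between two pages could be measured. It assumes three-word shingles and Jaccard similarity as the overlap score; both are illustrative choices, and the function names `shingles` and `shingle_overlap` are hypothetical, not a description of what any search engine actually runs.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def shingle_overlap(a, b, k=3):
    """Jaccard similarity of the two shingle sets: 0.0 (nothing shared) to 1.0 (identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    page_a = "the quick brown fox jumps"
    page_b = "the quick brown fox sleeps"
    # Two of the four distinct shingles are shared between the pages.
    print(shingle_overlap(page_a, page_b))
```

A score close to 1.0 would suggest two pages carry largely the same content; where a filter draws the line is an assumption left to the reader.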
Google is constantly raising the bar and its guidelines are clear: if you play in gray areas, you could get penalized. Even so, many tread on shaky ground in an attempt to find the limits.
As search engineers adopt new methods to ensure quality, those involved with SEO are forced to come up with even more detailed techniques to remain relevant and buoyant in the rankings amidst those changes.
One audit within the search algorithm could very well be responsible for de-indexing millions of pages. Consider it like a daisy chain: when one link is missing, the flow of link weight is kinked. If 10 pages in your site provide 90% of the link weight for your other pages, and for some reason they fall out of favor with the algorithm's new filter and get dampened or removed, all the rankings those pages supported go on hiatus.
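The arithmetic behind that scenario can be sketched in a few lines. This is a simplified model, assuming each page contributes a fixed, known amount of link weight and that a filter either removes that contribution or scales it by a dampening factor; the page names, the weights and the `remaining_weight_share` helper are all hypothetical.

```python
def remaining_weight_share(link_weights, filtered_pages, factor=0.0):
    """Fraction of total link weight left after a filter hits some pages.

    link_weights:   dict of page -> link weight it contributes (assumed known)
    filtered_pages: set of pages caught by the filter
    factor:         0.0 removes a filtered page's weight, 0.5 halves it, etc.
    """
    total = sum(link_weights.values())
    if total == 0:
        return 0.0
    kept = sum(
        weight * (factor if page in filtered_pages else 1.0)
        for page, weight in link_weights.items()
    )
    return kept / total


if __name__ == "__main__":
    # Ten strong pages carry 90% of the weight; one hundred weak pages carry the rest.
    weights = {f"strong-{i}": 90 for i in range(10)}
    weights.update({f"weak-{i}": 1 for i in range(100)})
    strong = {page for page in weights if page.startswith("strong-")}

    print(remaining_weight_share(weights, strong))       # strong pages removed
    print(remaining_weight_share(weights, strong, 0.5))  # strong pages dampened by half
```

With the ten strong pages removed outright, only a tenth of the original link weight is left to support everything else, which is why a handful of pages can take so many rankings down with them.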
Even now, multiple data centers are showing vast fluctuations in the number of pages indexed for our domain (1180 pages in some centers, 1300 in others), which is what led me to this conclusion initially, and every page matters.
What this means to you is that entire ranges of keywords are now on page 3 instead of in the top 10. Search behavior remains the same (essentially, those at the top get the high click-through rate), and the profits tied to those keywords take a loss.
Although there are no surefire methods to insulate your website entirely from this phenomenon, there are things you can do to build in systematic redundancy for the times when the search engine index experiences a relapse and decides to start from scratch and re-index the web.
Without going too far off into theory, one shift in crawler behavior, with one new rule that excludes pages lacking the appropriate combination of indicators, could be responsible for a major upheaval of the search engine result pages and how they are re-ordered.
Think of it as some results being grandfathered in and others simply not making the grade. In either case, by using cyclical sweeps to take snapshots of the web and cross-reference pages that are linked to from hubs or authority sites (which typically provide SERP shuffle/Google Dance immunity), search engines can build layers of data to support who stays and who goes.
For example, a preliminary spider is sent for initial indexing and retrieval, a secondary sweep adds another layer of evaluation, and a third sweep compares the differences between the first two results. I am oversimplifying here for the sake of the concept, but you can see how shades of the supplemental index are making a comeback when thousands of references to your website momentarily disappear.
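The comparison step of that three-sweep idea can be sketched as plain set arithmetic over two snapshots. This only illustrates the oversimplified concept above, assuming each sweep yields a list of indexed URLs; the `compare_sweeps` helper and the example URLs are hypothetical and say nothing about how a real crawler stores or evaluates its data.

```python
def compare_sweeps(first_sweep, second_sweep):
    """Compare two snapshots of indexed URLs, as the third sweep would.

    Returns the URLs present in both snapshots, those that dropped out
    after the first, and those that only appeared in the second.
    """
    first, second = set(first_sweep), set(second_sweep)
    return {
        "stable": sorted(first & second),
        "dropped": sorted(first - second),
        "new": sorted(second - first),
    }


if __name__ == "__main__":
    sweep_one = ["/home", "/services", "/blog/old-post", "/contact"]
    sweep_two = ["/home", "/services", "/contact", "/blog/new-post"]

    for status, urls in compare_sweeps(sweep_one, sweep_two).items():
        print(status, urls)
```

Pages that land in the dropped group are the ones whose references would momentarily disappear until a later sweep picks them up again.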
Although it is not an established term, I call it a cache relapse, and it is capable of erasing several months of hard work in just a few minutes once the new (lean) index goes live. Seeing your positions in the search engine result pages wane until the index is rebuilt is only a natural side effect of this process.
Sure, hundreds of thousands of documents do not survive “The Google Shakedown” (or that of whichever search engine is conducting the audit). However, this is just one way to make room for new and improved metrics for information retrieval, relevance and the ideal user experience.
In closing, most pages that had authority prior to a sweep should rise back to the lofty positions they held before the audit. If they do not, then focus on raising your relevance score by (a) creating quality topical content, which can age and support the rest of your site in 90-120 days; (b) linking out to other authority sites in your niche to establish hub status; and (c) optimizing your internal links to reinforce theme prominence (what your site is about) for increased relevance. However, if you need an SEO company to aid the process, I am sure we can think of someone.