Tech SEO Connect

Accounting for Gaps in SEO Software

Featuring Mike King


Time Stamps:

  • [0:58] SEO Content Editing Tools are Deficient
  • [2:03] Phrase-Based Indexing and AI Overviews
  • [3:37] Google’s Search Explanation Feature
  • [5:57] “Python SEO” Exists Due to SEO Software Failings
  • [6:29] Search Engine Fundamentals: Vector Space Model
  • [8:54] From Lexical to Semantic: The Evolution of Google Search
  • [9:15] Word2Vec and the Rise of Semantic Analysis
  • [11:48] The Transformer, BERT, and Hybrid Retrieval
  • [13:59] Dense Retrieval and Aspect-Level Scoring
  • [15:31] E-E-A-T is Mathematically Represented
  • [17:25] Google Prioritizes User Experience
  • [20:35] Click Data is Key to Google’s Success
  • [26:06] SEO Needs New Metrics
  • [28:51] Leaks Reveal More Data to Consider
  • [34:53] SEO Needs Technical Standards
  • [36:41] Data Should Be Free and Better
Mike King explores the limitations of SEO software and the importance of user behavior and semantic understanding in Google’s algorithms.
 
Learn how concepts like vector embeddings and clickstream data analysis can enhance your SEO strategies.
 
Discover key insights from recent Google leaks, and join the discussion on the need for better data access and technical standards in the SEO community.
 
 

If you enjoy the webinar, please don’t forget to SUBSCRIBE on YouTube.

How Should SEO Software Catch Up to Google?

Mike King believes that, because most SEO software is deficient and doesn’t account for how Google’s algorithms actually work, SEO professionals need to better understand user behavior, leverage concepts like vector embeddings, and analyze clickstream data to improve their strategies.

SEO Content Editing Tools are Deficient

Mike argues that many SEO content editing tools are fundamentally flawed because they don’t account for how Google actually processes and ranks content. They often focus solely on optimizing for a single target keyword, which is a limited approach.
 
He cites Searchmetrics Content Experience as a tool that gets it right by optimizing based on a graph of related keywords, similar to Google’s concept of phrase-based indexing.
 
This broader approach reflects Google’s understanding of content within a topical cluster, considering co-occurring terms and semantic relationships between keywords.

Phrase-Based Indexing and AI Overviews

Mike emphasizes that Google looks beyond just the target keyword when ranking content. He points to the phrase-based indexing patent and the AI Overviews feature as evidence. These aspects of Google’s algorithms consider related keywords and co-occurring terms on pages to determine their relevance to a query. Unfortunately, many SEO tools still rely heavily on TF-IDF, a metric that weights how often a term appears in a document against how rare it is across a corpus, but captures nothing about the semantic relationships between terms.
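To make the contrast concrete, here is a minimal sketch (my own illustration, not from the webinar) using scikit-learn: TF-IDF happily scores two pages about the same topic as dissimilar whenever they use different surface vocabulary.

```python
# Minimal illustration (assumed example, not from the webinar): TF-IDF scores
# a term by frequency weighted against corpus rarity, but two documents using
# different-but-related vocabulary still look dissimilar to it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "how to change a flat tire on a car",
    "replacing a punctured tyre on your vehicle",   # same intent, different words
    "how to change a car battery",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)

# Lexically, doc 0 looks closer to doc 2 (shared words "how", "change", "car")
# than to doc 1, even though doc 1 answers the same need.
print(cosine_similarity(tfidf[0], tfidf[1]))  # low: almost no shared terms
print(cosine_similarity(tfidf[0], tfidf[2]))  # higher: shared surface terms
```

A co-occurrence or embedding-based approach would recognize that “tire” and “tyre,” or “car” and “vehicle,” belong to the same topical cluster.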

Google's Search Explanation Feature

He highlights the value of Google’s “Search Explanation” feature found in search results for specific listings. This feature offers insights into why particular results rank for specific queries, going beyond just basic keyword matching. It provides both lexical and semantic reasons for a page’s ranking, revealing which search terms are present and how they relate to the query. SEOs can leverage this information to optimize their content more effectively, aligning it with Google’s understanding of relevance.

"Python SEO" Exists Due to SEO Software Failings

King observes that the rise of “Python SEO” is a direct consequence of the limitations of existing SEO software. Many SEOs turn to Python scripting to gather data and perform analysis because commercially available tools often lack the depth and sophistication required to understand Google’s increasingly complex algorithms. This reliance on custom solutions highlights a gap in the SEO software market, calling for more advanced tools capable of analyzing semantic relationships and user behavior signals.
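As a flavor of the gap-filling scripting Mike describes, here is a deliberately small, hypothetical sketch that pulls a handful of on-page elements with requests and BeautifulSoup; the URL and the checks are placeholders, not a recommended audit.

```python
# Hypothetical "Python SEO" gap-filler: pull a few on-page elements that most
# rank trackers don't expose together. URL and checks are placeholders.
import requests
from bs4 import BeautifulSoup

def audit_page(url: str) -> dict:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return {
        "title": soup.title.string.strip() if soup.title and soup.title.string else None,
        "meta_description": (soup.find("meta", attrs={"name": "description"}) or {}).get("content"),
        "h1_count": len(soup.find_all("h1")),
        "word_count": len(soup.get_text(" ", strip=True).split()),
        "canonical": (soup.find("link", rel="canonical") or {}).get("href"),
    }

if __name__ == "__main__":
    print(audit_page("https://example.com/"))
```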

Search Engine Fundamentals: Vector Space Model

Mike explains the foundational concept behind search engine operation: the vector space model.
 
In this model, both documents and search queries are represented as vectors in a multi-dimensional space. Each dimension corresponds to a different feature or concept. The closer a document vector is to a query vector in this space, the more relevant the document is considered. This model moves away from simple keyword matching, enabling search engines to understand the semantic relationships between words and concepts.
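A toy illustration of the idea (mine, not Mike’s slide): score documents against a query by the cosine of the angle between their vectors.

```python
# Toy vector space model: documents and the query become vectors, and
# relevance is the cosine of the angle between query and document vectors.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend each dimension is a feature/concept (e.g. "running", "shoes", "recipes").
query = np.array([1.0, 1.0, 0.0])
doc_a = np.array([0.9, 0.8, 0.1])   # mostly about running shoes
doc_b = np.array([0.1, 0.0, 1.0])   # mostly about recipes

print("doc_a:", cosine(query, doc_a))  # close to 1.0 -> relevant
print("doc_b:", cosine(query, doc_b))  # close to 0.0 -> not relevant
```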

From Lexical to Semantic: The Evolution of Google Search

Mike traces the evolution of Google Search from a primarily lexical model, relying heavily on keyword frequency (think keyword stuffing), to a semantic model focused on understanding the meaning and context of words.
 
This transition, marked by the Hummingbird update over a decade ago, significantly impacted SEO practices, demanding a shift from keyword-centric optimization to creating content that truly addresses user intent and provides valuable information.

Word2Vec and the Rise of Semantic Analysis

Mike dives into the groundbreaking innovation of Word2Vec, a technique developed at Google by Tomas Mikolov, Jeff Dean, and colleagues, which introduced the concept of vector embeddings.
 
Word2Vec represents words as coordinates in a multi-dimensional space, capturing semantic relationships between them. Words with similar meanings are located closer together in this space, allowing Google to understand synonyms, related concepts, and even analogies.
 
This marked a significant advancement in natural language processing, paving the way for more sophisticated semantic analysis in search.
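For readers who want to poke at this themselves, here is a small sketch using the open-source gensim library (my choice; Mike doesn’t name a tool) to train a tiny Word2Vec model and ask for nearest neighbours. A corpus this small won’t learn meaningful relationships; it only shows the mechanics.

```python
# Minimal Word2Vec mechanics with gensim: words become coordinates, and
# nearest neighbours in that space are semantically related terms.
from gensim.models import Word2Vec

sentences = [
    ["seo", "content", "ranks", "in", "google"],
    ["google", "ranks", "relevant", "content"],
    ["embeddings", "capture", "meaning", "of", "words"],
    ["vectors", "capture", "meaning", "of", "content"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50, seed=1)

print(model.wv["content"][:5])            # first few coordinates of a word vector
print(model.wv.most_similar("content"))   # nearest words in the embedding space
```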

The Transformer, BERT, and Hybrid Retrieval

Mike discusses the emergence of the Transformer, a powerful deep learning architecture introduced by a Google team that included Noam Shazeer, which further revolutionized natural language processing.
 
The Transformer architecture became the foundation for BERT (Bidirectional Encoder Representations from Transformers), a model that goes beyond Word2Vec by understanding context.
 
BERT can differentiate between words with multiple meanings based on the surrounding words in a sentence, significantly improving Google’s ability to understand natural language. This led Google to adopt a hybrid retrieval model, combining both lexical and semantic analysis to deliver more relevant and comprehensive search results.
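Mike doesn’t prescribe an implementation, but one common way to see what “hybrid” means in practice is to blend a lexical BM25 score with a dense embedding score. The libraries (rank_bm25, sentence-transformers) and the 50/50 weighting below are illustrative assumptions.

```python
# Hybrid retrieval sketch: blend a lexical score (BM25) with a semantic score
# (dense embeddings). Libraries and the 50/50 weighting are illustrative choices.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

docs = [
    "How to change a flat tire on your car",
    "Replacing a punctured tyre at the roadside",
    "Best running shoes for marathon training",
]
query = "fix a flat tire"

# Lexical side: BM25 over whitespace tokens.
bm25 = BM25Okapi([d.lower().split() for d in docs])
lexical = np.array(bm25.get_scores(query.lower().split()))
lexical = lexical / (lexical.max() or 1.0)   # normalise to [0, 1]

# Semantic side: cosine similarity of dense embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)
semantic = util.cos_sim(query_emb, doc_emb).cpu().numpy().ravel()

hybrid = 0.5 * lexical + 0.5 * semantic
for doc, score in sorted(zip(docs, hybrid), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```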

Dense Retrieval and Aspect-Level Scoring

Mike explains the concept of dense retrieval, which enables Google to go beyond traditional document-level scoring and assess relevance at a much finer granularity, even down to individual sentences or paragraphs.
 
This is based on the idea of multi-aspect dense retrieval, where multiple embeddings represent different facets of both the query and the document. This allows Google to pinpoint the most relevant sections of a page to a user’s query, leading to more precise and informative search results.
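As a rough picture of what finer granularity can look like mechanically (my illustration, not Google’s actual system), this sketch embeds each passage of a document separately and lets the best-matching passage carry the document’s score.

```python
# Passage-level dense retrieval sketch: embed each passage separately and
# score the document by its best-matching passage.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

document_passages = [
    "Our company was founded in 2009 and is based in Austin.",
    "To reset the router, hold the recessed button for ten seconds.",
    "We also sell accessories such as cables and wall mounts.",
]
query = "how do I factory reset my router"

query_emb = model.encode(query, convert_to_tensor=True)
passage_embs = model.encode(document_passages, convert_to_tensor=True)

scores = util.cos_sim(query_emb, passage_embs)[0]
best_idx = int(scores.argmax())

print("Document score:", float(scores[best_idx]))          # best passage carries the doc
print("Most relevant passage:", document_passages[best_idx])
```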

E-E-A-T is Mathematically Represented

Mike argues that E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness), a key concept in Google’s Quality Raters Guidelines, is not just a qualitative assessment but is mathematically represented in the algorithm.
 
This is achieved through website and author embeddings, which provide Google with a way to quantify the topical focus of a website and the expertise of an author. By analyzing these embeddings, Google can assess whether a page aligns with a website’s overall theme and whether the author is recognized as knowledgeable in the subject matter.
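The formula isn’t public, so treat this as a picture of the idea rather than the algorithm: approximate a site embedding as the average of its page embeddings and check how far a candidate page drifts from that centroid.

```python
# Illustrative only: approximate a "site embedding" as the mean of page
# embeddings and measure how well a new page aligns with the site's theme.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

site_pages = [
    "Guide to technical SEO audits",
    "How Google crawls and renders JavaScript",
    "Log file analysis for SEO",
]
candidate_on_topic = "Using embeddings to cluster keywords for SEO"
candidate_off_topic = "Ten easy weeknight pasta recipes"

site_embedding = model.encode(site_pages).mean(axis=0)   # centroid of the site's pages

for page in (candidate_on_topic, candidate_off_topic):
    alignment = util.cos_sim(site_embedding, model.encode(page)).item()
    print(f"{alignment:.3f}  {page}")   # low alignment suggests an off-theme page
```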

Google Prioritizes User Experience

Mike emphasizes that Google prioritizes user experience, and this is reflected in its ranking algorithms.
 
This user-centric approach is supported by leaked documents and the DOJ antitrust trial, which confirmed Google’s use of user behavior signals like click data to re-rank search results. This contradicts Google’s past statements about using click data only for evaluation and experimentation, highlighting the importance of optimizing for factors that enhance user engagement and satisfaction.

Click Data is Key to Google's Success

Mike asserts that Google’s massive volume of click data contributes significantly to its search dominance.
 
Metrics like “last longest click” (whether a result was the last one a user clicked in a session and the one they stayed on the longest) and “nav fraction” (expected CTR) play a key role in how Google ranks pages. These signals provide insights into user satisfaction and engagement with search results. This further reinforces the need for SEOs to focus on creating content that satisfies user intent and encourages longer dwell times on pages.
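Mike’s point is about the signals rather than code, but the arithmetic is simple to picture. In this sketch the log format, field names, and the expected-CTR-by-position curve are all invented for illustration.

```python
# Illustrative click-log arithmetic (log format and the expected-CTR-by-position
# curve are invented): observed vs expected CTR, plus a session's longest-dwell click.
from collections import defaultdict

EXPECTED_CTR = {1: 0.30, 2: 0.15, 3: 0.08}   # hypothetical position curve

sessions = [
    # each tuple: (url, position, clicked, dwell_seconds)
    [("a.com", 1, True, 12), ("b.com", 2, True, 95), ("c.com", 3, False, 0)],
    [("a.com", 1, False, 0), ("b.com", 2, True, 140), ("c.com", 3, False, 0)],
]

impressions = defaultdict(int)
clicks = defaultdict(int)
positions = {}

for session in sessions:
    for url, pos, clicked, dwell in session:
        impressions[url] += 1
        clicks[url] += int(clicked)
        positions[url] = pos
    clicked_results = [r for r in session if r[2]]
    if clicked_results:
        # Treat the clicked result with the longest dwell as the session's satisfying click.
        print("longest-dwell click:", max(clicked_results, key=lambda r: r[3])[0])

for url in impressions:
    observed = clicks[url] / impressions[url]
    print(f"{url}: observed CTR {observed:.2f} vs expected {EXPECTED_CTR[positions[url]]:.2f}")
```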

SEO Needs New Metrics

Mike contends that the SEO industry needs to embrace new metrics that go beyond traditional factors like keyword rankings and backlinks. He proposes metrics like “site focus score,” which measures the alignment of individual pages with a website’s overall topical theme using site embeddings, and “content potential rating,” which assesses the likelihood of a page to rank well based on its content and keyword opportunities.
 
These metrics align with Google’s increasing emphasis on user experience, semantic understanding, and content quality.
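iPullRank’s exact formula isn’t spelled out here, so this is only a sketch of one reasonable reading of “site focus score”: how tightly a site’s page embeddings cluster around their centroid, with low-scoring pages flagged as off-topic.

```python
# Sketch of a "site focus score" (my reading of the idea, not iPullRank's exact
# formula): how tightly a site's page embeddings cluster around the site centroid.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder page embeddings; in practice these come from an embedding model.
page_embeddings = {
    "/technical-seo-guide": np.array([0.9, 0.1, 0.2]),
    "/log-file-analysis":   np.array([0.8, 0.2, 0.1]),
    "/pasta-recipes":       np.array([0.1, 0.9, 0.7]),
}

centroid = np.mean(list(page_embeddings.values()), axis=0)

page_focus = {url: cosine(emb, centroid) for url, emb in page_embeddings.items()}
site_focus_score = float(np.mean(list(page_focus.values())))

print("site focus score:", round(site_focus_score, 3))
for url, score in sorted(page_focus.items(), key=lambda kv: kv[1]):
    print(f"  {score:.3f}  {url}")   # low scores flag off-topic pages
```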

Leaks Reveal More Data to Consider

Mike discusses the insights gained from recent Google leaks, revealing additional data points Google considers in its ranking algorithms. These leaks provide a glimpse into the complex interplay of factors that influence search rankings, including social media signals, news scores, a medical classifier, and various metrics related to People Also Ask (PAA) features.
 
Analyzing these leaks can help SEOs understand the evolving priorities of Google’s algorithms and adjust their strategies accordingly.

SEO Needs Technical Standards

Mike advocates for the establishment of technical standards within the SEO industry, particularly in data formats and APIs. He criticizes the lack of consistency across different SEO tools, which makes it difficult to compare data or switch platforms without significant manual effort.
 
Establishing clear standards for data portability and interoperability would benefit the entire SEO community, enabling seamless integration of data from various sources and facilitating more informed decision-making.
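No such standard exists today; purely as a strawman of what a portable format could look like, here is a minimal, tool-agnostic record for a single rank observation that any platform could export and import as JSON.

```python
# Strawman only: a minimal, tool-agnostic rank observation record that any
# platform could export/import as JSON. No such standard exists today.
import json
from dataclasses import dataclass, asdict
from datetime import date

@dataclass
class RankObservation:
    keyword: str
    url: str
    position: int
    search_engine: str      # e.g. "google"
    locale: str             # e.g. "en-US"
    device: str             # "desktop" | "mobile"
    observed_on: str        # ISO 8601 date
    source_tool: str        # which platform produced the record

record = RankObservation(
    keyword="technical seo audit",
    url="https://example.com/technical-seo-guide",
    position=4,
    search_engine="google",
    locale="en-US",
    device="desktop",
    observed_on=date(2024, 10, 1).isoformat(),
    source_tool="example-rank-tracker",
)

print(json.dumps(asdict(record), indent=2))   # portable across tools by design
```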

Data Should Be Free and Better

Mike concludes by expressing his belief that crucial SEO data, like rankings and link data, should be freely accessible. He argues that the technical barrier to collecting and storing this data is not insurmountable, as demonstrated by projects like Majestic’s distributed crawling network. He proposes an “Open Search Initiative,” a community-driven project inspired by Google’s Spanner architecture, to build a distributed database of SEO data, providing free and open access to valuable information that can benefit the entire industry.

About the Presenter

Mike King | Founder and CEO | iPullRank

Mike King is the Founder and CEO of iPullRank. Deeply technical and highly creative, Mike has helped generate over $4B in revenue for his clients. A rapper and recovering big agency guy, Mike’s greatest clients are his two daughters: Zora and Glory.

SEO CODE RED

Will Google’s AI Overviews (SGE) tank your organic traffic?

INCLUDED IN THE AI OVERVIEWS THREAT REPORT:

  • Threat Level by Snapshot Type: Local, eCommerce, Desktop, and Mobile will have a different impact on your organic traffic. Which types are appearing for your keywords?
  • Threat Level by Result Position Distribution: Your visibility within AI Overviews can indicate the likelihood of organic traffic. Where in the Snapshot do your links appear?
  • Automatic AI Overviews vs Click to Generate: Will your audience be force-fed AI Overviews or will that extra click to generate the snapshot save you from missing out on traffic?
  • Threat Level by Snapshot Speed: Slow AI Overview load times might not impact your organic visibility. How fast do AI Overviews load?

👋🏿 Before you go...

Sign up for The Rank Report Newsletter

🏆 Loved by over 4000 subscribers who pull rank