JR used a dataset of 1.9 million brand mentions across 57,000 companies to compare how LLMs and traditional search engines like Google surface B2B companies. He found that while LLMs are catching up fast, Google still leads in visibility, especially for lesser-known brands, making SEO fundamentals more relevant than ever in the AI era.
SEO Week 2025 set the bar with four themed days, top-tier speakers, and an unforgettable experience. For 2026, expect even more: more amazing after parties, more activations like AI photo booths, barista-crafted coffee, relaxing massages, and of course, the industry’s best speakers. Don’t miss out. Spots fill fast.
JR is a leading technical SEO expert. His unique career path led from architectural glass art to web development and SEO. Known for innovative approaches to technical SEO, data science, and automation, JR speaks at industry conferences and contributes to major publications. He’s actively involved in the SEO community, organizing meetups and co-founding the Tech SEO Connect Conference. JR’s interests include emerging tech, bass guitar, and college basketball.
JR’s SEO Week talk explores how large language models (LLMs) and traditional search engines differ in surfacing brand visibility across B2B SaaS industries. Using 1.9 million brand mentions across 57,000 companies, he compares brand exposure in platforms like Google, ChatGPT, and Claude, showing how authority signals like domain rating and content depth affect discoverability in both search and LLMs.
He highlights that while LLMs are improving rapidly, Google still dominates in surfacing brand relevance, especially for smaller companies, making it critical to measure where and how your brand appears in AI-driven results.
LLMs vs. Google: Not the Same:
While LLMs and Google often return similar brand results, Google still dominates in surface-level discoverability, especially for lesser-known brands. LLMs lag in freshness and visibility, making traditional SEO practices more critical than ever.
Modifiers Are Brand Fingerprints:
By analyzing 63 search modifiers, JR illustrated how brands show up differently across models. These “modifier fingerprints” reveal how LLMs interpret a brand’s value (e.g., affordability, trustworthiness) and can help marketers track brand perception over time.
SEO Metrics Still Matter:
Core SEO metrics like domain authority and content volume correlate strongly with visibility in both Google and LLMs. The overlap suggests that investing in high-quality, authoritative content pays off in both search and AI-driven experiences.
Mike King: A gentleman by the name of JR Oakes. Yes. It's like, when I see him post stuff, I get jealous. I'm like, why didn't we come up with that? And JR, I actually worked with him and a gentleman named Patrick Stox, another gentleman by the name of Matthew Kay. And we throw another conference called Tech SEO Connect in the Raleigh, North Carolina, area. So if you want more of this, come to that in December. But yeah, back to JR Oakes. JR is the VP of Strategy at Locomotive, leveraging 10 years of SEO expertise to scale business success. He's a former architectural glass artist with notable installations for SAS, the US Army, and Coach K. JR now uses creativity to fuel strategic growth. Everybody okay? He's a tech enthusiast and coder, always exploring new dark beers, jamming on bass, and enjoying life with his family. Presenting "Benchmarking Brand Discoverability: LLMs Versus Traditional Search," please welcome JR Oakes.
JR Oakes: Thank you. I'm gonna need that again. How are y'all? I thought Mike said the production value was gonna be high here. I don't know. Amazing. My name is JR Oakes. I am the VP of Strategy at Locomotive. This image of me is a little old here. Mike, thank you so much for inviting me, and thanks to everybody behind us. I know it's a lot of work and it's amazing. At Locomotive, we do a lot of open source. Here are three projects we've worked on right here. Taxonomy, and the latest one is called Crawling Chat. It's cool because it's a complete RAG system in a box. You can stick in your XML sitemap and it will push out an API for GPTs or MCPs. We really like pushing out concepts as code. So when I started thinking of something for SEO Week, I really wanted to kinda do something different. I was seeing a lot of people thinking about rank tracking and language models in the same way, and I don't think they are the same. I think of the concept of a language model saying domain one or brand one is really terrible, but brands two and three are really great. Right? So if you're number one, you're not in too good of a space there. So I'm not sure that they exactly fit.
So I started thinking about the idea of just discoverability. If I have a brand, and I'm thinking about where people are gonna find me and get me into the set of brands that they're gonna be researching for procurement or whatever, then I wanna think about how I'm gonna be found. So here's the key to this. We'll start first with G2. I'm sure everybody knows what G2 is; it's a software SaaS review site. G2 was really interesting for me because it's kind of a homogenous set of the same category of intents. Right? There's a bunch of different product categories or products on there, but the intent clusters around all of them are very similar. It's obviously had some ups and downs, but in Ahrefs, there's a ton of category data. There's a ton of search data. And it was the most comprehensive, consistent grouping of data across a bunch of different companies and brands. So what I did was, and this is a little hack, but if you wanna find hidden sitemaps, you can go to the Wayback Machine and just search for XML files. And so I found the XML file of the categories for G2 and then threw those into Ahrefs. Then we found the best base term for every category. Right? So if it was CRM tools, the best base term might be something like CRM software. Right? And then we also looked through that and we wanted to find all of the modifiers. We found a ton of modifiers in the dataset, and I'm assuming everybody knows what a modifier is here. It's kind of like Nike shoes and free Nike shoes, where free is the modifier.
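To make that sitemap trick concrete, here is a minimal sketch that queries the Wayback Machine's public CDX API for archived .xml URLs under a domain. The target domain and the "sitemap"/"categories" filtering are illustrative assumptions, not the exact script behind the talk.

```python
# Rough sketch: list archived .xml URLs for a domain via the Wayback Machine CDX API.
import requests

def find_archived_xml(domain: str) -> list[str]:
    """Return archived URLs under `domain` that contain .xml (e.g. hidden category sitemaps)."""
    resp = requests.get(
        "https://web.archive.org/cdx/search/cdx",
        params={
            "url": f"{domain}/*",              # everything archived under the domain
            "filter": "original:.*\\.xml.*",   # keep only URLs containing .xml
            "output": "json",
            "collapse": "urlkey",              # de-duplicate repeated captures of the same URL
            "fl": "original",                  # only return the original URL field
        },
        timeout=60,
    )
    rows = resp.json()
    return [row[0] for row in rows[1:]] if rows else []  # first row is the header

if __name__ == "__main__":
    for url in find_archived_xml("www.g2.com"):
        if "sitemap" in url or "categories" in url:
            print(url)
```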
So from that, we produced a ton of different search terms and then built out a rig to essentially hit most major models from the last three years across Anthropic, Gemini, and OpenAI, as well as Google search and AI Overviews. We hit every one of the models with the search terms, pulled back the content, extracted brands out of the content, and then ran it through a little magic tool to get the domain for each of the brands. Once we had the domain, we matched it up with Ahrefs data and then another dataset called The Companies API. The final dataset was 1,900 industries, 63 search modifiers, 1.9 million brand results, and 57,000 companies. So not a huge dataset, but I think a significant one. Looking at the data, one thing I think is important to call out here is that when you go to ChatGPT, there's a lot of stuff in between you and the actual base model. That can be ChatGPT's memory of prior things you've put in, tools that it's calling, analysis tools, or search retrieval. Right? None of that is in here. What we're doing here is looking at the base model. One other note is that for Google, we're kind of treating Google like a model. Right? You put a phrase into ChatGPT and it returns a wall of text. We're kind of saying the same thing: you put a phrase into Google and you get a bunch of text back. So we're looking at the text and we're looking at brand mentions within that text. Make sense? Yeah. Okay.
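A minimal sketch of what that rig might look like: send the same search-style query to several base models and collect the brands each one mentions. The model IDs, the prompt wording, and the LLM-based brand extraction are assumptions for illustration, not the exact pipeline used for the dataset.

```python
# Sketch: query multiple base models with the same search term and extract brand mentions.
import os
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
anthropic_client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def ask_openai(model: str, query: str) -> str:
    resp = openai_client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
    )
    return resp.choices[0].message.content

def ask_anthropic(model: str, query: str) -> str:
    resp = anthropic_client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": query}],
    )
    return resp.content[0].text

def extract_brands(text: str) -> list[str]:
    # Assumption: a cheap second LLM pass pulls out brand names; the talk's pipeline
    # also mapped each brand to a domain before joining against Ahrefs data.
    prompt = f"List every company or product brand mentioned below, one per line:\n\n{text}"
    return [b.strip() for b in ask_openai("gpt-4o-mini", prompt).splitlines() if b.strip()]

query = "best CRM software"  # base term plus optional modifier
results = {
    "gpt-4o": extract_brands(ask_openai("gpt-4o", query)),
    "claude-3-5-sonnet-latest": extract_brands(ask_anthropic("claude-3-5-sonnet-latest", query)),
}
# Google and AI Overviews are treated the same way: fetch the SERP text for `query`
# and run it through extract_brands() before matching brands to domains.
```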
Alright. So here's the first view. This is the top 30 domains mentioned in LLM models and organic results by search volume. The things that are really interesting here for me are how much more SEO-y the organic results are, with aggregator results like Gartner, G2, Capterra, etc. And then obviously, how dominant Microsoft is in the B2B SaaS space. I had no idea. Looking at the business type distribution of this data, obviously there's a long, long tail of B2B SaaS companies, and a lot are probably privately held. Predominantly United States headquartered companies. And this struck me as odd: there are actually B2B companies that don't have LinkedIn. Like, what the hell are they doing? Anyway, then we can look at the distribution by revenue, where most of the companies were in the second-smallest bucket.
Okay. So now we wanted to look at some comparisons, since we had data for all the LLMs. This is probably one of my favorite charts here. This takes the top industries by search volume, and then we look at average brand mentions across Google, AI Overviews, and all of the individual models. I think this is cool just to see. We should all be thanking Google. Like, how impressive is that? And especially how impressive is it going from GPT-2 and GPT-3.5, and the number of brands they return per industry category, to now. And just how impressive it is how much visibility brands get out of GPT... I mean, out of Google, compared to the other models. This is another one that I think is really interesting, because it really goes to show, for the smaller companies in this dataset, how critical search visibility in Google is to their even being found by users looking for their products and services. And then we can kinda drill into that bucket a little bit and see all the different companies in there, and I know a few of these. But obviously Google and the LLMs know a lot more.

One thing that was kind of shocking for me was just how closely the results tracked between LLMs and traditional search. I expected a lot more variation in this based on company size or, I would say, brand equity. And they're very, very similar. We'll see other mentions of that, like this one. I think this was pretty shocking for me. We obviously had Ahrefs data, so if we're using Ahrefs keyword count as kind of a proxy for content production and success, then I think there are a lot of parallels between Google results and language models. So I think maybe that gives us a little bit of solace that the things we're doing from an SEO perspective, writing good content and growing content and growing authority, reflect well in Google results but also reflect well in language model results. I also think that there's a lot of opportunity. This one essentially just puts the models against each other from a brand overlap perspective. Right? And we can see how similar, obviously, Google AI Overviews and Google are. But looking at all the different models, even between models by the same provider, there's a lot of opportunity to look at each of these models, where they're being used, and to at least measure. If you can't measure it, you can't manage it. At least start measuring where you're being found, where you're being talked about, and what the different models are saying.
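As a small sketch of that model-vs-model brand overlap view: for each pair of sources (Google, AI Overviews, individual LLMs), compare the sets of brands they mention for the same queries. Jaccard similarity is an assumption here; the talk doesn't specify the exact overlap metric.

```python
# Sketch: average per-query brand overlap between each pair of sources.
from itertools import combinations

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0

# brands_by_source: {source_name: {query: set_of_brands_mentioned}}
def overlap_matrix(brands_by_source: dict[str, dict[str, set[str]]]) -> dict[tuple[str, str], float]:
    scores = {}
    for s1, s2 in combinations(brands_by_source, 2):
        shared_queries = brands_by_source[s1].keys() & brands_by_source[s2].keys()
        per_query = [jaccard(brands_by_source[s1][q], brands_by_source[s2][q]) for q in shared_queries]
        scores[(s1, s2)] = sum(per_query) / len(per_query) if per_query else 0.0
    return scores
```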
I threw this one in there because I just thought it was interesting. If you have low-traffic B2B sites, you have to pay for your visibility, and as you get bigger, you don't as much. So kind of a throwaway here, but interesting. This is the one that everybody at my work was fawning over, because look how aligned domain rating from Ahrefs is with visibility in language models. You expect this from AI Overviews and Google search percentage, but almost every model follows the same curve there. And I actually asked them to do it in a different way, and they did, and it's the same curve. Right? So the more authority you have in search engines, the more presence you have online, the more authority and brand authority you have overall, we see representations of that from a language model perspective. Next is geography and time. We had location data on all the companies, and we also know when the models were produced. That led to some interesting outcomes like this one. Obviously, you can see, and I wish I had a pointer, in the 5-to-9 bucket you see the disparity between traditional search and language models. That's obviously the reason why we have RAG today: to fill in the knowledge gaps and the recency gaps with language models. That's definitely there. We even see that a little bit in the 0-to-4 bucket. But maybe more interestingly, you see how quickly the models are catching up and covering that gap with just more frequent model releases and model updates. We definitely see that. So they're definitely getting better at new information.
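As a rough illustration of that domain rating curve, here is a sketch that buckets companies by Ahrefs domain rating and computes each source's average mention rate per bucket. The column names and bucket edges are assumptions for illustration.

```python
# Sketch: share of brands mentioned per domain-rating bucket, per source.
import pandas as pd

# df columns (assumed): domain, domain_rating, source (e.g. "google", "gpt-4o"), mentioned (0/1)
def visibility_by_dr(df: pd.DataFrame) -> pd.DataFrame:
    bins = [0, 20, 40, 60, 80, 101]
    labels = ["0-19", "20-39", "40-59", "60-79", "80-100"]
    df = df.assign(dr_bucket=pd.cut(df["domain_rating"], bins=bins, labels=labels, right=False))
    # One row per DR bucket, one column per source, values = share of brands mentioned.
    return (
        df.groupby(["source", "dr_bucket"], observed=True)["mentioned"]
          .mean()
          .unstack("source")
    )
```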
I'm good now. Where'd it go? Yeah. Also, I'd love to talk to somebody about this one. Obviously, most of our Google search results were taken from the US, but we weren't sticking an implicit or explicit US into our model queries, yet most of the results were from the US. So it would be interesting to see if there's query modification or something like that. Even though we were hitting the API, the models are heavily skewed towards US results. But it does seem like at some level they're getting more worldly overall. This one is kinda hard to explain, so I'll try. We took the average share of voice by industry across all states where there were companies in that industry in more than one state. So multiple states represented, and we looked at which states had companies with the strongest share of voice. Obviously, this is B2B SaaS, so California is gonna be really heavily represented in that. But maybe the really interesting thing here is Washington, Texas, and New York. It kind of felt like they made up some ground in language models. Maybe not completely as black and white as organic is here. And then I had to throw in a social media slide, so that's what this is. I don't really feel there's any impact from social media on either of these. But I do think that companies that do marketing and are found more online tend to do social media, so that's that.
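For reference, a sketch of that share-of-voice-by-state calculation, restricted to industries with companies in more than one state. Column names are assumptions for illustration.

```python
# Sketch: average state share of voice across multi-state industries.
import pandas as pd

# df columns (assumed): industry, state, mentions
def state_share_of_voice(df: pd.DataFrame) -> pd.Series:
    multi_state = df.groupby("industry")["state"].transform("nunique") > 1
    df = df[multi_state]
    # Each state's share of mentions within its industry, then averaged across industries.
    share = (
        df.groupby(["industry", "state"])["mentions"].sum()
          .div(df.groupby("industry")["mentions"].sum(), level="industry")
    )
    return share.groupby("state").mean().sort_values(ascending=False)
```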
In the next section we'll go into the modifiers a little bit. So this is what the modifiers look like. We segmented them, and I forgot to mention this, but once we share this, you can get the data and see all the modifiers and all the search terms. That's linked from one of the earlier slides. But we have all the modifiers here, 63 of them, and they're all in nice neat little categories. So we were able to go in and do some comparisons across different modifiers: what did this modifier do? Looking at the overlap between some of the modifiers for LLMs and traditional search, we see again that here is an opportunity for us to be looking at LLMs and seeing how those results are different from search. We're seeing a lot of very high-quality, highly qualified leads come in who have done a lot of research, and procurement research, in language models. So making sure that your brand is even in the discussion there for some of these qualifications, I think, is very important. This is interesting because this is looking at the base search terms. So Nike shoes, what brands are found there, versus red Nike shoes. Right? The base terms versus the modifiers. And we can see that affordable, free, some of these are very, very close between language models and organic results. But a lot of them have some pretty large disparities between the two. So again, look to see where those disparities are in the models. And all it takes is going to the actual models and searching. I'm not sure you need to be spending $10,000 a month searching 40,000 search terms to figure this out. The models are pretty easy and accessible for $20 a month to be able to figure some of this out.
This one is here because I really wanted the violin plot. Does everybody like violin plots? Okay. And then because I asked for a violin plot, they gave me two. I think this shows an example that language models are at least getting more deliberate about producing brands and highlighting brands and talking about brands. So hopefully they get linked more. I think this reinforces the point about other AI being different from Google. We can see how similar AI Overviews are to Google results, and then just how dissimilar each of the other models is to Google results. Which again is an opportunity to optimize those channels, which I think the two speakers after me are probably gonna go into in a little more detail. I think this one is my favorite finding, and I'm probably gonna make it into a tool. The thing I like about this is treating modifiers as fingerprints. Right? So if you think about it, we have 63 modifiers. And for Google here, we can see that, you know, models think this brand is really good for affordability. Right? We can see that models think they're really good for being free. Right? And essentially, how your brand is represented shows up as peaks and valleys across the modifiers that are being used. I think it would be really cool to track that over time, to see how your brand narrative is being reflected in language models and search results over time.
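A small sketch of that modifier fingerprint idea: for one brand, compute how often it shows up under each of the 63 modifiers, per source, and track that vector over time. The column names are assumptions for illustration.

```python
# Sketch: a brand's "modifier fingerprint" (mention rate per modifier, per source).
import pandas as pd

# df columns (assumed): brand, modifier, source, mentioned (0/1)
def modifier_fingerprint(df: pd.DataFrame, brand: str) -> pd.DataFrame:
    brand_rows = df[df["brand"] == brand]
    # Rows: modifiers; columns: sources; values: how often the brand appears
    # when that modifier is part of the query.
    return brand_rows.pivot_table(
        index="modifier", columns="source", values="mentioned", aggfunc="mean"
    )

# fingerprint = modifier_fingerprint(df, "ExampleCRM")  # hypothetical brand
# fingerprint.plot(kind="bar")  # the peaks and valleys are the fingerprint
```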
And then one other highlight: just the amazingness of Google above all other models in talking about brands and returning brands. Their overall knowledge of brands, I think, eclipses the other models. So, some takeaways, and I think I'm running a little short. I would say, first, the diversity. One interesting thing is that the same search term is going to return wildly different results in every model you give it to. So if you're doing any kind of rank tracking, it really matters what model you're using and which model your prospective customers are using, in terms of how you even get found. Right? I think that's super interesting. The other piece for me is the transferability. Just how many of the metrics of SEO that matter for us also matter from a language model perspective. If you're doing the right things from an SEO perspective, we expect, at least from this data, to see that reflected in language models. Maybe you don't have to scale out 10,000 pieces of content on other sites to try to influence models. You can just be really intentional and effective with your own messaging and branding and content.
I think, especially for B2B SaaS companies that maybe wanna get into the US market or other markets, it's important to understand that there are some biases in the language models around which types of companies they prefer. And that's, I think, at a national level as well as maybe even a regional level, and how those are different from search and AI. Then I would say the other piece is just that LLMs lag on query-deserves-freshness and the immediacy of information retrieval results. And this is something that I think you need to be aware of and counter as more and more people are doing a lot of their heavy research in language models. You may not even be there if you're an early-stage company, so you need to be found in other channels.
Last one is, you know, if we go away from ten blue links and it really is just a heavy presence of language models, I don't know if there's a place for aggregator sites at that point, because the language model is the aggregator at that point. Right? So that's it. We put together a little interactive thing in Streamlit that you can go here and check out. It has all of the categories and industries, and you can pick your industry and see the data associated with it. I also wanted to thank Patrick Stox and Carrie Sugarman for going through and sanity-checking my dataset. I really appreciate it, and that's all I got.
Watch every SEO Week 2025 presentation and discover what the next chapter of search entails.
Sign up for the Rank Report — the weekly iPullRank newsletter. We unpack industry news, updates, and best practices in the world of SEO, content, and generative AI.
iPullRank is a pioneering content marketing and enterprise SEO agency leading the way in Relevance Engineering, Audience-Focused SEO, and Content Strategy. People-first in our approach, we’ve delivered $4B+ in organic search results for our clients.
AI is reshaping search. The Rank Report gives you signal through the noise, so your brand doesn’t just keep up, it leads.