AI models can easily make recommendations for what products you should buy, but can they tell you what not to buy?
Many people search differently these days. Instead of a keyword, they’re asking full questions and having back-and-forth conversations with LLMs to find their answer. I’ve even found myself performing searches lately that ask the type of questions I would ask someone in person, like “what vitamins should I not take?” or “which running shoes are bad for flat feet?” (Shout out to my fellow flat-footed folks.)
With new personalization features, many LLMs will have knowledge of your preferences and shopping history and could, in theory, be able to accurately identify what products would not work for you. And with LLMs citing YouTube and Reddit forums much of the time, one would think they have a handle on what goods and services should be avoided based on reviews.
But as we know, LLMs don’t actually understand real human conversations. They just sound like they do because they predict what words to use. So, also in theory, asking them a question in a negative format could possibly confuse them or lead to some strange or inaccurate responses.
Why is this important? For ecommerce brands, it’s another example of how a strong omnimedia content strategy is vital to ensure your brand and/or product is well represented all over the internet – anywhere an LLM might pull a citation from. In many of these cases, the LLMs recommended brands that would be a good option even though I was only asking for what to avoid, so there are opportunities here.
And the way the models pivot from “don’t buy this” to “tell me your zip code/budget/vibe so I can find you the perfect fit” shows that the future of search is in the conversation that happens after the initial query (and the potential bias).
Let’s test the above theories by asking some LLMs a few negative questions and see how well they answer them.
Asking LLMs a Negative Search Question
For this experiment, I will be asking AI Mode, Copilot, and ChatGPT these questions:
- Which oil delivery companies should I avoid?
- Which coats are not good for New England winters?
- What tropical resorts should I not visit?
I’ll be looking at how well the models answer the question, what they cite, and if their answers are accurate.
Confirmation Bias: Oil Delivery Companies to Avoid
My first query was “which oil delivery companies should I avoid?” This question should result in answers based on user reviews or posts online about bad customer experiences, but some answers also gave me recommendations for companies I should use, or tips for searching for an oil company, even though I didn’t ask for any of that. So in this case, I also asked each LLM which company I should use just to compare the answers, and I found myself not trusting the responses to that unbiased query.
Here are some of the tl;dr results:
| | Negative Query | Unbiased Query |
| --- | --- | --- |
| AI Mode | | |
| Copilot | | |
| ChatGPT | | |
AI Mode:
Google knows I live in Connecticut, so when I asked AI Mode the question, it focused on the northeastern states of New York, Connecticut, and Rhode Island. And it didn’t shy away from identifying specific companies. It named names.
AI Mode also provided a quick list of red flags to watch for when choosing an oil delivery company, and it cited mainly Facebook and Reddit posts in its answer. Under most circumstances, I think this would be a bad thing, but in this case I’d rather hear real perspectives from people who have used the companies before.
At the end, it asked me to provide my zip code so it could help me find reviews and prices for a local provider, which is a great follow-up.
For the sake of this experiment, I asked it the unbiased query of “which oil delivery company should I use?” and received a list of recommended companies as well as a couple of comparison tables showing price and service differences. This time, AI Mode cited mainly the companies’ own websites instead of user-generated content, so I feel this is less trustworthy.
Copilot:
I realized that I’ve never actually used Copilot before, so I thought I’d give it a try for this experiment. So far, I’m not impressed. It didn’t name any actual companies to avoid. What it did was list the major warning signs of oil companies to avoid.
And at the end, it offered to help me “evaluate specific companies in my area and check for red flags” (though it got my town wrong). All of this might be helpful for some initial searches for reputable companies to use, but I was looking for actual names of companies to avoid. Good for Copilot for not being a snitch though, I guess.
It did name names when I asked it the unbiased query of “which oil delivery company should I use?” This was much more helpful; however, it also cited the companies’ websites, which isn’t very trustworthy. It gave a comparison chart and a more personalized list of who to go with depending on what you’re looking for: lowest price, full-service providers, or cleaner fuel (again, likely based on the companies’ own statements, so not that reliable).
ChatGPT:
These results were interesting because there was a bit of everything. ChatGPT named two local companies I should avoid, included a list of red flags to watch for in other companies, offered advice to “avoid getting burned,” and even suggested a few companies that “generally look safer for comparison.” That last part was interesting because I’d like to know how it determined that those three companies were safer, but no sources were cited.
At the end, it asked if I wanted a list of reliable local oil companies and ones to avoid in my specific area (didn’t I already ask that?), so I replied yes to see what it said. It offered three local services (though two of them are the same company), repeated the two to avoid that it had mentioned before, and again repeated the list of red flags. But once more, no sources were cited, so I’m not sure if any of this information is even correct.
I also noticed that it was very careful with its wording. It says, “Companies to research carefully before using (not automatic ‘avoid’)” before listing the two bad companies. Perhaps trying to avoid more lawsuits?
For what it’s worth, I Googled the “better-reputation” oil companies it suggested and saw that both of them had 4 stars on Google. However, one company only had 10 reviews, two of which were 1-star negative reviews, and the other company had 14 reviews, two of which were also 1-star negative reviews. Not exactly glowing, but enough for ChatGPT to recommend, I guess.
I then asked ChatGPT “which oil delivery company should I use?” to see if an unbiased query gave a different answer. The results were:
- It gave two top recommended providers along with a list of other providers in my area, citing all of the company websites as its sources.
- One of the results was a company it initially told me to avoid (Standard Oil), so that wasn’t helpful.
- Only one of the companies was named in my initial search as being one that appears “safer” (Best Oil and Propane).
- None of the companies in this search matched the ones provided in my follow-up query when it offered to show me “better reputation” local companies.
- It also gave me more of the same tips to consider when choosing a company.
Overall, I wasn’t thrilled with these scattered, inconsistent responses.
Social Proof: Bad Coats for New England Winters
Depending on the year, New England winters can range from 8 degrees with multiple feet of snow (like this year) to 48 degrees and muddy, so I understand this is a tough question for anyone to answer, human or robot.
Nevertheless, I asked the LLMs “which coats are not good for New England winters?”
I think reviews are the best way to find the best clothing options, so I expected to see many here. However, we need to keep in mind the difference between a review on Reddit and a verified review on a more official site.
AI Mode:
Reddit and Business Insider dominated the citations in AI Mode’s answer. The answer highlighted five different types of coats to avoid for “deep winter,” as well as a list of “critical design flaws” to help someone identify a bad winter coat.
Using a Reddit citation, AI Mode recommended L.L.Bean, The North Face, and Patagonia as the brands to trust.
At the end, AI Mode asked for specifics on the coat I was looking for, like if I needed it for daily commuting in Boston or for activities like hiking and skiing. Another good follow-up question for an overall helpful answer.
Copilot:
Along with a numbered list of types of coats that don’t work for New England winters, Copilot also provided a comparison table to see the answer at a glance, and a quick bullet list of what types of coats work best. It recommended the brands L.L.Bean, Canada Goose, and The North Face.
Copilot cited Reddit and a website I had never heard of called shunvogue.com.
At the end, Copilot said it could help me pick a coat based on my budget, style, and “how easily I get cold,” which I thought was cute.
ChatGPT:
Similarly, ChatGPT offered bulleted lists of the types of coats to avoid and what coats work best for New England winters. However, there were no citations and it did not mention any specific brand names.
Again, much like Copilot, at the end it asked me for my style, budget, and whether I “run cold or warm” so it could recommend specific coat styles and brands that would work. This is another good follow-up question to help provide a personalized response.
Formatting: Tropical Resorts to Avoid
The query “What tropical resorts should I not visit?” provided a wide range of results focusing on everything from environmental concerns and safety to cleanliness and seaweed problems. Some of the answers got really specific, too (musty-smelling rooms, anyone?).
But does the way information is presented influence the way that you consume the LLM results?
AI Mode’s answer seemed the most trustworthy and well laid out, with a clean list of resorts and highlights from verified Google reviews. Copilot’s brief, clumsy list that focused on countries instead of resorts wasn’t an interesting or helpful read, and ChatGPT droned on about everything but what resorts to avoid.
AI Mode:
Using information from Google Reviews and citing Tripadvisor, Travel and Leisure, Business Insider, and the U.S. Department of State, AI Mode provided a very specific list of resorts “with significant negative feedback” and included the main complaints against them. Again, AI Mode held nothing back.
It also broke down lists of destinations with safety warnings and environmental and health concerns. Who knew sargassum seaweed was such an issue?
At the end, it asked if I was looking for a specific tropical region or if I wanted highly-rated alternatives to these, which seems like a natural next step in the travel-planning process.
Copilot:
Once again, Copilot refused to snitch and didn’t list any actual resorts. It did list tropical destinations I may want to avoid right now, but without any citations, reviews, or current research to back it up, the list mostly just felt kind of racist (aside from the mentions of hurricane season and our friend sargassum seaweed).
At the end, Copilot offered a list of how to decide whether to avoid a resort and offered to provide me a personalized list of places to avoid based on the timing, travel style, and region I prefer, and suggest alternatives that “match my vibe.”
ChatGPT:
ChatGPT also focused on locations rather than listing the names of actual resorts to avoid, but its lists were more varied and descriptive. It focused on safety risk, scam risk, environmental and ethical concerns, natural disaster risk, and locations that are often overcrowded or “degraded.”
The list of locations was more diverse than the others. Who would expect Bali on a list of places to avoid? But ChatGPT says that due to traffic, pollution, overtourism, and beach litter, you should skip it. With no sources cited or links to real reviews, though, it’s hard to know how accurate any of this is.
It ended by offering a quick list of reasons to avoid resorts, then asked me to provide my budget, “preferred vibe,” flight-time tolerance, and preferred safety level so it could recommend some resorts.
AI As Your Personal Shopper
Testing these negative queries proved that while AI is good at telling you what to buy, it’s still a bit of a mixed bag when it’s time to be a critic. AI Mode consistently leaned into the data, naming names and pulling from the raw, unfiltered world of Reddit and Facebook to give the kind of honest (if sometimes harsh) feedback we actually look for.
On the flip side, Copilot felt like a cautious corporate assistant, providing helpful generalities while carefully avoiding any potential snitching. ChatGPT sat somewhere in the middle: authoritative and descriptive, but often leaving me wondering exactly where it was getting its information.
I liked how each LLM had its own way of personalizing the experience with a highly specific follow-up question and varied lists of recommendations and tips, but I think AI Mode was the clear winner here.
For those of us on the consumer side, this is a reminder that AI still requires a healthy dose of skepticism. If an LLM tells you to avoid a specific oil company or a tropical resort without citing a single source, it’s basically just digital gossip. And the inconsistent results make it hard to believe anything they write. However, the personalization features are where these models start to shine.
If you’re in ecommerce, you can’t afford to ignore reviews, forum rants, and social media venting sessions that can decide if you’re a red flag or a safe bet. A solid content strategy now means ensuring that when someone asks an LLM what to avoid, your brand doesn’t just stay off the bad list but shows up as the recommended alternative.