Dr. Ricardo Baeza-Yates walked us through the evolution of search from lexical to semantic to the prediction-powered engines common today. He explained the technical mechanics of vectors, neural IR, and RAG. He also looked at bigger philosophical AI questions: What are we losing when we let machines “guess” for us? Are we widening the digital divide? And are we getting a little too lazy?
SEO Week 2025 set the bar with four themed days, top-tier speakers, and an unforgettable experience. For 2026, expect even more: more amazing after parties, more activations like AI photo booths, barista-crafted coffee, relaxing massages, and of course, the industry’s best speakers. Don’t miss out. Spots fill fast.
Ricardo Baeza-Yates has been Director of Research at the Institute for Experiential AI of Northeastern University (2021-25), CTO of NTENT (2016-20), and VP of Research at Yahoo Labs, based in Barcelona, Spain, and later in Sunnyvale, California (2006-16). He obtained a Ph.D. in CS from the University of Waterloo in 1989 and is co-author of the best-seller Modern Information Retrieval.
In this fast-paced, thought-provoking talk, Dr. Ricardo Baeza-Yates (renowned computer scientist and Chief Scientist at Theodora AI) guided us through the evolution from classic semantic search to today’s neural information retrieval and predictive models. He began by revisiting the fundamentals of semantic search, rooted in ontologies, entity dictionaries, and linguistic rules, and contrasted it with modern vector-based retrieval powered by large language models.
Ricardo covered the technical mechanics of neural IR and retrieval-augmented generation (RAG), but more importantly, he raised crucial ethical, social, and cognitive implications of AI-powered search. From language inequality and cultural bias to the dangers of overtrusting predictive systems, Ricardo challenged us to rethink what we mean by “understanding” in machines, and whether our increasing reliance on AI is diminishing our own skills, judgment, and critical thinking.
From Logic to Guesswork, How Search Has Evolved: Ricardo unpacked the shift from classic semantic search, built on structured ontologies, lexicons, and linguistic rules, to today’s neural information retrieval systems that rely on vector embeddings and machine-learned associations. The move has made search faster and more flexible, but it’s also replaced true understanding with statistical guesswork.
The Double-Edged Sword of Predictive Search: Predictive systems promise quick, tailored answers, but they also risk losing nuance, misreading intent, and prioritizing fluency over accuracy. Ricardo talked about how these systems can amplify bias, spread misinformation, and over-personalize results to the point where users stop thinking critically.
Why Human Context Still Matters: Despite all the AI hype, Ricardo made the case that machines still don’t “get” things like we do. Common sense, context, and reasoning are uniquely human strengths, and the more we delegate them to language models, the more we risk losing our edge (and our judgment).
Mike King: Ricardo Baeza-Yates. Dr. Ricardo. Let me tell you a little bit about him.
He is a technical adviser on all things AI, search technology and data science. Some fun facts, he’s also a geography enthusiast aiming to visit a hundred countries with over ninety already visited. He’s lived in Chile, Canada, Catalonia, and California.
He’s appeared on a quiz show at the age of 17, similar to Who Wants to Be a Millionaire. Presenting “From Semantic to Predictive Search: Evolution or Involution?”, please welcome our first speaker, Dr. Ricardo Baeza-Yates. Thank you.
Dr. Ricardo Baeza-Yates: So thank you. First, I would like to thank Michael for inviting me here. We met in Boston in 2019, the first time I spoke at an SEO conference; this is my second time.
So I’m a research scientist. I hope you enjoy what I want to tell you. I’m in a transition from one research position to another. But in the meantime, I’m also the Chief Scientist at Theodora AI, a startup that I co-founded.
So this agenda will go very fast. This will be like the web: information overload. I hope you understand my accent and my speed. This is the agenda. You can see it, but let’s go right away. When I met Michael, I was at NTENT, a company doing semantic search. So the first thing I would like to do is summarize classic semantic search, to put it in contrast with what is happening today.
So you need to start with knowledge resources: basically an ontology. All of you know what an ontology is, right? I’m sure of that. A lexicon, an entity dictionary, and some linguistic rules over all these resources.
There are relations too. Some of these resources are language dependent, like the lexicon, and some are language independent, like the ontology, which should work for any language.
Now here I would like to make a small detour into what I think is a very important problem: language inequality. There are more than 7,000 languages still alive. We have killed a lot of them, but more than 7,000 are still alive. About 300 have a Wikipedia. About 200 have language models. I think Llama is one of the models that supports the most languages. That doesn’t mean they are supported well; it’s just that they say they support them.
But if you do the calculations, 10% of people don’t speak any of these 200 languages. 35% of the world doesn’t have internet, but they have something we don’t have: privacy.
Right? 25% of the population is too young to use this, and another percentage is too old, because they are not digital natives. And who knows all the other gaps: education, economy, technology. So if you do these numbers right (of course, they overlap), you will find that more than half of the population of Earth doesn’t have access to generative AI.
So please don’t talk about democratizing AI. What we are doing is increasing the digital gap in a very, very fast way, which worries me because that means that there’s a bigger gap between developed and undeveloped countries.
Lexicon: I will go fast here; I’m sure you know this. All languages, the same concept; you can use one of the languages, English or Esperanto, whatever you want, as the pivot. The entity dictionary is in many cases language independent, but in many cases language dependent. For example, here you have Geneva: five different ways to say Geneva in five different languages.
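The Geneva example can be sketched as a tiny entity dictionary. This is an illustration, not any engine’s actual data structure; the id follows the Wikidata convention for Geneva, but the dictionary and lookup scheme are made up:

```python
# Toy entity dictionary: several surface forms, in different languages,
# all map to one language-independent entity id.
ENTITY_DICT = {
    "geneva": "Q71",    # English
    "genève": "Q71",    # French
    "genf": "Q71",      # German
    "ginevra": "Q71",   # Italian
    "ginebra": "Q71",   # Spanish
}

def resolve_entity(surface):
    """Look up a surface form, ignoring case; None if unknown."""
    return ENTITY_DICT.get(surface.lower())
```

Whatever language the query uses, every surface form resolves to the same entity, which is what makes this part of the pipeline (mostly) language independent.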
And then we have linguistic rules. Typically, you use some kind of markup language to express these things. You have case frames, for example something about food; I will give examples of this later. Or templates for some types of questions, or another template for different kinds of verbs, and so on.
And then you have the semantic engine. So you start with the semantic resources and then you need to build this kind of machine that tries to understand our world. It’s always trying to be like a mimic; it will never understand the way we do. The best examples are adversarial attacks, where you change one pixel of a cat and the cat becomes a dog with high confidence. Then you change one other pixel of the dog and it becomes a cat with very high confidence. So what they learn, we don’t really know, but they don’t learn the same way we do.
So these are the semantic engine functions. To talk about all of this we would need, like, two hours, and you would still say we need more details. So here is a quick list so you know what the parts are: language detection (today you would use machine learning), and boilerplate detection and removal; I will give you an example in one minute.
Everything about tokens and lemmas. And then you get to natural language processing: part-of-speech tagging, morphological analysis, and, very important, entity extraction. You can learn about entities from the documents and grow your entity dictionary.
And then you need to disambiguate things that may have more than one meaning. For example, when an LLM knows what I mean when I say Chile, and that it doesn’t mean chili, I will start thinking they’re smart. Before that, no way.
And then you do the classical things of information retrieval. First you retrieve the documents, then you rank them, and maybe you want to classify them, depending on the task.
So how do you do this? You can use classical NLP; let’s say before 2010, that’s how a lot of things were done. But today almost everything is done with statistical NLP, which is basically machine learning applied to language processing: specific machine learning models trained for that specific task. I’m talking about supervised machine learning.
But today, many people prefer to use large language models like ChatGPT, Gemini, Claude, Grok, and so on. These are generic models, not trained for that task, so they will not be perfect. But they are quite good. They are very competitive, and that’s amazing.
Okay. So this will be an example. You have a page from USA Today, and this will be the boilerplate. So you want to delete everything that is not relevant to that page, and we only keep the news article. And here, for example, in the news article, I want to extract, for example, entities. For example, there you have Golden State Warriors. I live in California. Sorry.
“Game one.” This is hard. How do you understand what “game one” means? We understand it quickly; for systems it’s much harder. And “Western Conference finals.”
Entity extraction. Basically, you have attested entities, entities that you already know and have in your dictionary, but you will also find entities that are new.
Also, you cannot store all entities. Most of them you will never use. It’s like storing all the names of towns in the world. I love geography, so I know a lot of them, but I still don’t know all of them, and it would be silly to learn all of them. So, a reminder: entities are people, institutions, places, dates, and so on.
And then we need to identify things that are important depending on the task in the document. For example, the topic. For news, really important. Some aspect around the topic. For example, it’s about tourism. The genre. For example, a biography, a contract, consumer guides, and so on.
Sorry that I don’t move. I’m not trained in using this kind of stage.
And then, very important, we need to identify things like spam, bias, hate, any toxic content, and you need to do it well. In my responsible AI talks, I mention this silly case where a Facebook engineer used an English-trained toxic content classifier in France. So the town of Bitche was forbidden. They had to wait three weeks until someone realized that it was not exactly the same word; it had an e at the end.
Detection quality: that was ten years ago; now detection quality is really, really good. And then other things, like identifying links that are part of a shopping cart and so on. These are things that you can do. But is this enough?
For example, take “lunch in Times Square at noon.” For typical information retrieval, what really matters is everything minus all these connectives, so we have: lunch, Times Square, noon. But “I was in Times Square before noon for lunch”: the same bag of words. Or “don’t go south of Times Square at noon for lunch”: if we delete “south,” the same bag of words. So it’s not enough.
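A sketch of the problem, with an illustrative stopword list and sentences of my own: once the connectives (including the negation here) are thrown away, opposite requests collapse into the same bag:

```python
from collections import Counter

# Function words that classic IR throws away. This stopword list is
# illustrative, not a standard one -- note it swallows the negation.
STOPWORDS = {"in", "at", "for", "a", "the", "to", "of", "don't", "go"}

def bag_of_words(text):
    """Lowercase tokens minus stopwords -- word order and negation are lost."""
    return Counter(t for t in text.lower().split() if t not in STOPWORDS)

positive = bag_of_words("lunch in Times Square at noon")
negated = bag_of_words("don't go to Times Square at noon for lunch")
# Opposite advice, identical bag: {lunch, times, square, noon}
```

Both sentences reduce to the same multiset of content words, which is exactly why bag-of-words retrieval cannot tell them apart.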
But that trick works more than 90% of the time. This is what is amazing about classic information retrieval: it solves most of the problems.
But then you can get to the semantic part. Let’s do some syntactic analysis and some semantic representation. For example, “I bought a watch at the store on Monday.” We have one entity, Ricardo; another entity, the store; another entity, in some sense, the watch. And then we can do this analysis. The type of event is: someone bought something. The subject is the person. Then you have a physical object, the watch. A location, the store; we don’t know where the store is, so that is hard, but if the person is using a GPS and sharing it, we know where it is. And finally, a date.
And we can represent this in many ways, and we will have a normalized way to represent it. Then, for example, if I say “lunch in Times Square at noon,” I will represent it using this idea: a structured template representing this action. And I don’t care about all the other variants, because all of them will be represented the same way.
Okay? So this is the important part: doing this well so you can capture all possible ways to say something. You reduce everything to a single language-independent formal interpretation, and this is the object you need to manage.
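A minimal sketch of this normalization, with made-up field names and rules that only cover these example phrasings (a real semantic engine would do full parsing, not substring checks):

```python
def interpret(query):
    """Toy normalizer: a few surface variants of the same request reduce
    to one structured frame. Fields and rules are illustrative."""
    q = query.lower()
    frame = {}
    if "lunch" in q:
        frame["event"] = "EAT_MEAL"
        frame["meal"] = "lunch"
    if "times square" in q:
        frame["location"] = {"entity": "Times Square", "type": "PLACE"}
    if "noon" in q:
        frame["time"] = "12:00"
    return frame

# Different phrasings, one formal interpretation:
a = interpret("lunch in Times Square at noon")
b = interpret("at noon, lunch near Times Square")
```

The point is the output shape: every variant maps to the same language-independent frame, which downstream experts can consume.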
So one application to finish this first part is expert answers. Basically, everything that’s not in the ten blue links is kind of an expert answer. For example, something coming from the knowledge graph or something coming because you recognize the entity and so on. So for example, you do formal interpretation and then you get an expert on that formal interpretation.
Every expert decides whether it has to trigger or not. Basically, you run them in parallel, and some experts say, well, I have nothing to do with this interpretation, so nothing to say today. But in some cases they do trigger. Here are some examples.
You can have Yelp for local business listings. You can have a weather API. You can have stats.com for sports. We were using this at NTENT at that time, let’s say five or seven years ago.
And these intentions are dynamic. This is from a real search engine that we had. You see that blue is business intentions; the second, darker line is images. And the line at the bottom is weather. It’s interesting: people are always interested in the weather, and quite a lot. Something like 5% of queries are about the weather, instead of going outside and looking. I never understood this.
And then you get to the user interface. Here I tried to capture everything in one slide, and you will see why. I don’t think I have a pointer here.
You will see why you need more than one expert. The query here is “bond,” a very ambiguous query. And you can do query auto-completion; on the right you have the possible query auto-completions.
Then you do a spelling correction. Here, in this case, there’s nothing to do. Bond is an English word that could be a surname, could be many things. And then you do semantic analysis.
You get some query suggestions; this is different from query auto-completion, as it comes after the query. And then you do the most interesting part, intention analysis: what was the user trying to do?
For example, here, using machine learning, you guess that 70% of the time the person is looking for information, and 30% of the time the person is looking to do a transaction. And you see that the information intent is split into movie, definition, and music, and the transaction intent is split into financial, movie, and music.
Then I send this interpretation to the experts: okay, I think it’s 70% information, 30% transaction; trigger something if you think you need to. And this is what I get.
The different experts trigger, and the results are ranked. The first one is movies: maybe the person is trying to see a James Bond movie nearby, so you check for movies nearby. Or maybe the person wants to know the most famous James Bond movies, and this is the current list, the top three James Bond movies.
And then the second is news, for example, related to Bond Street in London. It’s financial news. Bond Street is like Wall Street for the people who don’t know. And then you have music, the best of Bond.
There is also Graham Bond, a famous BBC commentator. So it could be any of these, but you have the ten blue links for that expert. And then finally you have definitions: bond in finance, a chemical bond, bond paper, and Bond Street.
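The dispatch described above can be sketched like this; the probabilities, topic splits, and expert logic are all illustrative, not the real engine’s:

```python
# Hypothetical interpretation of the ambiguous query "bond".
interpretation = {
    "query": "bond",
    "intents": [
        {"type": "information", "p": 0.70,
         "topics": ["movie", "definition", "music"]},
        {"type": "transaction", "p": 0.30,
         "topics": ["financial", "movie", "music"]},
    ],
}

def movies_expert(interp):
    # Triggers: "bond" could be a James Bond movie query under either intent.
    p = sum(i["p"] for i in interp["intents"] if "movie" in i["topics"])
    return {"expert": "movies", "score": p} if p > 0 else None

def weather_expert(interp):
    # Nothing to say about "bond": does not trigger.
    return None

# Every expert sees the interpretation in parallel; only some trigger,
# and the triggered answers are ranked by score.
answers = [a for ex in (movies_expert, weather_expert)
           if (a := ex(interpretation)) is not None]
answers.sort(key=lambda a: a["score"], reverse=True)
```

Only the movies expert fires here; the weather expert declines, which is exactly the “trigger or not” decision each expert makes for itself.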
So this will be like a top semantic search engine if you can get all these answers. Blue is the explanation. Red are the examples.
So the question is what to learn here. You can learn everything; maybe you can learn everything about 95% of the queries. That’s good enough for most people.
So let’s go to the second part, neural IR: everything after, let’s say, 2015, when it became popular, so the last ten years. I call it pseudo-semantic search, because it’s a good guess; it’s not really knowing something.
The basic idea: I think you had this already yesterday, so I will not give too much detail; I only want to make a few points. Basically, first you learn a representation based on the context of words. For every word, you look at its context. Depending on the model, you use only what comes after, or what comes before, or both, like BERT.
And then there’s the extent of the context: how many tokens? 1,000? 2,000? Infinite? At some point it doesn’t make sense, but it’s not clear how much you need to use.
This can be extended to sentences and other objects, so it can be multimodal: which images are near, which videos are near; any object is part of the model. And this is already being used by many companies.
Okay. So you have a representation and now you need to index that representation.
So instead of the classic inverted index that you may know, basically, you have all the words and you know in which places those words appear, now you need to use a vector database because you need to compare two vectors.
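For contrast, a classic inverted index is just an exact lookup (this toy collection is my own); a vector database has no such key to look up, so it must compare the query vector against the stored vectors instead:

```python
from collections import defaultdict

docs = {
    1: "james bond movies",
    2: "chemical bond definition",
    3: "weather in new york",
}

# Classic inverted index: each term points to the documents containing it,
# so retrieval is an exact lookup plus set intersection.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

hits = index["bond"] & index["definition"]   # AND query -> {2}
```

With embeddings there is no exact term to intersect on; every query becomes a similarity computation over vectors, which is what the next two steps (a similarity function and a nearest-neighbor search) provide.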
These vectors may be a combination of vectors; for example, for a sentence, you need to combine the word vectors. This is not an easy problem, and there are many ways to do it. Some people average all the vectors; some people do more complex operations to extract the pseudo-semantics captured by the context. I say pseudo-semantics because it is only whatever the context captures.
After that, we need to decide which similarity function to use. The classical one, coming also from IR, from the vector model, is cosine similarity. If two vectors are aligned, the cosine is one, so they are the same; the magnitudes don’t matter, only the direction. And if they’re orthogonal, it’s zero, so they are not related at all; that means in every common dimension, one of them is zero.
And then we need approximate nearest neighbor (ANN) search algorithms to search: basically, which vector in this set is closest? This is harder than an inverted index; it takes more time. It’s not as fast, but now there are very fast techniques that do this well. This is, for example, what is used for RAG. I will talk a little bit about RAG, but I think yesterday you had a lot of RAG, right? So I don’t need to cover it again, because I have other things that I think will be more interesting for you.
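A minimal sketch of both pieces: cosine similarity and an exact, brute-force nearest-neighbor search over a made-up vector store. Real ANN indexes (HNSW, IVF, and so on) approximate this ranking so they don’t have to scan every vector:

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 for aligned vectors, 0.0 for orthogonal
    ones; direction matters, magnitude does not."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy "vector database": doc id -> embedding (the vectors are made up).
vectors = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.1, 0.9, 0.1],
    "d3": [0.8, 0.2, 0.1],
}

def top_k(query_vec, store, k=2):
    """Exact brute-force k-NN by cosine similarity -- the ranking that
    ANN indexes trade a little recall to compute much faster."""
    ranked = sorted(store, key=lambda d: cosine(query_vec, store[d]),
                    reverse=True)
    return ranked[:k]
```

For a query vector pointing along the first dimension, d1 and d3 come back first, since they point roughly the same way regardless of their lengths.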
So this would be the classical old engine transformed into a neural IR engine. You use old IR (if you read my book on IR, you know this) to retrieve a good set of candidates, maybe using TF-IDF or BM25, any good vector model. And then you have a candidate list to which you apply the ranking based on neural IR.
Why? Because neural IR ranking is much slower than traditional IR, so you cannot apply it to all the possible candidates; you apply it only to the best candidates, typically 100 or 200. People only look at the top, so you don’t care if you don’t consider everything. And, of course, it’s very important here to have the learned query-document representation, these vectors that apply to both queries and documents.
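The two-stage design can be sketched as follows; both scorers are stand-ins (term overlap instead of real BM25, and a toy function instead of a real neural model), so only the architecture is the point:

```python
DOCS = {
    "d1": "james bond movie showtimes",
    "d2": "chemical bond definition chemistry",
    "d3": "bond street london financial news",
    "d4": "weather forecast new york",
}

def lexical_score(query, text):
    """Stand-in for BM25/TF-IDF: plain term overlap."""
    return len(set(query.split()) & set(text.split()))

def neural_score(query, text):
    """Stand-in for a slow neural reranker; a real one would run a model
    over the (query, document) pair."""
    return lexical_score(query, text) + (1.0 if "movie" in text else 0.0)

def search(query, k=2, candidates=3):
    # Stage 1: cheap lexical retrieval over the whole collection.
    pool = sorted(DOCS, key=lambda d: lexical_score(query, DOCS[d]),
                  reverse=True)[:candidates]
    # Stage 2: expensive reranking over the small candidate pool only.
    return sorted(pool, key=lambda d: neural_score(query, DOCS[d]),
                  reverse=True)[:k]
```

The expensive scorer only ever sees the small pool, which is exactly why this hybrid stays fast even when the neural model is slow.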
This is from a chapter of my latest book, published this year; you will see it at the end. I published a collection of chapters to update my book of 2011.
Another, somewhat different representation is what is called a dense retrieval system. Here you basically have a neural ranker directly: no candidate list. This will be slower, but of course you will not miss any possibly relevant result. So in that sense it is better: you will get better results than the previous combination of classical IR and neural IR.
I will skip this. And then we get to RAG. RAG, for me, is really a bad patch for the problems of language models, but I put it here because it’s part of neural IR.
Basically, there are a lot of things you can do: pre-retrieval indexing and so on; then retrieval; then post-retrieval re-ranking and filtering. And after that you do the generative part. This would be the typical architecture, from the same author as before.
The first part is like a full IR system. You get the results, and the top-k documents become part of the prompt: the query plus, embedded in the prompt, all the top documents.
So basically you have the answer there. For me, the generative part is just the user interface: it extracts the answer from the results you already have. Basically, it’s a kind of laziness; instead of you doing that, you let the system do it.
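A minimal sketch of that step: the top-k retrieved passages are pasted into the prompt, and the actual generation call to a model is omitted. The prompt wording and the passages are illustrative (the 1999 and 2011 dates are the book’s actual editions):

```python
def build_rag_prompt(query, top_docs):
    """Assemble a RAG prompt: retrieved passages plus the user question."""
    context = "\n\n".join(f"[{i}] {doc}" for i, doc in enumerate(top_docs, 1))
    return (
        "Answer using only the passages below. "
        "If the answer is not in them, say you don't know.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

prompt = build_rag_prompt(
    "When was Modern Information Retrieval first published?",
    ["Modern Information Retrieval was first published in 1999.",
     "A second edition of the book appeared in 2011."],
)
```

The answer is already sitting in passage [1]; the model’s job is only to extract and phrase it, which is the “user interface” role described above, and also where it can still get things wrong.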
That has a problem: it still may be wrong, because it’s generating, guessing the answer. It’s not really saying “these are the correct parts of the documents”; it’s generating based on whatever model you are using. And remember, these models are generic; they are not tailored to the specific task.
And there are many RAG modules. There are very good surveys from last year, one of them from York University. Here you have all the models that already exist, and a good list of ways to improve each of these stages.
There you go. Okay. But this is the part that I’m really interested in today: what are the dangers of using language models to search? I think you should be very interested in that, right? Okay. So you know this, right?
So, first question: is the parrot smart or not? I still don’t know; hard to say. It’s imitating a person so well that you may think, okay, maybe the parrot really knows what it’s saying. It’s a good joke; I don’t know the answer. For me, the parrot doesn’t understand anything. But if the parrot says good morning every morning and not at night, I think it’s understanding something. Right? And parrots may be smarter than language models.
But for me, this is like a kind of Pandora’s box. I hope you all know some Greek mythology: the last thing in the box was hope. I don’t know if it’s still there; for many reasons, I want hope in the world today. But I will not go there; I will skip that part.
But, wow, how much procrastination we have today. And it’s not AI’s fault; well, partly it’s AI’s fault, since AI is giving you all the recommendations. But basically it’s our fault. Laziness, losing time, can be good. But if we really delegate many tasks to machines, that will be laziness, pure laziness. And laziness produces mediocrity.
Yeah, maybe it can be done 90% as well as you, but not 100%. And there’s a difference; it’s at the top, in the last 1%.
A chimpanzee shares 80... sorry, 98.9% of our DNA. So that 1.1% makes a difference. So don’t tell me that small differences are small. No, some small differences are huge. And we are very different from chimpanzees, although we come from them; that’s why we have many wars. Right? I wish we came from bonobos, but that didn’t happen.
Chatbots: they are fluent; they are too confident, they look arrogant sometimes. But they are very naive: they believe everything I say, so I can lie to them and they will say, “I apologize, I made a mistake.” They’re incoherent, because sometimes the sentence doesn’t really make sense, but they believe it makes sense. And if they really hallucinated, they would be drugged.
Right? “Hallucinate” was such a good marketing word to use: something human that these things cannot do, but they still say they do. But the problems are there: bias, hate, a lot of bias. That is what we are doing at Theodora: we are finding bias, first in Spanish.
More than 90% of the bias in documents is gender bias, sadly. And we mitigate the bias. For example, for “women cry and men don’t,” we say “all people may cry, but they express their emotions in different ways.” This is the mitigation: say it in a way where gender doesn’t matter.
Censorship. Have you thought about all things that you don’t see in the answers? How much, let’s say, censorship is in gun control? I will leave it there.
Lack of diversity. We are mainly using the web, and the web is not really a good source of knowledge: it’s biased toward some countries, particularly the US, a specific culture. And if we have a specific culture, we have something even more dangerous: cultural colonization.
So, how many continents are there? Continents. Continents. Yes, continents. Is my English that bad? Seven. Seven. Well, then why does the Olympic flag have only five circles?
Well, at the beginning of the twentieth century, someone in the US decided that America would be two continents, because you don’t want to be with Latin Americans. Though look, they couldn’t remove Mexico from North America, because that would not make sense. So now there are two continents. But if you go to Europe, as I was taught, there are only six. And why is one missing from the Olympic flag? Because Antarctica never competes.
So, a history lesson. And loss of skills: how many people here can find a place in New York without using a GPS? One? I will test that. Two? I will send you very far and see if you can find it.
500,000 years ago, men could go anywhere and come back at night, finding the place where they started, without using anything, maybe only the stars. And if it was cloudy and black? They still had to come back. So they used other techniques, because sometimes it’s cloudy, especially in the northern hemisphere.
So if in five years you don’t know how to write a letter by hand (that was a meme), the problem is not that you don’t know how to write a letter by hand. The problem is that you forgot how to think in order to write a letter. And writing a letter is the most complicated thinking we do, apart from solving professional problems: how to communicate something to a person who is not there in front of you, where you cannot use your face and your body.
So beware: you are losing skills; we all are losing skills. And then we get to disinformation. Sadly, we know that this cannot be true any longer, but it was believable two years ago. Or this one: we know that it never happened. I will skip it; no pictures, just in case. Remember, during the Arab Spring we could watch videos of what was really happening. Today, we can’t.
They can do whatever they want. This is the best one. One year ago in Hong Kong, a video conference. After the video conference, a person transferred $25 million to another account. Everyone in the video conference was fake.
The CFO who told him to transfer the $25 million, and all the other people: right face, right voice, right movements. Maybe something was off, but the person thought, oh, maybe it’s an Internet issue. So beware: if you see an “Internet issue,” maybe it’s fake content, not an Internet issue.
And then, very important for you, there are two big cases that are now only one: The New York Times and the Daily News group were consolidated, and the case is still going on; I hope something happens this year. You also have Getty Images against Stability AI, and another one from Canadian newspapers, also against OpenAI and Microsoft. Many, many cases that will decide whether this is fair use or copyright infringement. Very complicated.
However, in the New York Times case, for example, it’s more than that. Sometimes you get the article verbatim, because with more than one trillion parameters the model has some memory: if there’s only one article written with that sequence of words, you get almost exactly the article, maybe minus three or four words. And that is not fair use; that is plagiarism. So this is complicated.
And then the last problem that really worries me: mental health. I don’t know if you know this; this was the first case. Last year there was a case of a teenager in Florida, but this one was different, because the teenager told the person (sorry, not the person; I’m making the same mistake), told the chatbot, that he was committing suicide the next night.
And, of course, the chatbot didn’t do anything, because a chatbot doesn’t have any autonomy or agency. But this one is more worrisome, because this was a PhD student, a smart guy, but with a mental issue, who killed himself. This was their last conversation; I want you to read it.
The red is the chatbot. The black is the person.
You can decide whether the chatbot helped with the suicide or not. But clearly, at the end, there is the understanding that they will meet in the next life, which is clearly impossible. Also, it’s usually very hard to get things like this out of chatbots, but this person was not in his right mind.
This was found by his wife after the event. They had two kids, so this destroyed the family. But now this gets personal.
In February ’23, someone in Brazil asked (I hope you understand some Portuguese, but it doesn’t matter) for very famous computer scientists who have died. Well, I appear in second place. I said, okay, this is getting personal. It says that I’m not Brazilian, but that I have worked so long in Brazil that I had to be on this list. But I have never worked at the University of São Paulo; I think I was there once. But who knows? The truth is, I’m a robot, but nobody believes me.
But, of course, I checked right away what ChatGPT thinks about me. Sorry, no, it doesn’t think: what ChatGPT generates about me. So many things that were not true, back in February ’23. So many things that were invented, and then how I died: not officially announced, but “some reports.”
Give me the reports. And it’s a very good answer, because, well, I hope not too soon: maybe these two associations will report on my death. But I hope it’s not too soon.
Now, there was a Ricardo Baeza who died in 2021, so maybe that created the confusion: the system doesn’t know that there are homonymous people. But interestingly, in Portuguese it knew exactly how I died: leukemia. And here is another important one: I died at 55 years old. It doesn’t know any math: if I was born in ’61, how can I be 55 in 2021? Right? They don’t know math either.
Then in March ’23, when GPT-4 started, I said, okay, let’s see what is there. Okay: there are more lies. I’m even older than I am. It makes up things that are not true; all these things are not true. And why do I show this? Because how many people can fact-check this? How many people in the world can fact-check this? One: me. No one else knows this, not even my family. So it’s very hard. And one very good prediction.
I won this award last year, but the system got the year wrong. Okay, that was a good prediction, but it still keeps producing false data. More false facts, more than before. But I’m okay: I’m more famous, and I’m alive again, so I don’t care.
So I’m not a robot. To finish this part: please don’t humanize technology. This is a big mistake we are making; even I used the wrong verb twice. We need a special verb for these machines. So what is wrong with this picture? It’s imitating thinking, but it’s not thinking at all. So please don’t use this kind of picture, because they don’t think.
In fact, they don’t read, write, or see. Why? Because to read, write, or see, you need to understand, and they don’t understand anything. They don’t reason. They make amazing guesses — a very powerful imitation — but they don’t reason at all. So I really dislike that my own colleagues use the word “reasoning” for all these things that language models can do.
They don’t think; they imitate conversation. Remember ELIZA, 1966 — just imitating conversation, and it did very well. People thought it was a very good tool. They don’t lie, because they don’t know what is true or false. To lie, you need to know what is true or false. In Spanish, we have a very good verb for the kind of people that do this, especially in South America.
Here, I guess some people use “BS,” but it’s not nice to use this word — though some people in newspapers are using it. I would like another verb in English for a person that doesn’t know if what he says is true, but is right 99% of the time because he’s so good at inventing things. Of course, they don’t have intentions, but they produce harm. You can check my responsible AI talk on the web.
The worst case was in Australia — maybe not even much AI. Almost half a million people were discriminated against, and they suffered economically. And after four years, the government had to say, okay, let’s stop this. And then they had to pay more than $1 billion in compensation. And they were trying to save money with this system. You see? You try to save money, you do it wrong, and now the government has $1 billion less.
Very sad. And, of course, as I already said, they don’t hallucinate. They are not human; they don’t know the world. They just make errors, but they don’t want to tell us that they make errors. Some errors are creative, yes, and then you can use them. But in your context, you don’t want this kind of creativity.
So that ends the third part. And I will go very fast through the last part: be lucid.
Powerful imitations create powerful illusions. We are living an illusion. And many people are living it in a way I find amazing, especially CEOs. They say it’s smart; they say AGI will come. If AGI comes, we deserve it — we are so stupid. I mean, who wants to build Terminator? Raise your hands. Who wants to build Terminator? No one. Right? We would need to be really stupid.
But then, what is the impact? Predictive search. Should we use LLMs as search engines? This is what people are doing today, sadly. And then they are changing your world.
Well, the short answer is no. And you can read the work of Chirag Shah, a very good friend, and Emily Bender, both from the University of Washington. Because predicted knowledge is not knowledge.
So possible answers are not always relevant answers. The more specific the answer, the more probable it will be wrong — like my biography, with so many wrong facts.
This is something Emily Bender says: if it works 95% of the time, people will trust it and then will not fact-check. This is a real danger: something that works almost all the time, you trust. But imagine the elevator in the building next door had a sign saying it works 99% of the time. Would you take it?
I wouldn’t. And if they told me a computer scientist built it, even less. Right?
So we cannot work with things that work almost all the time. That’s not science for me. That’s not engineering. That’s alchemy. We are doing too much alchemy.
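Ricardo’s “95% of the time” point is worth making concrete: per-answer reliability compounds over a session. A minimal sketch of the arithmetic (the 95% figure is from the talk; the session lengths are illustrative):

```python
# If each answer is independently correct 95% of the time,
# what is the chance an entire session contains no errors?
per_answer = 0.95

for n in (1, 5, 10, 20):
    all_correct = per_answer ** n
    print(f"{n:2d} answers: P(no errors) = {all_correct:.2f}")
# Twenty answers at 95% each leave only about a 36% chance
# that everything you were told was correct.
```

In other words, the impressive per-answer rate hides how quickly errors accumulate for a user who never fact-checks.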
And also, something very important: imagine that they did work 100% of the time — which is not true today.
They trigger laziness. They take away all the interesting parts of information literacy: checking things, looking at other views, learning new things.
To be human, we need to do that. That’s why I did the other part first — it’s very important for understanding why this is so crucial.
But LLMs are good for summarizing answers, for expanding queries, for many other things we do in search engines. So, yes, use them for that.
But let me give you examples. This is ChatGPT-4o, so a top model: give me the map of the United States. Almost half of the states are wrong. But the good thing is, I asked: why is Alaska to the south? It said: it’s to the south because that’s the way it is represented most of the time. But it’s not to the south; it’s to the northwest. Okay. So the guess was good there.
What about New York? I like where the Statue of Liberty is, because I think it’s underwater. The Empire State Building is also underwater, and so are the trees in Central Park.
So this is the imagination of language models. And this is a comparison — I think I only have about three minutes, so I will go through this very fast — from the second paper of Shah and Bender. These are the differences.
And here I have a very important point: personalization. More personalization — but the personalization is scary. Last year, a colleague in Chile sent me this example from Google Search. He’s a professor, a PhD; he uses English on his computer, so the setup is in English, but he’s in Chile. Look at this answer. He was asking what it means to say “in allusion to” — a word with a Latin etymology, so it also exists in Spanish.
But this is really, really worrisome: “You might allude to a cop sitting behind you to stop your friends from discussing plans to rob a bank.”
You can try it on your phone. Of course, the example served in English, outside Chile, would presumably be completely unbiased toward a Latino. I don’t know. But these are real stereotypes.
Should we use LLMs as judges? Right? Another important problem in IR. Well, many experiments have shown that they align with human judgments, so why not? Well, recent work shows some problems. They are more lenient than humans — humans are tougher. They are not reliable as a tool, because if you use different LLMs, you get different answers; that’s not very reliable. If you have good humans, they should give the same answer. And you can manipulate them.
Very recent work from DeepMind shows they cannot discern statistical differences that are very subtle — you cannot see that. But more interesting, they are biased towards LLM rankers. So in some way, they recognize their own BS: a generated answer will get a better judgment than a human answer. Do you want that?
And finally — although this hasn’t been corroborated by the last paper I mentioned — some people have found that there is also a bias towards generated text: not the ranker, but the text itself. And this may be because judge and generator use the same kind of neural architectures, so they correlate with what they learn.
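The reliability complaint about LLM judges — different models give different answers — can be made measurable by checking agreement between two judges’ labels. A minimal sketch with hypothetical binary relevance labels (no particular model or API is assumed):

```python
# Raw agreement and Cohen's kappa between two judges'
# binary relevance labels on the same (query, document) pairs.
def cohens_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each judge's label distribution.
    p_a1, p_b1 = sum(a) / n, sum(b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected)

judge_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]  # hypothetical labels from one LLM
judge_b = [1, 0, 0, 1, 1, 1, 0, 0, 1, 1]  # hypothetical labels from another

raw = sum(x == y for x, y in zip(judge_a, judge_b)) / len(judge_a)
print(f"raw agreement: {raw:.0%}, kappa: {cohens_kappa(judge_a, judge_b):.2f}")
# 60% raw agreement sounds decent, but kappa ~0.17 is barely
# better than chance -- "not very reliable," as Ricardo says.
```

Kappa corrects raw agreement for the agreement two judges would reach by chance alone, which is why it is the usual check before trusting a panel of automatic judges.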
For the future — and I’m almost perfectly on time — we need to include knowledge.
So we need to include logical reasoning. This was classic semantic search. So we need hybrid systems that combine neural IR with classic semantic search. And this is the same thing we are seeing in language models: they need to combine some kind of real reasoning, and also try to check the facts before outputting anything.
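The hybrid systems Ricardo describes — neural IR combined with classic lexical/semantic retrieval — are commonly built today by fusing the two rankings. A minimal sketch using reciprocal rank fusion, with made-up document IDs and rankings:

```python
# Reciprocal Rank Fusion (RRF): merge a classic lexical/semantic
# ranking with a neural (embedding-based) ranking.
def rrf(rankings, k=60):
    """rankings: list of doc-id lists, best first. k dampens top ranks."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["d3", "d1", "d4", "d2"]  # hypothetical classic ranking
neural = ["d1", "d2", "d3", "d5"]   # hypothetical vector ranking

print(rrf([lexical, neural]))  # -> ['d1', 'd3', 'd2', 'd4', 'd5']
```

Documents both rankers like (d1, d3) rise to the top, and the fusion needs no score calibration between the two systems — one reason RRF is a common default for hybrid search.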
But the most difficult part — this is the third one — is common sense. In Spanish, we say that common sense is the least common of the senses. And it’s true; you see it every day.
Understanding the context. Seeing something for the first time and knowing what to do, if you are smart. Right? You see an accident to the left and you try to go to the right. I hope so. Or — some people now take out their phone to take a picture of the accident.
So we should not promote user laziness. Let the user do something; don’t give them everything.
And also, we should educate users about this humanization problem. So I’m not worried about AI. I’m worried about us.
Right? We are the problem. AI is just a tool, and we are using it wrong in many, many ways. The creator of ELIZA in 1966 turned against AI twenty years later — and he was so brilliant. There’s an amazing book of interviews that a German journalist did with him before he died. He said: we must never confuse computers with humans.
This is the main issue. It’s not about the AI; it’s about how this thing is affecting us. And we wrote a very interesting paper last year called “Human-AI Coevolution.”
Because it’s not only how we shape AI; sadly, it’s how AI is shaping us. That’s why I mentioned GPS today. That’s why we use not only ChatGPT or Gemini, but also TikTok and so on. Right? So, thank you.
Watch every SEO Week 2025 presentation and discover what the next chapter of search entails.
Sign up for the Rank Report — the weekly iPullRank newsletter. We unpack industry news, updates, and best practices in the world of SEO, content, and generative AI.
iPullRank is a pioneering content marketing and enterprise SEO agency leading the way in Relevance Engineering, Audience-Focused SEO, and Content Strategy. People-first in our approach, we’ve delivered $4B+ in organic search results for our clients.
AI is reshaping search. The Rank Report gives you signal through the noise, so your brand doesn’t just keep up, it leads.