Why Have AI Search Engines Failed at Bing & Google?

The last time you had to find a plumber, look up a famous person’s birthdate, or figure out how to reset your phone, where’d you go to find the information?

You went to ChatGPT to get the answer, right? Or maybe you went to Bing and asked Copilot for help. If not, you probably used one of Google’s new AI products, like Gemini or SGE.

Whatever you did, there’s no way you opened Google and used it like a regular search engine.

Using Google Search in a post-ChatGPT world? That would be unthinkable. 

After all, I was told that AI chatbots would destroy traditional search engines in just one or two years.

And that was 15+ months ago…

In early 2023, it was taken for granted that AI chatbots would quickly transform search engines – or replace them completely.

Despite this, the search landscape of early 2024 looks oddly similar to the search landscape of early 2023. We were promised an AI search revolution. Instead, we’re still using Google and its list of blue links.

How could that be?

In this post, I’ll revisit some of the predictions people made about AI search a year ago, explain why AI search engines failed to take off in the past 12 months, and explore how this echoes the voice search trend of the mid-2010s.

Along the way, I’ll show you how SGE and Copilot hold up as search engines, and why they’re nowhere close to replacing Google Search.


Key Takeaways
  • Generative AI was supposed to radically transform search engines, or replace them completely.
  • It’s one year later, and searchers have mostly ignored AI search engines. What gives?
  • Generative AI is a technological marvel, but it adds limited (or negative) value to most search results.
  • If someone tells you conversational AI is the future of search, ask yourself: “When’s the last time I wished Google was more like a chatbot?”

AI Search One Year Later

When OpenAI made ChatGPT public in late November 2022, it quickly became the biggest tech story of the year.

AI was going to replace everyone from lawyers to artists to influencers. It would transform every industry, create a limitless economy, and power the biggest productivity increase in history. That, or it would destroy the world.

Along the way, AI chatbots were poised to either radically transform search engines or make them a thing of the past.

What People Predicted

Tech Experts: “You Will Never Go to a Search Site Again”

With the release of ChatGPT, many experts within the tech industry were quick to predict that it would revolutionize search.

In fact, many of them claimed that AI would replace search completely.

Bill Gates, the former CEO of Microsoft, said that AI would result in a world where “you will never go to a search site again, you will never go to a productivity site, you’ll never go to Amazon again.”

Prabhakar Raghavan, the SVP in charge of Google Search, echoed this, predicting that “10 years from now, we’ll all do everything through chatbots and LLMs.”

And Satya Nadella, CEO of Microsoft, said the revolution was primed to begin, claiming,  “AI will fundamentally change every software category, starting with the largest category of all — search.”

SEOs: “Search Is About to Radically Change”

In the SEO community, influencers and experts made similar predictions.

They said that search was about to radically change. They claimed that most if not all informational queries would soon be answered by AI. They declared generative AI one of the biggest, most disruptive, most monumental changes to ever hit the search industry.

For a number of SEOs, the biggest concern was how AI content would lead to more SEO spam and make search results worse (which, unsurprisingly, became a very real problem). Others worried that AI itself might replace SEOs and pose a threat to the SEO industry.

But for many, the biggest implication of ChatGPT was that AI would completely change both how search engines work and how people use them.

Reporters: “ChatGPT Is a ‘Google Killer’”

Major publications were eager to report on the coming change to search engines. Just take a look at these headlines from December 2022 to May 2023:

  • ChatGPT and Other Chat Bots Are a ‘Code Red’ for Google Search (New York Times)
  • The Chatbots Are Coming for Google (Bloomberg)
  • Microsoft is beating Google at its own game (Vox)
  • ‘Google killer’ ChatGPT sparks AI chatbot race (BBC)
  • A Planet Without Google Search (CNET)
  • Is Google’s 20-year dominance of search in peril? (The Economist)
  • When Will ChatGPT Replace Search? Maybe Sooner Than You Think (PCMag)
  • The AI takeover of Google Search starts now (The Verge)

To their credit, many journalists and experts were skeptical that ChatGPT was, in fact, a “Google Killer.” But even the doubters assumed that AI would replace traditional search. They just assumed that Google would add its own version of ChatGPT to search results to maintain its monopoly.

In the end, the message was the same: ChatGPT – or something like it – was the future of search.

What Actually Happened

It’s been a little over a year since Microsoft added ChatGPT to Bing Search – a move that many expected would kick off the AI search revolution.

Since then, the search landscape has remained – more or less – radically unchanged.

According to StatCounter, Bing’s global market share went from 2.86% in March 2023 to 3.35% in March 2024. In other words, Bing gained less than 0.5 percentage points of market share after adding ChatGPT to search results.

A chart showing worldwide market share for search engines from March, 2023, to March, 2024.

In recent months, Bing has added new AI features like Deep Search. But early reviews have been less than effusive.

Meanwhile, OpenAI has announced that it’s developing its own AI-powered search engine, separate from Bing.

But in a recent interview, OpenAI CEO Sam Altman struggled to articulate how this search engine would actually work, saying, “I don’t think anyone has cracked the code on [that] yet.”

So much for Microsoft and OpenAI. What about Google?

In May 2023, Google announced Search Generative Experience (SGE), a generative AI version of Google Search. But unlike Bing Chat, SGE wasn’t made public. Instead, Google kept it in Search Labs – an opt-in platform for experimental search features.

Despite article after article after article saying that SGE was on the horizon, SGE remained in Search Labs month after month after month.

Today, Google’s Search results still look, feel, and function more or less the same as they did in 2023.

Google search results for 'AI search' in March, 2024.

In the past few weeks, there have been signs that SGE features will finally make their way into Google Search. Liz Reid, who spearheaded SGE, has been named Google’s new head of Search, and Google has started testing SGE-style AI overviews in U.S. search results.

But as someone who’s been using SGE for months, I’d be shocked if Google starts using SGE widely in search results any time soon.

That’s because, despite nearly a year of development, SGE offers a comparable or worse search experience across most Google searches.

While it might start appearing in a limited set of search results, SGE simply isn’t ready for the broad range of queries that Google Search is built to answer.

Why AI Search Engines Haven't Met Expectations

At this point, it’s clear that users aren’t flocking to replace Google with conversational AI. 

Hundreds of millions of people have tried ChatGPT and Copilot. Yet only a fraction are using these tools as a substitute for Google.

It’s also clear that, despite public and investor pressure to stay ahead in the AI search arms race, Google isn’t rushing to add generative AI to its search results.

That’s a far cry from the picture painted a year ago. 

So what explains this disconnect?

Below, I’ll go over the four main reasons I think so many people got this issue so wrong:

  • ChatGPT redefined the term “AI search” in an unhelpful way. People started to use “AI” and “generative AI” interchangeably, which ignored other AI technologies in search engines, including technologies already in use by Google. 
  • Tech and SEO experts overestimated what generative AI can do. When ChatGPT was released, the technology was so impressive that people assumed it had powers that simply weren’t there.
  • People expected AI search engines to be conversational. A lot of people got caught up in the potential of conversational search. But they didn’t think through some of the obvious drawbacks.
  • Generative AI adds minimal (or negative) value to most search results. In most cases, generative AI produces comparable or worse results than Google, while making searches slower and less convenient for users.

1. ChatGPT Redefined “AI Search” (in a Bad Way)

The conversation around “AI-powered” search engines can be a bit confusing (and a bit frustrating) for anyone familiar with how Google works.

That’s because Google has been using AI in its search engine for the better part of a decade. In fact, Google Search has relied on AI since 2015, when it introduced RankBrain. In the following years, Google added a range of AI systems to search, including neural matching, BERT, and MUM.

In other words, Google was already an AI-powered search engine before the arrival of ChatGPT. 

But after the release of ChatGPT, people’s definition of an AI-powered search engine changed. It wasn’t enough for a search engine to be powered by AI behind the scenes. Instead, people expected AI search engines to feature generative AI front and center.

If people had better understood how Google was already using AI, they might have recognized that ChatGPT wasn’t as far ahead in the technological arms race as it seemed.

And once they made that connection, they might have tempered their expectations about how much better a ChatGPT-powered search engine would be compared to Google.

2. Experts Overestimated Generative AI

In the first few months after ChatGPT’s release, expectations for what generative AI could accomplish were unrealistic at best and fantastical at worst.

One of the most confusing examples of this was people who believed that AI could solve SEO spam by delivering better results than SEO content.

If you want to fix SEO spam on Google, generative AI is the last place you should be looking. That’s because generative AI fundamentally operates like bad SEO. It takes widely available information, repackages that information into a familiar, commonly used structure, and rephrases the wording so that it sounds original.

What’s more, because generative AI works by repackaging existing data, it’s fundamentally incapable of producing new information. If you want a generative AI search engine to tell you what time the Super Bowl starts or who’s the richest person in the world, that information needs to be available somewhere online. 

Because it can’t add new information to search results, generative AI’s only advantage in search is how it presents information. And that advantage is non-existent in most cases.

In 2023, people were so impressed by the novelty of ChatGPT, they became convinced that generative AI had the power to improve search results and get rid of SEO spam. In reality, generative AI is built to make the same stale, repetitive content that people claim is ruining the internet.

3. Conversational Search Is Bad for User Experience

When’s the last time you thought, “I wish I could have a nice, long chat with Google”? 

Is the answer, “Never”? 

Neat! You’ve just figured out why conversational search is a non-starter for search engines.

In late 2022 and early 2023, many people took it for granted that conversational search would make search engines better. But in reality, conversational search offers a worse user experience.

Simply put, Google is designed to help users find answers as quickly and easily as possible. That starts with the words and phrases that people use to make searches. 

Instead of asking Google “How much money does Taylor Swift have?” or “What is the best Android phone available today for under $500?” users can search for “taylor swift net worth” or “best android phone under $500.” 

For most searches, it’s quicker and easier to search using keywords. This was true when Google launched in the late 1990s. And it’s even more true today now that 63% of searches take place on smartphones.

Despite this, OpenAI and Microsoft have placed their bets on a conversational model. When people start using ChatGPT or Copilot, most of them follow the conversational prompts built into the interface. 

Those prompts make search less convenient for users – and that’s before they even get to the results.

To its credit, Google understood the pros and cons of conversational search when it began testing SGE. That’s why SGE avoids speaking to users in the first person and encourages users to continue searching with keywords.

But in late 2022 and early 2023, conversational search was pitched as a clear advantage of ChatGPT. People earnestly predicted that conversational search would be more user-friendly, helping chatbot-style search engines gain an edge on Google. 

In reality, a chatbot interface only makes search more annoying and time-consuming for everyday users.

4. Generative AI Fails to Improve (Most) Search Results

If conversational search wasn’t the game-changer some people claimed, the new wave of AI search engines would live or die by the quality of their results.

The problem is, generative AI doesn’t actually add value to most search results. In most cases, the results are comparable to Google’s existing results, at best.

To make matters worse, AI search results offer a noticeable downgrade compared to Google for many types of queries. And that’s before taking longer load times and increased search costs into consideration.

In the next section, I’ve put together four example queries, representing some of the most common types of searches people make on Google. 

I’ll go through each one, show you how Google, SGE, and Copilot handle the results, and explain why I think AI search is further behind than many people realize.

(If you don’t feel like reading through a detailed analysis of individual queries, you can skip ahead to the end.)

Google vs. SGE vs. Copilot for 4 Example Queries

"Ada Lovelace"

The Query:

To start, let’s look at an informational query for “ada lovelace.”

In this case, the searcher knows the name of the subject, and they want to learn more about who, what, or where the subject is. Some searchers might be looking for a single biographical detail about the subject (like a birthdate). Others might be looking for a comprehensive biography.

Google Search:

Google’s existing search results address a wide range of search intents via the Knowledge Graph and a mix of search results. Users can read a short bio of the subject and find biographical details within Google’s results, or they can click through to videos and articles with longer biographies.

AI Search:

In Google SGE, the AI results address a much narrower range of intents, providing a short text biography of the subject. Unlike Google’s original results, it’s harder to find basic biographical details (like date of birth and death), and more detailed results are buried underneath the AI content.

Copilot’s results are similar to SGE, but with more detail and improved formatting. The results start with a short text biography, followed by a longer, bullet point biography broken out into subheadings. This results in a more digestible biography of the subject. But as with the SGE results, the result only makes sense if users are looking specifically for a bite-size biography.

The Verdict:

In this case, you can make a genuine argument for Copilot. The results are easier to digest than a Wikipedia page and offer key details about the subject. If you want to learn everything you can about Ada Lovelace in two or three minutes, Copilot has you covered.

But Copilot’s results only outperform Google for a narrow band of search intents. If users are looking for a one-sentence biography, quick details, or a detailed biography, they’ll find them more quickly – and more easily – through Google.

To me, this feels like a case where generative AI does one thing better than Google, but it only does that one specific thing. While Google’s results are slightly worse for this specific use case, Google also manages to satisfy a range of other use cases.

That’s critical for searches like this one where search intent isn’t always clear. And it’s representative of why SGE and Copilot still fall far short of Google. Conversational AI does one or two things really well, but a modern search engine needs to be a Swiss Army Knife for different types of searches.

"How to Reset iPhone"

The Query: 

In this case, the user is looking for specific, step-by-step instructions. Depending on the complexity of the task, users might be looking for a short list of steps, or they might be looking for detailed, comprehensive instructions.

Google Search:

Google’s results include a featured snippet with condensed, step-by-step instructions, plus a link to the article with more detail/information. The snippet covers the basic steps that users need to follow, and makes it easy to find more information via Apple’s support page.

AI Search:

Both Google SGE and Copilot convey the same information, but with the results restructured into a list format. While the AI results are somewhat easier to read, the links to supporting articles are harder to access on both versions.

The Verdict: 

This is another case where the value added by generative AI feels marginal at best. The snippets themselves are easier to read, but they don’t add new information, and they make it harder to click through to the original instructions.

And in this particular case, which results would you trust more: Instructions written and supported by Apple? Or paraphrased instructions from an AI chatbot?

These types of informational queries are where AI is supposed to shine. But when the quality difference is this marginal, how do you justify the added costs and UX issues that come with generative AI?

To put it another way: if the quality of the results is the same or slightly worse, why are we switching to a format that takes more time to load, costs search engines more money to run, and leads to less visually rich search results?

“Best Tablets 2024”

The Query: 

Next, we’ll take a look at one of the most common types of queries for product research: “best [product + year]”. In this case, we’ll look at how Google and AI alternatives handle “best tablets 2024.”

Google Search: 

Unlike the previous examples, Google’s results don’t include a Featured Snippet for this query. Instead, the results go straight to the traditional blue links from trusted sources like TechRadar and Tom’s Guide. The linked articles include product photos, specifications, pros and cons, and review summaries for each product, plus links to detailed reviews by actual experts.

AI Search:

The SGE results start with a wildly unhelpful two-item list of things to consider when buying a tablet. It then includes a list of product snippets generated from Google Shopping listings. There’s little information to help users choose the best product, and there’s even less information about why these specific products are being recommended.

Microsoft Copilot does a better job than SGE at explaining its recommendations to users, including “Why It’s Great” and “Who It’s For” summaries. 

Despite this, the Copilot summaries are too short to help users make a purchase decision, and they lack critical elements like photos and product specifications. And as with SGE, it’s difficult to tell why Bing chose these specific products or listed them in this order.

The Verdict: 

I’m not going to sugarcoat it: The AI results for this query are bad. 

Heck, I’ll pour salt in the wound: These AI results are flat-out terrible.

Here, adding generative AI actively makes the search results worse. Users don’t get the information they need to make a purchase decision, and the AI results make it harder for users to find the information they actually want.

As far as I can tell, the only real use case of these results is when someone needs to make a purchase decision in less than five minutes, and they don’t particularly care if they get good or bad advice. 

To be fair, that’s a common enough use case. After all, not every purchase is as expensive or important as a tablet. Maybe you just need a quick answer about which floor cleaner to buy or which earbuds you should get for under $25. 

But at that point, how is generative AI any more helpful than whatever already ranks on Google? If SGE and Copilot are simply regurgitating what’s already online, why should I trust their results more than whatever’s at the top of Google’s list of blue links?

“Plumbers Near Me”

The Query: 

We’ll wrap things up by looking at a local search query. For this one, let’s go with a classic: “plumbers near me.”

Google Search: 

Google’s results for this query are what’s known as Google’s “Map Pack.” These results are one of the biggest success stories of Google’s 20+ year history. That success comes not only from the results themselves, but also from how those results are presented. Through the Map Pack, users can see where businesses are located on a map, scan critical details like user ratings and business hours, and even call businesses directly from search results.

AI Search:

The SGE results, by comparison, are far less useful to searchers. The results include an unnecessary text introduction (“Denver, Colorado, has many plumbers…”), followed by a confusing mix of text-only and card-style business listings. What’s more, the listings themselves include significantly less information than the listings in the Map Pack.

The Copilot results, meanwhile, are a text-only version of Bing Maps results. I mean that literally – Copilot’s rankings are identical to Bing Maps, and the information is copied directly from Bing Maps business listings. But unlike the Map Pack (or the Bing equivalent), Copilot gives you less information in a much less useful format.

The Verdict: 

If you thought the “best tablets 2024” results were grim, these results set the bar even lower.

In early 2023, many people – including some SEOs – earnestly believed that LLMs and generative AI would lead to better, more accurate local search results. But when SGE and Copilot try their hand at local search, we end up with a much less useful interface and data scraped directly from Google Maps and Bing Maps.

To my mind, this is a near-perfect illustration of what people got wrong about generative AI and search engines.

While LLMs can access and process millions of data points – including details from business listings, content on business websites, and real-life feedback from user reviews – Google already uses this data to generate its local search rankings. Whatever advantages LLMs have in search, they aren’t particularly suited to local rankings.

And if LLMs can’t generate better local results than Google, they definitely aren’t presenting those results in a more user-friendly way. I promise you: no one who’s making a local search in 2024 is looking for a text-only version of Bing Maps.

Sure, there might be ways to use generative AI within the Google Map Pack interface or harness LLMs within a larger local ranking algorithm. 

But if you expect ChatGPT to reinvent local search results, you’re going to be waiting a long time to fix that leak under your sink.

"But What About Longtail Keywords?"

At this point, you might be wondering: What about longtail keywords? Aren’t these types of queries where AI search engines are supposed to shine? 

For example, what happens if we take our search for “best tablets 2024” and update it to “best tablets 2024 under $300 for streaming”?

Somehow, the AI results get worse. 

When I tested this query on SGE, the “What to look for…” suggestions were nearly identical to its suggestions for a basic tablet. 

Meanwhile, the product listings included products that received poor or middling reviews from trusted publications.

What’s more, the listings included product callouts for tablets that are “Good for note taking” and “Good for reading” – but nothing for streaming.

SGE results for 'best tablets 2024 under $300 for streaming' highlighting irrelevant features

To make matters even worse, this is the exact type of query that Google claimed SGE was built to address.

Copilot’s results for the same query were slightly better. The products it listed were more relevant, and each product summary included at least one or two details related to streaming.

Copilot recommendations for 'best tablets 2024 under $300 for streaming'

But Copilot’s results still didn’t feel like they gave me enough information to confidently buy a new tablet. 

What’s more, when I checked the sources Copilot used to generate its results, 4 out of 5 of its sources were straight-up SEO spam.

Copilot sources for 'best tablets 2024 under $300 for streaming', four of which are SEO spam

In my line of work, we’d call this a serious E-E-A-T issue. In the real world, you might call it an ouroboros of SEO spam.

Conclusion

When people talk about the future of AI search engines, it reminds me of how people talked about voice search 6 to 8 years ago.

In the mid-2010s, voice search was supposed to be the next big thing in search. Experts said more than 50% of all searches would be conducted via voice by 2020. Analysts claimed the technology would transform the way we search and use the web. SEOs urged brands to optimize for voice search or risk getting left in the dust.

None of that was true.

Voice search is still with us in 2024. But it hasn’t radically transformed the way most of us search for information online. Today, it’s just another tool in the broader search landscape. And despite every trend piece predicting otherwise, it’s an afterthought for most brands and digital marketers.

Voice search and conversational AI share a lot in common. But to me, the biggest similarity is this: They work the way people imagine search engines should work. 

When you ask people to dream up a futuristic search engine, most people think of an all-knowing machine that responds to full questions and speaks in full sentences. That’s exactly how voice search and conversational AI are supposed to function.

The problem is, there’s a big disconnect between how people expect technology to work and how technology actually gets used in the real world. 

Just because something looks, feels, and functions like science fiction, that doesn’t mean it’s the wave of the future. After all, we managed to invent flying cars – it just turned out they weren’t a practical way to get most people from point A to point B.

AI might well be the future of search. But that process will happen gradually. More importantly, LLMs and generative AI will need to come a long way before they’re ready to do the job of traditional search engines.

In the meantime, if someone tries to tell you AI chatbots are going to kill Google, just go ahead and send them here.

Want to create a roadmap for the digital future of your brand? Get in touch with the SEO experts at Session Interactive to find out how we can help.
