Um, none of those numbers add up.
Google acting dumb matters because its AI is headed to your searches sooner or later. The company has already been testing this new Google — dubbed Search Generative Experience, or SGE — with volunteers for nearly 11 months, and recently started showing AI answers in the main Google results even for people who have not opted in to the test.
The new Google can do some useful things. But as you’ll see, it sometimes also makes up facts, misinterprets questions, delivers out-of-date information and just generally blathers on. Even worse, researchers are finding the AI often elevates lower-quality sites as reliable sources of information.
Normally, I wouldn’t review a product that isn’t finished. But this test of Google’s future has been going on for nearly a year, and the choices being made now will influence how billions of people get information. At stake is also a core idea behind the current AI frenzy: that the tech can replace the need to research things ourselves by just giving us answers. If a company with the money and computing power of Google can’t make it work, who can?
SGE merges the search engine you know with the capabilities of a chatbot. On top of traditional results, SGE writes out direct answers to queries, interspersed with links to dig deeper.
SGE is a response to the reality that some people, including me, are starting to turn to AI like ChatGPT for more complex questions or when we don’t feel like reading a bunch of different sites. Onely, a search optimization firm, estimates that using SGE can make a user’s overall research journey 10 to 20 times shorter by assembling pros and cons, prices and other information into one place.
An all-knowing answer bot sounds useful given our shrinking attention spans. But Google has a lot to work out. We expect searches to be fast, yet Google’s AI answers take a painful second or two to generate. Google has to balance the already-fragile economy of the web, where its AI answers can steal traffic from publishers who do the expensive and hard work of actually researching things.
And most of all, the new Google has to deliver on the promise that it can consistently and correctly answer our questions. That’s where I focused my testing — and kept finding examples where the AI-supercharged Google did worse than its predecessor.
Putting Google’s AI answers to the test
Often when you’re Googling, what you really want is a short bit of information or a link. On a day-to-day basis, the new Google is often annoying because its AI is so darned chatty.
A goofy example: “What do Transformers eat?”
The AI answer told me that fictional robots don’t really need to eat or drink, though they need some kind of fuel. Meanwhile, old Google had the one-word answer I was looking for: Energon. (It’s a kind of magical fuel.) In the new Google, you get that answer only by scrolling down the page.
This doesn’t just happen with alien robots. When SE Ranking, a firm dedicated to search engine optimization, tested SGE with 100,000 keyword queries, it found the average answer it generated was 3,485 characters — or roughly a third as long as this column. One of Google’s challenges is figuring out when its AI is better off just keeping quiet; sometimes, SGE asks you to press a “generate” button before it will write out an answer.
Most of all, when we search, we expect correct information. Google claims SGE has a leg up on ChatGPT because its knowledge is up-to-date.
Yet I found the new Google still struggled with current events. Three days after the most recent Academy Awards, I searched for “Oscars 2024.” It told me the Oscars were still to come and listed some nominees.
And nothing undermined my trust in Google’s AI answers more than watching it confidently make stuff up.
That includes facts about yours truly. I asked it about an award-winning series I wrote for The Washington Post, and it attributed it to some stranger — and then gave a link to some other website.
Then there was the time SGE all too happily made up information about something that doesn’t even exist. I asked about a San Francisco restaurant called Danny’s Dan Dan Noodles, and it told me it has “crazy wait times” and described its food.
The problem is that this is an imaginary shop I named after my favorite Chinese dish. Google’s AI had no problem inventing information about it.
So-called hallucinations about real and fake topics are a known problem with current AI. A disclaimer above SGE results says, “Generative AI is experimental,” but that doesn’t solve the problem. Google needs to figure out how to say “I don’t know” when it isn’t confident.
To give us answers to everything, Google’s AI has to decide which sources are reliable. I’m not very confident about its judgment.
Remember our bonkers result on Zuckerberg’s net worth? A professional researcher — and also regular old Google — might suggest checking the billionaires list from Forbes. Google’s AI answer relied on a very weird ZipRecruiter page for “Mark Zuckerberg Jobs,” a thing that does not exist.
In my tests, suspect sources were a pattern. At the suggestion of Onely, I asked the new Google which was more reliable: Apple iPhones or Samsung phones. As a longtime reviewer, I could tell you lots of good sources of information on this, including professional journalists and repair organizations like iFixit.
Instead, the AI cited the views of random people pulled from social media. Beyond the limited usefulness of a single Reddit user’s experience, how does Google know it wasn’t a fake review posted by the phone maker?
“Google SGE plays by a different set of rules compared to the traditional search engine we know today,” said Tomek Rudzki, Onely’s head of research and development.
SEO firms have been trying to run quantitative studies of which sources SGE values, though they’re limited by Google’s requirements on test accounts. And they’ve found a similar pattern: a disconnect between the sites that the old and new Google link to. SEO software company Authoritas tested searches with a thousand shopping terms in late March and found that 77 percent of the time, the domain of the No. 1 traditional search result showed up nowhere in the AI-written answer.
And in its study of 100,000 keyword searches, SE Ranking found that question-and-answer service Quora is the most-linked source by SGE; LinkedIn and Reddit were fifth and sixth. How often would those sources be acceptable on an eighth-grade term paper?
On searches about tech topics — including lots of “how to” questions — SE Ranking found the most-linked domain was simplilearn.com. I’d never heard of it before; the site describes itself as an “online boot camp.”
“This trend not only diminishes the quality of search results but also reduces traffic and revenue for many small businesses, including affiliate websites,” says SE Ranking’s head of SEO, Anastasia Kotsiubynska.
Google says SGE is an opt-in experiment. But the experiment already blew past its expected end last December, and Google hasn’t offered any update on when it will come to search for everyone. It’s possible Google doesn’t think SGE is accurate, fast or profitable enough, and that it will end up changing it dramatically.
Google is wise to go slow, even if it makes the company look as though it’s behind in the AI race. Microsoft’s rival search engine Bing got a similar AI overhaul in February 2023, but its AI is still best known for going off the rails.
In an interview, Elizabeth Reid, a Google vice president leading SGE, characterized it as a work in progress.
“We’re really focused on ensuring we get the experience really right. There are a lot of different factors on this — things like latency, accuracy, helpfulness,” Reid said. “What we’ve been finding as we’re iterating and learning is that it’s pretty nuanced.” In other words, there are times the AI is helpful and other times it’s not — and Google is still trying to figure out where to draw the line.
When I shared the examples in this column, Reid told me that SGE’s hallucination rates are “very low” and have decreased “meaningfully” since SGE’s May launch, though she declined to be specific.
“I don’t want to minimize it — it is a challenge with the technology” and something “we’re really working on,” Reid said. Putting links right next to the AI answers, she added, is important to enable people to check the facts for themselves.
Here’s a proposal: Because Google acknowledges correct facts are a problem, it ought to disclose its own data on accuracy before it brings SGE to a broader audience. With billions of searches daily, even an error rate of 0.001 percent adds up to a lot of wrong information.
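To make that concrete, here’s a rough back-of-the-envelope sketch. It assumes the widely cited (but unofficial) estimate of about 8.5 billion Google searches per day; the 0.001 percent error rate is this column’s hypothetical, not a figure from Google.

```python
# Back-of-the-envelope: how many wrong answers would a tiny error rate
# produce at Google's scale? Both inputs below are assumptions.
searches_per_day = 8.5e9    # widely cited estimate of daily Google searches; not an official figure
error_rate = 0.001 / 100    # the column's hypothetical 0.001 percent

wrong_per_day = searches_per_day * error_rate
print(f"{wrong_per_day:,.0f} wrong answers per day")  # prints: 85,000 wrong answers per day
```

Even a failure rate that sounds vanishingly small would mean tens of thousands of confidently wrong answers every single day.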
Another area of Google’s focus is “trying to help ensure that we get to the core of the question as quickly as possible, and then give additional elaboration,” Reid said.
As for citing low-quality sources, Google disputed the outside research on SGE, saying it is based on searches that are more limited than what Google sees in practice. But it declined to share data of its own.
Reid said SGE doesn’t have a different standard than old Google. “We do see more diversity of sources that are coming forth. But the aim is really to continue to put high quality content at the top,” she said.
Choosing who to believe is hard enough for humans. What makes Google think its current AI tech, known as LLMs, or large language models, is up to the task?
“They’re not perfect,” Reid said. “We want to take this thoughtful approach because the brand of trust that people have with Google is really important.”
The future of our information depends on it.