Key takeaways
- An Oumi study commissioned by The New York Times found Google AI Overviews answer factual questions correctly 91 percent of the time under Gemini 3, up from 85 percent under Gemini 2.
- At Google's scale of more than 5 trillion searches a year, that error rate still translates to an estimated 57 million wrong answers per hour.
- Citation quality moved the other way: 56 percent of correct answers now cite sources that do not actually support them, so verify before you act.
- Use the related Hubkub links below to continue into the closest next topic.
Every minute, Google serves roughly 1 million incorrect answers through its AI Overviews feature. That figure is not a wild guess; it is the arithmetic consequence of a new independent study. Google processes more than 5 trillion searches a year, and even a small error rate becomes a massive problem at that scale. If you rely on Google AI Overviews accuracy for health questions, legal facts, or everyday information, there is a real chance the AI summary at the top of your search is wrong. Worse: the source it cites may not actually support what it claims. This deep dive explains what researchers found, why accuracy improved while trustworthiness fell, and what practical steps you can take to protect yourself from AI search errors.

Inside the Oumi Study: How Researchers Measured AI Search Errors
The research was commissioned by The New York Times and carried out by Oumi, an AI startup specializing in model evaluation. Researchers selected 4,326 Google searches drawn from SimpleQA — a benchmark of more than 4,000 verifiable factual questions developed by OpenAI in 2024. SimpleQA is widely used across the AI industry to test whether a model can answer precise, fact-checkable questions correctly.
The same queries were tested twice. The first test ran in October 2025, when AI Overviews used Gemini 2 as its underlying model. The second test ran in February 2026, after Google upgraded to Gemini 3. Results were compared to give a before-and-after picture of AI Overview accuracy on identical questions.
On the surface, improvement was clear. Gemini 2 produced factually accurate answers 85 percent of the time. Gemini 3 pushed that figure to 91 percent. For a feature used by hundreds of millions of people every day, this looked like meaningful progress in reliability.
The Scale: 57 Million Wrong Answers Per Hour
The math behind the percentages is where the problem becomes impossible to ignore. Google’s 5-trillion-search annual volume means AI Overviews handles an extraordinary number of queries every hour. At a 9 percent error rate, the Oumi analysis estimates 57 million incorrect answers per hour — roughly 950,000 wrong answers every single minute. Scale transforms a single-digit failure rate into one of the largest misinformation delivery systems ever measured by volume.
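The headline number can be sanity-checked with back-of-the-envelope arithmetic. The sketch below makes the simplifying assumption that every search triggers an AI Overview; it lands in the same ballpark as the study's 57 million figure, with the gap suggesting the researchers used slightly different inputs (such as a search volume above 5 trillion).

```python
# Rough sanity check of the Oumi estimate, assuming every
# search triggers an AI Overview (a simplification).
SEARCHES_PER_YEAR = 5e12   # "more than 5 trillion" annual searches
ERROR_RATE = 0.09          # 91% accuracy under Gemini 3
HOURS_PER_YEAR = 365 * 24  # 8,760

wrong_per_hour = SEARCHES_PER_YEAR * ERROR_RATE / HOURS_PER_YEAR
wrong_per_minute = wrong_per_hour / 60

print(f"~{wrong_per_hour / 1e6:.1f} million wrong answers per hour")
print(f"~{wrong_per_minute / 1e3:.0f} thousand wrong answers per minute")
```

Run as written, this yields roughly 51 million wrong answers per hour, close to but below the study's 57 million; the exact figure depends on how much above 5 trillion the true search volume sits.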
Google disputed the findings. A spokesperson said the SimpleQA benchmark itself contained incorrect data, and that benchmark testing does not reflect how users actually search. Google did not provide its own independent accuracy figures, and no counteranalysis has been published to date.
The Hidden Problem: When AI Citations Fail to Support the Claims

Beneath the headline accuracy numbers, the Oumi study found a more troubling pattern: Google Gemini hallucinations in source attribution. The researchers tracked how many correct answers were classified as “ungrounded” — meaning the cited sources did not actually support the information shown in the AI Overview.
With Gemini 2, 37 percent of correct answers were ungrounded. After the upgrade to Gemini 3, that figure jumped to 56 percent. Google’s AI improved at producing factually correct statements while simultaneously becoming worse at linking those statements to supporting evidence. More than half of the correct answers in the February 2026 test pointed to sources that did not contain or confirm the claimed fact.
This creates a compounding trust problem. When a user sees a confident AI-generated summary with a source link, the natural assumption is that clicking the link will confirm the information. In more than half of cases, it will not. Researchers described this as a verifiability gap — the AI looks trustworthy because it cites sources, but the citations are increasingly decorative rather than functional.
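One way to see how the two findings compound: combining the study's reported 91 percent accuracy with its 56 percent ungrounded rate (a direct calculation, assuming the rates apply exactly as reported) shows that only about 40 percent of all AI Overview answers are both correct and backed by a citation that actually supports them.

```python
# Figures reported by the Oumi study (February 2026, Gemini 3)
accuracy = 0.91     # share of answers that are factually correct
ungrounded = 0.56   # share of *correct* answers whose citations fail

# An answer is fully trustworthy only if it is correct AND grounded
grounded_and_correct = accuracy * (1 - ungrounded)
print(f"Correct and verifiably sourced: {grounded_and_correct:.0%}")  # 40%
```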
The situation is worsened by how rarely people check. One analysis found that only 8 percent of users verify an AI-generated answer. This phenomenon — sometimes called “cognitive surrender” — describes users deferring to AI confidence rather than applying critical judgment. When the AI speaks with authority and provides a citation, most people accept the answer without investigating further.
The categories of searches most vulnerable to spreading AI search misinformation through AI Overviews include:
- Medical symptoms, diagnoses, and drug interactions
- Legal rights, regulatory requirements, and court procedures
- Financial rules, tax thresholds, and investment products
- Current events and rapidly changing news stories
- Named statistics, historical dates, and specific factual claims
For ongoing analysis of how AI is reshaping information access and search quality, see Hubkub’s deep-dive coverage.
What AI Overviews Accuracy Means for Everyday Users
The practical consequences are already visible in policy changes. In January 2026, Google restricted AI Overviews from appearing on certain health-related searches. The move followed a Guardian investigation that documented dangerous medical misinformation in AI-generated summaries for sensitive health queries. That restriction acknowledged one specific high-stakes risk, but the broader inaccuracy issue extends across all query types.
Using Google AI Overviews more safely comes down to consistent habits, not technical workarounds. Here are five steps worth building into your daily search behavior:
- Do not stop at the AI summary. Always scroll past the AI Overview to reach actual search results. Underlying web pages often contain more nuanced or accurate information than the AI distillation.
- Click every cited source. If the AI claims a specific fact and provides a link, open the link and confirm the claim appears there. If it does not, treat the AI answer as unverified.
- Cross-reference with institutional sources. For health, legal, and financial questions, verify with a .gov, .edu, or major institutional website before acting on an AI-generated answer.
- Apply extra skepticism to specific numbers. Dates, statistics, percentages, and named facts carry the highest risk of error — these are exactly the types of questions SimpleQA tests.
- Use search operators for accuracy-critical queries. Adding site:.gov or site:.edu to a search forces results toward verified sources and bypasses AI Overviews entirely for that query.
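The search-operator tip above can also be scripted. The small helper below builds a Google search URL with a site: restriction; the q= query parameter is standard Google search syntax, while the example query itself is purely illustrative.

```python
from urllib.parse import quote_plus

def restricted_search_url(query: str, domain_suffix: str = ".gov") -> str:
    """Build a Google search URL limited to a given domain suffix."""
    q = f"{query} site:{domain_suffix}"
    return "https://www.google.com/search?q=" + quote_plus(q)

url = restricted_search_url("medicare eligibility age")
print(url)  # the site: operator is URL-encoded as site%3A.gov
```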
A full technical explanation of how SimpleQA evaluates factual accuracy — the same benchmark used in the Oumi analysis — is available at OpenAI’s original SimpleQA introduction.
Common Questions — Google AI Overviews Accuracy
Q: How accurate are Google AI Overviews in 2026?
A: According to the Oumi analysis commissioned by The New York Times, Google AI Overviews running on Gemini 3 are accurate approximately 91 percent of the time. At Google’s scale of 5 trillion annual searches, a 9 percent error rate generates an estimated 57 million incorrect answers per hour. The accuracy improved from 85 percent under Gemini 2, but the scale of errors remains enormous.
Q: What does “ungrounded sources” mean in Google AI Overviews?
A: "Ungrounded" means the AI Overview links to websites that do not actually support the information shown in the summary. The Oumi study found this occurred in 56 percent of correct answers from Gemini 3 — up from 37 percent with Gemini 2. Users who click the cited link to verify a claim often find the source page does not confirm the stated fact.
Q: Can you turn off Google AI Overviews?
A: Yes. Go to Google Search Settings and look for the AI Overviews option. On desktop, navigate to Settings then Search Settings. On mobile, access it through the Google app settings. Disabling AI Overviews returns you to standard search results without AI-generated summaries appearing at the top of the page.
Q: Why did Google dispute the Google AI Overviews accuracy study?
A: A Google spokesperson stated that the SimpleQA benchmark was flawed because the dataset itself contained some incorrect factual data, and that controlled benchmark testing does not represent real-world search behavior. Google did not publish any alternative accuracy measurements to support this response.
Conclusion
The research points to three clear conclusions. First, Google AI Overviews improved under Gemini 3, reaching 91 percent accuracy. Second, that improvement masks a worsening citation problem — 56 percent of correct answers now link to pages that do not support the stated claim. Third, at Google’s scale of 5 trillion annual searches, a 9 percent error rate still delivers 57 million wrong answers per hour. Treating AI search summaries as verified facts is not a safe habit in 2026. Verify before you act, and always click before you trust.
Stay current on how AI is shaping the tools you use every day with Hubkub’s Tech News section, updated daily.
Last Updated: April 13, 2026