DataGemma: Google’s Solution to Reduce AI Hallucinations Using Real-World Data from Data Commons

The large language models (LLMs) driving today’s AI advances are growing ever more sophisticated. These models can sift through enormous volumes of text, produce summaries, suggest fresh ideas, and even write code. Despite these remarkable abilities, LLMs sometimes present false information with confidence. This phenomenon, known as “hallucination,” is one of the major challenges facing generative AI.

Google has announced new research findings that address this problem directly, helping to reduce hallucinations by grounding LLMs in real-world statistical data. Alongside this research, the company has introduced DataGemma, which it bills as the first open model designed to connect LLMs with the extensive real-world data held in Google’s Data Commons.

Data Commons: An extensive collection of reliable, freely accessible data

Data Commons is a publicly available knowledge graph containing over 240 billion data points across hundreds of thousands of statistical variables. It sources this public data from trusted organizations such as the United Nations (UN), the World Health Organization (WHO), the Centers for Disease Control and Prevention (CDC), and census bureaus. Combining these datasets into a single, unified set of tools and AI models gives policymakers, researchers, and organizations a way to find accurate insights.

Think of Data Commons as a vast, constantly expanding database of reliable public information on topics ranging from health and economics to demographics and the environment. You can interact with it in your own words through Google’s AI-powered natural language interface. For example, you can explore which African countries have seen the greatest increase in access to electricity, how income correlates with diabetes across US states, or any other data-driven question of your own.

How Data Commons can help tackle hallucinations

As adoption of generative AI grows, Google aims to ground these experiences in facts by integrating Data Commons into Gemma, its family of lightweight, state-of-the-art open models built from the same research and technology as the Gemini models. The resulting DataGemma models are now available to researchers and developers.

DataGemma extends the capabilities of Gemma models by drawing on Data Commons to improve LLM factuality and reasoning through two distinct approaches:

1. RIG (Retrieval-Interleaved Generation) extends Gemma 2 by proactively querying trusted sources and fact-checking against the data in Data Commons. When DataGemma is prompted for a response, the model is designed to identify statistical claims in its output and retrieve the corresponding answer from Data Commons. The RIG methodology itself is not new, but its application within the DataGemma framework is.

2. RAG (Retrieval-Augmented Generation) lets language models draw on relevant information beyond their training data, take in more context, and produce more comprehensive and informative outputs. DataGemma makes this possible by using the long context window of Gemini 1.5 Pro. Before the model begins generating a response, DataGemma retrieves relevant contextual information from Data Commons, which reduces the risk of hallucinations and improves response accuracy.
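
To make the difference between the two approaches concrete, here is a minimal Python sketch. It is not Google’s implementation: the in-memory `FAKE_STORE`, the `lookup` helper, and the `[DC("question") -> "guess"]` marker format are all assumptions made for illustration, and the stored value is made up rather than taken from Data Commons. The RIG function checks statistics after the model has drafted its answer, while the RAG function gathers statistics before the model is asked anything.

```python
import re
from typing import List, Optional

# Hypothetical stand-in for Data Commons. The value below is invented
# for illustration and is not real data.
FAKE_STORE = {
    "population of exampleland": "1000",
}


def lookup(query: str) -> Optional[str]:
    """Return the stored statistic for a query, or None if unknown."""
    return FAKE_STORE.get(query.strip().lower())


# Assumed marker format for a statistic the model wants checked:
#   [DC("natural-language question") -> "model's own guess"]
MARKER = re.compile(r'\[DC\("([^"]+)"\)\s*->\s*"([^"]*)"\]')


def rig_postprocess(draft: str) -> str:
    """RIG-style step: replace each marked guess with the retrieved value.

    If nothing is found, keep the model's guess but flag it as unverified.
    """
    def swap(match: "re.Match[str]") -> str:
        question, guess = match.group(1), match.group(2)
        fact = lookup(question)
        return fact if fact is not None else f"{guess} (unverified)"

    return MARKER.sub(swap, draft)


def rag_prompt(user_question: str, queries: List[str]) -> str:
    """RAG-style step: fetch statistics first, then build the model prompt.

    The queries are supplied by hand here; a real system would have to
    derive them from the user's question.
    """
    lines = []
    for query in queries:
        fact = lookup(query)
        if fact is not None:
            lines.append(f"- {query}: {fact}")
    context = "\n".join(lines) if lines else "- (no matching statistics found)"
    return (
        f"Use only these statistics:\n{context}\n\n"
        f"Question: {user_question}"
    )


if __name__ == "__main__":
    draft = 'Exampleland has [DC("population of Exampleland") -> "900"] residents.'
    print(rig_postprocess(draft))
    print(rag_prompt("How many people live in Exampleland?",
                     ["population of Exampleland"]))
```

In this sketch the RIG path corrects the model’s guess of 900 to the stored value, and the RAG path produces a prompt that already contains the statistic, so the model would not need to recall it from training data.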

Google’s early results with RIG and RAG are preliminary but promising. Researchers observed notable improvements in the accuracy of the language models when handling numerical facts. This suggests that users may encounter fewer hallucinations in use cases spanning research, decision-making, and simply satisfying curiosity.

Google’s researchers say they are committed to refining these techniques further as they scale up the work, subject it to rigorous testing, and ultimately integrate the enhanced functionality into both Gemma and Gemini models, starting with a phased, limited-access rollout.

By sharing its research and releasing this latest Gemma variant as an open model, Google hopes to encourage wider adoption of these Data Commons-led techniques for grounding LLMs in factual data. Making LLMs more reliable and trustworthy is key to ensuring that they become indispensable tools for everyone, and to building a future in which AI gives people accurate information, supports informed decisions, and fosters a deeper understanding of the world around us.



Kumud Sahni Pruthi
