DataGemma: Google’s Solution to Reduce AI Hallucinations Using Real-World Data from Data Commons

Google introduces DataGemma, an AI model grounded in real-world data from Data Commons to reduce hallucinations in large language models (LLMs). By integrating factual data, DataGemma enhances the accuracy and reliability of AI responses, pushing the boundaries of AI innovation.

The large language models (LLMs) driving modern AI advancements are becoming larger and more sophisticated. These models can sort through enormous volumes of text, produce summaries, suggest fresh ideas, and even write code. Despite these remarkable abilities, LLMs occasionally present false information with confidence. This phenomenon, known as “hallucination,” is a major challenge for generative AI.

Google has announced new research findings that address this problem directly, helping to reduce hallucinations by grounding LLMs in real-world statistical data. Alongside this research, the company introduced DataGemma, which it describes as the first open models designed to connect LLMs with extensive real-world data drawn from Google’s Data Commons.

Data Commons: An extensive collection of reliable, freely accessible data

Data Commons is a publicly available knowledge graph containing over 240 billion data points across hundreds of thousands of statistical variables. It sources this public data from trusted organizations such as the United Nations (UN), the World Health Organization (WHO), the Centers for Disease Control and Prevention (CDC), and Census Bureaus. Combining these datasets into a single, unified set of tools and AI models helps policymakers, researchers, and enterprises that are looking for accurate insights.

Think of Data Commons as a massive, ever-growing database of trustworthy, publicly available information on a variety of subjects, from economics and health to demographics and the environment. You can query it in your own words through Google’s AI-powered natural language interface. For instance, you can explore which African nations have seen the biggest increases in access to electricity, how income relates to diabetes across US states, or any other data-driven question of your own.

How Data Commons can help combat hallucinations

As adoption of generative AI grows, Google aims to ground those experiences in facts by integrating Data Commons into Gemma, a family of lightweight, state-of-the-art open models built from the same research and technology as the Gemini models. These DataGemma models are now available to developers and researchers.

DataGemma expands the capabilities of Gemma models by drawing on Data Commons knowledge to improve LLM factuality and reasoning through two distinct approaches:

1. RIG (Retrieval-Interleaved Generation) augments the Gemma 2 language model by proactively querying trusted sources and fact-checking against data in Data Commons. When DataGemma is prompted to generate a response, the model is set up to identify instances of statistical data and retrieve the corresponding answer from Data Commons. The RIG methodology itself is not new; what is new is its specific application within the DataGemma framework.

2. RAG (Retrieval-Augmented Generation) allows language models to incorporate relevant information beyond their training data, take in more context, and produce more comprehensive and informative outputs. With DataGemma, this is made possible by the long context window of Gemini 1.5 Pro. DataGemma retrieves relevant contextual information from Data Commons before the model begins generating its response, which reduces the risk of hallucinations and improves response accuracy.
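
To make the difference between the two approaches concrete, here is a minimal Python sketch. It is an illustration only and not Google’s implementation: the in-memory dictionary stands in for Data Commons, and the marker syntax `[DC(query) -> guess]`, the place name "Exampleland", and the function names `lookup`, `rig_postprocess`, and `rag_build_prompt` are all hypothetical. The RIG function replaces the model’s own guessed figures with retrieved values after a draft has been written, while the RAG function gathers retrieved values into the prompt before any text is generated.

```python
import re
from typing import List, Optional

# Hypothetical stand-in for Data Commons: a tiny in-memory lookup table.
# A real system would query the actual knowledge graph instead.
FAKE_DATA_STORE = {
    "population of exampleland in 2020": "1,250,000",
}


def lookup(query: str) -> Optional[str]:
    """Return the stored statistic for a query, or None if it is unknown."""
    return FAKE_DATA_STORE.get(query.strip().lower())


# Assumed marker format: the model's draft pairs a data query with its own guess.
MARKER = re.compile(r"\[DC\((?P<query>[^)]*)\) -> (?P<guess>[^\]]*)\]")


def rig_postprocess(draft: str) -> str:
    """RIG-style step: swap each guessed statistic for the retrieved value.

    If the store has no answer, the model's original guess is kept.
    """
    def substitute(match: re.Match) -> str:
        retrieved = lookup(match.group("query"))
        return retrieved if retrieved is not None else match.group("guess")

    return MARKER.sub(substitute, draft)


def rag_build_prompt(question: str, queries: List[str]) -> str:
    """RAG-style step: retrieve facts first, then place them ahead of the question."""
    facts = []
    for query in queries:
        value = lookup(query)
        if value is not None:
            facts.append(f"- {query}: {value}")
    context = "\n".join(facts) if facts else "- (no matching statistics found)"
    return f"Context from the data store:\n{context}\n\nQuestion: {question}"
```

For example, `rig_postprocess("The population was [DC(population of Exampleland in 2020) -> 900,000].")` returns `"The population was 1,250,000."`, whereas `rag_build_prompt` only assembles the prompt text and leaves generation to whichever model receives it.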

Although preliminary, Google’s RIG and RAG results are promising. Researchers have observed notable improvements in the accuracy of the language models when handling numerical facts. This suggests that users may encounter fewer hallucinations in use cases involving research, decision-making, or simply satisfying curiosity.

Google’s researchers say they are committed to refining these techniques further as they scale up this work, subject it to rigorous testing, and eventually integrate the enhanced functionality into both Gemma and Gemini models, initially through a phased, limited-access approach.

By sharing its research and once again releasing this latest Gemma variant as an “open” model, Google hopes to support wider adoption of these Data Commons-led techniques for grounding LLMs in factual data. Making LLMs more dependable and trustworthy is key to ensuring that they become essential tools for everyone, and to building a future where AI provides people with accurate information, encourages informed decision-making, and fosters a deeper understanding of the world around us.

This post was last modified on September 13, 2024 4:31 am

Kumud Sahni Pruthi

A postgraduate in Science with an inclination towards education and technology. She always looks for ways to help people improve their lives by putting complex things into simple words through her writing.

Published by

Kumud Sahni Pruthi

September 13, 2024 4:30 am
