DataGemma: Google’s Solution to Reduce AI Hallucinations Using Real-World Data from Data Commons

Google introduces DataGemma, a set of open models grounded in real-world statistical data from Data Commons to reduce hallucinations in large language models (LLMs). By drawing on factual data, DataGemma aims to make AI responses more accurate and reliable.

Large language models (LLMs), the engines behind today’s AI advances, keep growing in size and sophistication. They can sift through enormous volumes of text, produce summaries, suggest new ideas, and even write code. Despite these abilities, LLMs sometimes present incorrect information with confidence. This phenomenon, known as “hallucination,” is one of the major challenges facing generative AI.

Google has shared new research that targets this problem directly, reducing hallucinations by grounding LLMs in real-world statistical data. Alongside this research, Google introduced DataGemma, which it describes as the first open models designed to connect LLMs with the extensive real-world data in Google’s Data Commons.

Data Commons: An extensive collection of reliable, freely accessible data

Data Commons is a publicly available knowledge graph containing more than 240 billion data points across hundreds of thousands of statistical variables. It draws this public data from trusted organizations such as the United Nations (UN), the World Health Organization (WHO), the Centers for Disease Control and Prevention (CDC), and national census bureaus. Combining these datasets into a single, unified set of tools and AI models helps policymakers, researchers, and businesses find precise insights.

Think of Data Commons as a vast, constantly growing database of trustworthy public information on topics ranging from economics and health to demographics and the environment. You can explore it in your own words through Google’s AI-powered natural language interface. For example, you can ask which African countries have seen the largest increases in access to electricity, how income relates to diabetes across US states, or any other data-driven question of your own.

How Data Commons can help combat hallucinations

As generative AI grows in popularity, Google aims to ground it in facts by integrating Data Commons with Gemma, a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. The DataGemma models are now available to developers and researchers.

DataGemma extends the capabilities of Gemma models by using Data Commons knowledge to improve LLM factuality and reasoning through two distinct approaches:

1. RIG (Retrieval-Interleaved Generation) enhances Gemma 2, the underlying language model, by proactively querying trusted sources and fact-checking its output against information in Data Commons. When DataGemma is prompted to generate a response, the model is trained to identify the statistics it is about to state and to retrieve the corresponding answer from Data Commons. While the RIG methodology itself is not new, its application within the DataGemma framework is.

2. RAG (Retrieval-Augmented Generation) lets language models take in more context, incorporate relevant information beyond their training data, and produce more comprehensive and informative outputs. In DataGemma, this approach relies on the long context window of Gemini 1.5 Pro. DataGemma retrieves relevant contextual information from Data Commons before the model begins generating a response, which reduces the risk of hallucinations and improves accuracy.
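The key difference between the two approaches is *when* retrieval happens. The minimal Python sketch below illustrates it under clearly stated assumptions: an in-memory dictionary (`FAKE_DATA_COMMONS`, hypothetical, with a made-up place and placeholder numbers) stands in for Data Commons, and the `[DC("query") --> "guess"]` marker is a hypothetical format for how a fine-tuned model might flag a statistic. None of these names are Google’s actual API. In the RIG-style function, the model has already written a draft and each flagged statistic is checked and replaced afterwards; in the RAG-style function, statistics are fetched first and placed in the prompt before any text is generated.

```python
import re

# HYPOTHETICAL stand-in for Data Commons: maps a lower-cased statistical
# query to a (value, source) pair. The place and numbers are placeholders,
# not real data; the real Data Commons is reached through its own APIs.
FAKE_DATA_COMMONS = {
    "population of exampleland": ("1,234,567", "placeholder source"),
    "median income in exampleland": ("$50,000", "placeholder source"),
}

# HYPOTHETICAL marker a fine-tuned model might emit around each statistic:
# [DC("<query>") --> "<the model's own guess>"]
MARKER = re.compile(r'\[DC\("([^"]+)"\) --> "([^"]*)"\]')


def lookup(query):
    """Return (value, source) for a query, or None if nothing matches."""
    return FAKE_DATA_COMMONS.get(query.strip().lower())


def rig_postprocess(draft):
    """RIG-style step: after the model writes a draft, replace each flagged
    statistic with the retrieved value plus its source. If the store has
    no match, keep the model's own guess."""
    def substitute(match):
        query, guess = match.group(1), match.group(2)
        hit = lookup(query)
        if hit is None:
            return guess
        value, source = hit
        return f"{value} [source: {source}]"

    return MARKER.sub(substitute, draft)


def rag_prompt(question, queries):
    """RAG-style step: retrieve statistics *before* generation and prepend
    them to the prompt that would be sent to the model."""
    facts = []
    for q in queries:
        hit = lookup(q)
        if hit is not None:
            value, source = hit
            facts.append(f"- {q}: {value} ({source})")
    context = "\n".join(facts) if facts else "- (no matching statistics found)"
    return f"Use only these statistics:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    draft = 'Exampleland has [DC("population of Exampleland") --> "1 million"] residents.'
    print(rig_postprocess(draft))
    # prints: Exampleland has 1,234,567 [source: placeholder source] residents.
    print(rag_prompt("How many people live in Exampleland?",
                     ["population of Exampleland"]))
```

In a real pipeline, the RAG prompt would then be sent to a long-context model, and the list of queries would likely be produced by a model from the user’s question rather than hard-coded; both steps are omitted here to keep the contrast between the two approaches visible.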

Although preliminary, Google’s results with RIG and RAG are encouraging. Researchers observed notable improvements in the accuracy of Google’s language models when handling numerical facts. For users, this could mean fewer hallucinations in use cases ranging from research and decision-making to simply satisfying curiosity.

Google’s researchers plan to keep refining these techniques as they scale up the work, subject it to rigorous testing, and eventually integrate this enhanced functionality into both Gemma and Gemini models, starting with a phased, limited-access approach.

By sharing its research and releasing this latest Gemma model variant as an “open” model, Google hopes to encourage wider adoption of these Data Commons-led techniques for grounding LLMs in factual data. According to Google, making LLMs more reliable and trustworthy is essential to turning them into indispensable tools for everyone and to building a future in which AI gives people accurate information, supports informed decision-making, and fosters a deeper understanding of the world around us.

This post was last modified on September 13, 2024 4:31 am

Kumud Sahni Pruthi

A science postgraduate with an interest in education and technology. She looks for ways to help people improve their lives by explaining complex topics in simple words through her writing.