Google introduces DataGemma, an AI model grounded in real-world data from Data Commons to reduce hallucinations in large language models (LLMs). By integrating factual data, DataGemma enhances the accuracy and reliability of AI responses, pushing the boundaries of AI innovation.
DataGemma
The large language models (LLMs) driving modern AI advancements are growing ever more sophisticated. These models can sort through enormous volumes of text, produce summaries, offer fresh ideas, and even write code. Despite these remarkable abilities, LLMs occasionally present false information with confidence. This phenomenon, known as "hallucination," is one of the major issues facing generative AI.
Google has announced new research findings that directly address this problem, helping to reduce hallucinations by grounding LLMs in real-world statistical data. Alongside these research advances, it is presenting DataGemma, the first open models designed to connect LLMs with extensive real-world data drawn from Google's Data Commons.
Data Commons is a publicly available knowledge graph containing over 240 billion data points across hundreds of thousands of statistical variables. It sources this public data from trusted organizations such as the United Nations (UN), the World Health Organization (WHO), the Centers for Disease Control and Prevention (CDC), and Census Bureaus. Combining these datasets into a single, unified set of tools and AI models empowers policymakers, researchers, and organizations looking for accurate insights.
Think of Data Commons as a massive, ever-growing database of trustworthy, publicly available information on a variety of subjects, from economics and health to demographics and the environment. You can interact with it in your own words through Google's AI-powered natural language interface. For instance, you can explore which African nations have had the biggest increases in access to electricity, how income correlates with diabetes in US states, or any data-driven question of your own.
As adoption of generative AI grows, Google is aiming to ground those experiences by incorporating Data Commons into Gemma, a family of lightweight, state-of-the-art open models built from the same research and technology as the Gemini models. These DataGemma models are now available to researchers and developers.
DataGemma will increase the capabilities of Gemma models by leveraging Data Commons knowledge to improve LLM reasoning and factuality through two different methods:
1. RIG (Retrieval-Interleaved Generation) augments Gemma 2 by proactively querying trusted sources and fact-checking against data in Data Commons. When DataGemma is prompted to generate a response, the model is set up to identify instances of statistical data and retrieve the answer from Data Commons. Although the RIG methodology is not new, its application within the DataGemma framework is.
2. RAG (Retrieval-Augmented Generation) allows language models to take in more context, incorporate relevant information beyond their training data, and produce more comprehensive and informative outputs. With DataGemma, this was made possible by the long context window of Gemini 1.5 Pro. Before the model starts generating a response, DataGemma retrieves relevant contextual information from Data Commons, reducing the risk of hallucinations and improving response accuracy.
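The difference between the two methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DataGemma's actual implementation: the `[DC("...") -> "..."]` marker format, the function names `lookup`, `rig_postprocess`, and `rag_prompt`, and the in-memory store with invented values for a fictional place are all hypothetical stand-ins for the model output and the Data Commons service.

```python
import re
from typing import Dict, List, Optional

# Hypothetical stand-in for Data Commons: invented values for a fictional place.
FAKE_STORE: Dict[str, str] = {
    "population of exampleland": "1,000,000",
    "median age in exampleland": "30",
}


def lookup(query: str) -> Optional[str]:
    """Return the stored statistic for a query, or None if nothing matches."""
    return FAKE_STORE.get(query.strip().lower())


# Assumed marker format: the model writes its own guess next to a query,
# e.g. [DC("population of Exampleland") -> "about 2 million"].
MARKER = re.compile(r'\[DC\("([^"]+)"\) -> "([^"]*)"\]')


def rig_postprocess(draft: str) -> str:
    """RIG-style step: check each statistic in an already generated draft.

    A retrieved value replaces the model's guess; a guess with no match
    is kept but flagged as unverified.
    """
    def replace(match: "re.Match[str]") -> str:
        query, guess = match.group(1), match.group(2)
        found = lookup(query)
        return found if found is not None else f"{guess} (unverified)"

    return MARKER.sub(replace, draft)


def rag_prompt(question: str, queries: List[str]) -> str:
    """RAG-style step: retrieve statistics first, then build the prompt.

    In this sketch the caller supplies the queries; the resulting prompt
    would be passed to a long-context model for generation.
    """
    facts = [f"- {q}: {v}" for q in queries if (v := lookup(q)) is not None]
    context = "\n".join(facts) if facts else "- (no matching statistics found)"
    return f"Use only these statistics:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    draft = (
        'Exampleland has [DC("population of Exampleland") -> "about 2 million"] '
        'residents and a GDP of [DC("GDP of Exampleland") -> "5 billion"].'
    )
    print(rig_postprocess(draft))
    print(rag_prompt("How old is the population?", ["median age in Exampleland"]))
```

In the RIG-style function, retrieval happens per statistic while the answer is being produced or checked, whereas in the RAG-style function all retrieval happens up front and the model only sees the result as extra context.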
Although preliminary, Google's RIG and RAG results are promising. Researchers have seen notable improvements in the accuracy of the language models when handling numerical facts. This suggests that users may encounter fewer hallucinations in use cases spanning research, decision-making, or simply satisfying curiosity.
Google's research team is committed to further refining these techniques as it scales up this work, puts it through rigorous testing, and eventually integrates the expanded functionality into both Gemma and Gemini models, initially through a phased, limited-access approach.
By sharing its research and releasing this latest Gemma model variant as an "open" model, Google hopes to support wider adoption of these Data Commons-led techniques for grounding LLMs in factual data. Making LLMs more dependable and trustworthy is key to ensuring they become indispensable tools for everyone, and to building a future where AI provides people with accurate information, encourages informed decision-making, and fosters a deeper understanding of the world around us.
This post was last modified on September 13, 2024 4:31 am