Sarvam AI OpenHathi: First Hindi Large Language Model

Indian AI startup Sarvam AI has released OpenHathi-Hi-v0.1, the first Hindi large language model (LLM) in its OpenHathi series. Built on Meta AI’s Llama2-7B and aimed at Indian languages, the model is claimed by Sarvam AI to deliver performance comparable to OpenAI’s GPT-3.5 on those languages.

OpenHathi-Hi-v0.1 extends Llama2-7B’s tokenizer to a 48,000-token vocabulary and is trained in two phases. The first phase, embedding alignment, aligns the randomly initialised embeddings of the new Hindi tokens. The second phase, bilingual language modelling, trains the model to attend to tokens across languages.
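
To make the first phase more concrete, here is a minimal, hypothetical sketch in plain Python of what extending a tokenizer vocabulary and its embedding table involves. It assumes Llama2-7B’s 32,000-token base vocabulary, so reaching 48,000 tokens means adding 16,000 new pieces. The function names, the toy embedding size, the small Gaussian initialisation and the trainable-row mask are all illustrative assumptions, not Sarvam AI’s code; the mask simply marks the new, randomly initialised rows as the ones this sketch assumes are updated during embedding alignment.

```python
import random
from typing import Dict, List, Tuple


def extend_vocab(base_vocab: Dict[str, int], new_pieces: List[str]) -> Dict[str, int]:
    """Append new pieces after the base vocabulary, keeping existing IDs unchanged.

    Assumes base_vocab uses contiguous IDs 0..len(base_vocab)-1; pieces that are
    already present are skipped rather than given a second ID.
    """
    vocab = dict(base_vocab)
    for piece in new_pieces:
        if piece not in vocab:
            vocab[piece] = len(vocab)  # next free ID
    return vocab


def extend_embeddings(
    embeddings: List[List[float]], new_size: int, seed: int = 0
) -> Tuple[List[List[float]], List[bool]]:
    """Grow an embedding table to new_size rows with randomly initialised new rows.

    Also returns a mask that is True only for the new rows: the rows this sketch
    assumes are updated during the embedding-alignment phase.
    """
    rng = random.Random(seed)
    old_size = len(embeddings)
    dim = len(embeddings[0])
    extended = [row[:] for row in embeddings]  # copy; original rows untouched
    for _ in range(new_size - old_size):
        # Small Gaussian init is an illustrative assumption.
        extended.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    trainable = [i >= old_size for i in range(new_size)]
    return extended, trainable


if __name__ == "__main__":
    # Toy stand-ins: 32,000 base pieces (Llama2-7B's vocabulary size) plus
    # 16,000 hypothetical Hindi pieces gives the 48,000-token vocabulary.
    base = {f"<base{i}>": i for i in range(32_000)}
    hindi = [f"<hi{i}>" for i in range(16_000)]
    vocab = extend_vocab(base, hindi)
    table, mask = extend_embeddings([[0.0] * 8 for _ in range(len(base))], len(vocab))
    print(len(vocab), len(table), sum(mask))  # 48000 48000 16000
```

Appending the new pieces after the existing ones keeps every original token ID, and therefore every pretrained embedding row, in place, which is what lets the new rows be aligned without disturbing what the base model already knows.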

Sarvam AI claims that OpenHathi-Hi-v0.1 matches or outperforms GPT-3.5 on a range of Hindi tasks while retaining its proficiency in English. The result marks a milestone for the startup and reflects its focus on language models tailored to the nuances of specific languages.

Beyond standard natural language generation (NLG) tasks, Sarvam AI also evaluated OpenHathi-Hi-v0.1 in real-world scenarios. This emphasis on practical use is meant to show that the model is versatile enough to be useful across a range of applications.

In a notable collaboration, Sarvam AI worked with KissanAI to fine-tune the base model on conversational data collected from a GPT-powered bot that talks with farmers in different languages. The partnership is intended to improve OpenHathi-Hi-v0.1 through real-world interactions, making it more adaptable to the varied ways people actually use language.

The startup, only five months old, has quickly gained recognition and support in the AI field. It recently raised $41 million in a funding round led by Lightspeed Ventures, with participation from Peak XV Partners and Khosla Ventures, which leaves it well placed for further growth.

To strengthen OpenHathi-Hi-v0.1’s Hindi capabilities, Sarvam AI set out to lower its tokenizer’s fertility score on Hindi text, that is, the average number of tokens produced per word, so that the same text is encoded in fewer tokens and processed more efficiently. Working with AI4Bharat, the company built a SentencePiece tokenizer from a subsample of 100K documents from the Sangraha corpus and used it to extend Llama2-7B’s tokenizer, resulting in a new tokenizer with a 48K vocabulary.
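
Fertility is easy to measure once a tokenizer is available. The minimal sketch below, written in plain Python, computes it as total tokens divided by total words over a small corpus. Splitting words on whitespace is an assumption, and the two toy tokenizers are hypothetical stand-ins for a tokenizer with poor Hindi coverage and one whose vocabulary covers every word; they are not Sarvam AI’s evaluation code or any real tokenizer’s API.

```python
from typing import Callable, Iterable, List


def fertility(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens per whitespace-separated word over a corpus."""
    total_words = 0
    total_tokens = 0
    for text in texts:
        total_words += len(text.split())
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("corpus contains no words")
    return total_tokens / total_words


def char_tokenize(text: str) -> List[str]:
    """Toy tokenizer with poor coverage: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def word_tokenize(text: str) -> List[str]:
    """Toy tokenizer with full coverage: one token per whitespace-separated word."""
    return text.split()


if __name__ == "__main__":
    sample = ["नमस्ते दुनिया"]  # "Hello world" in Hindi
    print(fertility(sample, char_tokenize))  # several tokens per word
    print(fertility(sample, word_tokenize))  # 1.0
```

In practice, `tokenize` would wrap a real tokenizer’s encoding step; scoring the same Hindi sample before and after the vocabulary extension shows how much shorter the token sequences become.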

Sarvam AI’s focus on linguistic diversity and practical applications, together with its partnerships and the technology behind OpenHathi-Hi-v0.1, positions the startup as a key player in building large language models for Hindi and other Indian languages. As the company continues to develop, OpenHathi-Hi-v0.1 sets a promising direction for its future work on AI for Indian languages.

Tech Chilli Desk