Tokenization has been one of the most talked-about topics in developing Indic language models, because the way text is split into tokens varies greatly from one model to another. In light of this, CognitiveLab has introduced Tokenizer Arena on Hugging Face.
The arena, which is built on the Transformers.js library, lets users compare many tokenizers at once.
Adithya SK, the founder and CEO of CognitiveLab, announced Tokenizer Arena in a post on X (formerly Twitter).

The arena covers the tokenizers of many models, such as Gemma, Mistral, Grok-1, GPT-3, GPT-4, Claude, Phi-3, and Command R.
This makes it useful for developers who build Indic LLMs on top of open-source models and need to check how well each base model's tokenizer handles text in scripts such as Devanagari, which differs greatly from English.
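To illustrate why the choice of tokenizer matters so much for Devanagari, here is a minimal sketch. It is not how Tokenizer Arena works internally, and the helper names (`byte_tokens`, `codepoint_tokens`, `fertility`, `report`) are hypothetical. It compares two naive extremes: a byte-level scheme with no learned merges, and a scheme that assigns one token per Unicode code point. Trained BPE or SentencePiece tokenizers usually fall somewhere between the two, depending on how much Indic text they saw during training.

```python
# Hypothetical illustration only: two naive tokenization extremes, not a
# real model's tokenizer and not Tokenizer Arena's implementation.


def byte_tokens(text: str) -> int:
    """Token count if every UTF-8 byte were its own token."""
    return len(text.encode("utf-8"))


def codepoint_tokens(text: str) -> int:
    """Token count if every Unicode code point were its own token."""
    return len(text)


def fertility(token_count: int, text: str) -> float:
    """Average tokens per whitespace-separated word (lower is cheaper)."""
    words = text.split()
    if not words:
        raise ValueError("text must contain at least one word")
    return token_count / len(words)


def report(label: str, text: str) -> None:
    """Print code-point and byte counts plus byte-level fertility."""
    b = byte_tokens(text)
    c = codepoint_tokens(text)
    print(f"{label}: {c} code points, {b} bytes, "
          f"byte-level fertility {fertility(b, text):.1f}")


if __name__ == "__main__":
    # ASCII letters take 1 byte each in UTF-8.
    report("English", "hello world")
    # Devanagari letters and vowel signs (U+0900 to U+097F) take 3 bytes each.
    report("Hindi", "नमस्ते दुनिया")
```

For English the two counts match, while every Devanagari code point costs three bytes. A tokenizer that has learned few Devanagari merges therefore drifts toward the expensive byte-level end, which is the kind of difference a side-by-side tool makes visible.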
CognitiveLab's founder, Adithya S. Kolavi, also recently built the Indic LLM Leaderboard to track the growing number of Indic LLMs in the country. His team also released Ambari, the first multilingual Kannada model built on top of Llama 2.
With support for seven Indic languages (Hindi, Kannada, Tamil, Telugu, Malayalam, Marathi, and Gujarati), the Indic LLM Leaderboard serves as a comprehensive evaluation tool. It is hosted on Hugging Face and currently supports four Indic benchmarks, with plans to add more in the future.
Kolavi, the founder and CEO of CognitiveLab, has drawn considerable attention in the AI community, particularly for the Indic LLM Leaderboard.