Voxtral: Open Source AI Audio Model—Capabilities, Features, and How to Access

Mistral AI has released Voxtral, an open-source audio AI model built for business use cases such as transcription, audio comprehension, summarization, and voice-driven command execution.

Launched on July 15, 2025, Voxtral comes in two configurations: Voxtral Small, featuring 24 billion parameters for cloud-scale applications, and Voxtral Mini, with 3 billion parameters for on-device and edge deployments.

Both variants have a 32k-token context window, which is enough to transcribe up to 30 minutes of audio or to run comprehension tasks over up to 40 minutes of audio.

Built atop Mistral’s language models (Mistral Small 3.1 in the case of Voxtral Small), Voxtral supports question answering and summarization directly on audio, handles multiple languages including English, Hindi, French, and Spanish, and can trigger function calls from spoken input.

It is released under the Apache 2.0 license. Users can download the model weights from Hugging Face or query the model through Mistral’s API, with pricing that starts at $0.001 per minute of audio, which Mistral says is less than half the cost of comparable proprietary solutions.

What Is Voxtral AI?

Voxtral AI is an open-source audio intelligence model released by Mistral AI on July 15, 2025. It offers two versions: Voxtral Small, with 24 billion parameters, designed for production-level tasks, and Voxtral Mini, with 3 billion parameters, suitable for edge or local environments.

Voxtral Small builds on the Mistral Small 3.1 LLM, and Voxtral Mini on the smaller Ministral 3B, so both combine speech processing with strong text understanding. With its 32K-token context window, Voxtral can handle up to 30 minutes of audio for transcription or up to 40 minutes for comprehension.

It supports multiple languages, automatic language detection, Q&A, summarization, and even function calling directly from speech.

How Voxtral Works: Core Technology and Features

Voxtral pairs an audio encoder with a language model decoder. The audio encoder and an adapter layer convert speech into a representation the language model can consume. The language model backbone (Mistral Small 3.1 in Voxtral Small) then processes that input to understand context, answer questions, summarize, or trigger commands.

The model’s 32K-token context window lets it process long audio files: up to 30 minutes for transcription or up to 40 minutes for comprehension. Because Q&A and summarization are built in, there is no need to chain a separate ASR model and LLM.
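The limits above imply that longer recordings have to be split before they are submitted. The following is a minimal sketch of that planning step, not part of Voxtral or any Mistral SDK: the function name plan_segments and the two constants are hypothetical, and the split is a plain fixed-length cut that ignores sentence or speaker boundaries.

```python
import math

# Limits quoted in the article, in minutes of audio per request.
TRANSCRIPTION_LIMIT_MIN = 30
COMPREHENSION_LIMIT_MIN = 40


def plan_segments(duration_min, limit_min):
    """Split [0, duration_min] into consecutive windows of at most limit_min.

    Hypothetical helper for illustration; returns (start, end) pairs in minutes.
    """
    if limit_min <= 0:
        raise ValueError("limit_min must be positive")
    if duration_min <= 0:
        return []
    count = math.ceil(duration_min / limit_min)
    return [
        (i * limit_min, min((i + 1) * limit_min, duration_min))
        for i in range(count)
    ]


# A 75-minute recording needs three transcription requests.
print(plan_segments(75, TRANSCRIPTION_LIMIT_MIN))  # [(0, 30), (30, 60), (60, 75)]
```

In practice a real pipeline would probably cut at silences rather than at fixed offsets, so that words are not split across segments.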

Voxtral supports English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian, and more, and it detects the spoken language automatically. Voice-enabled function calling means spoken commands can directly trigger workflows or API calls.
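To make the function-calling idea concrete, here is a small sketch of the application side: once a model has turned a spoken request into a structured call, the application looks up the named function and runs it. Everything here is an assumption for illustration. The tool names, the dispatch helper, and the call format (a name plus a JSON string of arguments) are hypothetical and are not taken from Voxtral's or Mistral's documentation.

```python
import json

# Hypothetical tools for illustration; none of these come from Voxtral or Mistral.
def set_timer(minutes):
    return f"Timer set for {minutes} minutes"


def send_message(recipient, body):
    return f"Message to {recipient}: {body}"


# Registry mapping tool names to the Python functions that implement them.
TOOLS = {
    "set_timer": set_timer,
    "send_message": send_message,
}


def dispatch(tool_call):
    """Run the registered function named in a tool call.

    Assumes the call looks like {"name": ..., "arguments": "<JSON string>"}.
    """
    name = tool_call["name"]
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    arguments = json.loads(tool_call["arguments"])
    return TOOLS[name](**arguments)


# A call a model might produce after hearing "set a timer for ten minutes".
example_call = {"name": "set_timer", "arguments": json.dumps({"minutes": 10})}
print(dispatch(example_call))  # Timer set for 10 minutes
```

Keeping the registry explicit means the model can only trigger functions the application has deliberately exposed.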

Why Voxtral Stands Out: Open, Scalable, and Cost-Effective

Voxtral is a notable step forward for open-source speech AI. Because it is licensed under Apache 2.0, businesses can self-host it and keep complete control over their deployment and data.

Mistral also offers an API at a rate of only $0.001 per audio minute, which is claimed to be less than half the cost of major proprietary alternatives, such as OpenAI Whisper or ElevenLabs Scribe.

In benchmarks published by Mistral, Voxtral is reported to outperform Whisper Large-v3, GPT-4o Mini Transcribe, Gemini 2.5 Flash, and ElevenLabs Scribe on transcription and comprehension tasks.

This combination of open access, affordability, and top-tier accuracy positions Voxtral as a compelling choice for enterprise speech intelligence.

Is Voxtral Paid or Free?

Both of Voxtral’s core models, Small and Mini, are free to use and fully open source under the permissive Apache 2.0 license.

Anyone can download the model weights from Hugging Face and run them locally or in the cloud at no cost. If you prefer a managed API, usage is billed at $0.001 per minute, with a simple pay-as-you-go structure.
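As a quick check on what the pay-as-you-go rate means in practice, the sketch below multiplies audio minutes by the $0.001 figure quoted above. The function name estimate_api_cost is hypothetical, and the calculation assumes a flat per-minute rate with no minimum charge, rounding rule, or volume discount, none of which the article specifies.

```python
from decimal import Decimal

# Rate quoted in the article, in US dollars per minute of audio.
API_RATE_PER_MINUTE = Decimal("0.001")


def estimate_api_cost(audio_minutes):
    """Return the estimated API cost in dollars for a given amount of audio.

    Hypothetical helper; assumes a flat rate with no minimums or discounts.
    """
    if audio_minutes < 0:
        raise ValueError("audio_minutes must not be negative")
    # Go through str() so float inputs such as 1.5 convert exactly.
    return API_RATE_PER_MINUTE * Decimal(str(audio_minutes))


print(estimate_api_cost(30))      # 0.030  (one 30-minute recording)
print(estimate_api_cost(10_000))  # 10.000 (10,000 minutes of audio)
```

Under these assumptions, 10,000 minutes of audio (roughly 167 hours) would cost about $10 through the API.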

Mistral also offers a transcription-only API endpoint to minimize cost and latency. A free trial/demo is available for testing before any charges are incurred. In short, the models are free, and the API is inexpensive.

Conclusion

Voxtral combines strong transcription accuracy, deep language comprehension, and voice-triggered actions in one open-source toolkit. Introduced by Mistral AI in July 2025, it handles long-form audio, supports multiple languages, and supports function calling from speech.

Users can self-host at zero cost or access an economical API. Achieving leading performance on industry benchmarks and governed by clear, open licensing, Voxtral eliminates the cost barrier of proprietary solutions and the inconsistency of existing open-source options.

For organizations ready to adopt voice AI without lock-in or escalating expenses, Voxtral provides a powerful, flexible, and budget-friendly platform.
