Mistral AI has released Voxtral, an open-source audio model that businesses can use for transcription, comprehension, summarization, and voice-driven command execution.
Launched on July 15, 2025, Voxtral comes in two configurations: Voxtral Small, featuring 24 billion parameters for cloud-scale applications, and Voxtral Mini, with 3 billion parameters for on-device and edge deployments.
Both variants have a 32K-token context window, which is enough to transcribe up to 30 minutes of audio or to run comprehension tasks over up to 40 minutes of audio.
Built on the Mistral Small 3.1 language model, Voxtral supports Q&A, summarization, multilingual input (English, Hindi, French, Spanish, and more), and function calling triggered directly by spoken input.
It is released under the Apache 2.0 license. Users can download the weights from Hugging Face or query the model through Mistral’s API, with pricing that starts at $0.001 per minute, which Mistral says is less than half the cost of comparable proprietary solutions.
What Is Voxtral AI?
Voxtral AI is an open-source audio intelligence model released by Mistral AI on July 15, 2025. It offers two versions: Voxtral Small, with 24 billion parameters, designed for production-level tasks, and Voxtral Mini, with 3 billion parameters, suitable for edge or local environments.
Both are based on the Mistral Small 3.1 LLM, combining speech processing with text understanding. With its 32K-token context window, Voxtral can transcribe up to 30 minutes of audio or handle comprehension tasks over up to 40 minutes of audio.
It supports multiple languages, automatic language detection, Q&A, summarization, and even function calling directly from speech.
How Voxtral Works: Core Technology and Features
Voxtral's architecture pairs an audio encoder with a language model decoder. The audio encoder and an adapter convert speech into a representation the language model can consume. Mistral Small 3.1 then processes that representation to understand context, answer questions, summarize, or trigger commands.
The model’s 32K-token context window lets it process long audio files: up to 30 minutes for transcription or 40 minutes for comprehension. Because Q&A and summarization are built in, there is no need to chain a separate ASR model and LLM.
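Recordings longer than these limits have to be split before they are sent to the model. The sketch below shows one simple way to plan that split, using the 30-minute and 40-minute figures quoted above. The function name `plan_chunks` and the fixed-length splitting strategy are illustrative assumptions, not part of Voxtral or any Mistral tooling.

```python
from typing import List, Tuple

# Duration limits quoted in the article, converted to seconds.
TRANSCRIPTION_LIMIT_S = 30 * 60
COMPREHENSION_LIMIT_S = 40 * 60


def plan_chunks(
    duration_s: float, limit_s: float = TRANSCRIPTION_LIMIT_S
) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds that cover the recording,
    with no chunk longer than limit_s. Hypothetical helper for illustration."""
    if duration_s < 0 or limit_s <= 0:
        raise ValueError("duration_s must be >= 0 and limit_s must be > 0")
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + limit_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


# A 75-minute recording planned for transcription.
print(plan_chunks(75 * 60))
```

A 75-minute recording yields three chunks: two of 30 minutes and one of 15. A production pipeline would more likely cut at pauses in speech so that words are not split across chunks; fixed offsets are used here only to keep the example short.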
Voxtral supports English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian, and more, and it detects the spoken language automatically. Voice-enabled function calling means spoken commands can directly trigger workflows or API calls.
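On the application side, function calling usually means the model returns a structured call (a function name plus arguments) and the host program runs the matching code. The sketch below shows that dispatch step only. The call shape (`name` plus a JSON-encoded `arguments` string), the `set_thermostat` tool, and the `dispatch` helper are all assumptions made for illustration; the actual format returned by Voxtral should be taken from Mistral's documentation.

```python
import json
from typing import Any, Callable, Dict


def set_thermostat(room: str, celsius: float) -> str:
    """Hypothetical tool; a real one would talk to a device or service."""
    return f"{room} set to {celsius} C"


# Registry mapping tool names to local Python functions.
TOOLS: Dict[str, Callable[..., Any]] = {"set_thermostat": set_thermostat}


def dispatch(call: Dict[str, Any]) -> Any:
    """Run the tool named in a model-produced call.

    Assumed shape: {"name": str, "arguments": JSON-encoded string}.
    """
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(call["arguments"])
    return TOOLS[name](**kwargs)


# Stand-in for what a model might return after hearing
# "set the kitchen to twenty-one degrees".
example_call = {
    "name": "set_thermostat",
    "arguments": '{"room": "kitchen", "celsius": 21}',
}
print(dispatch(example_call))
```

Keeping an explicit registry means a spoken command can only reach functions the application has chosen to expose, which matters when the input is untrusted audio.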
Why Voxtral Stands Out: Open, Scalable, and Cost-Effective
Voxtral is a significant step for open-source speech AI. Because it is licensed under Apache 2.0, businesses can self-host it with complete control.
Mistral also offers an API at $0.001 per audio minute, which it claims is less than half the cost of major proprietary alternatives such as OpenAI's hosted Whisper API or ElevenLabs Scribe.
According to Mistral's published benchmarks, Voxtral outperforms Whisper Large-v3, GPT-4o Mini Transcribe, Gemini 2.5 Flash, and ElevenLabs Scribe on transcription and comprehension tasks.
This combination of open access, affordability, and top-tier accuracy positions Voxtral as a compelling choice for enterprise speech intelligence.
Is Voxtral Paid or Free?
Voxtral’s core models, Small and Mini, are available for free and are fully open-source under the permissive Apache 2.0 license.
Anyone can download the model weights from Hugging Face and run them locally or in the cloud at no cost. If you prefer a managed API, usage is billed at $0.001 per minute, with a simple pay-as-you-go structure.
Mistral also offers a transcription-only API endpoint to minimize cost and latency. A free demo is available for testing before any charges are incurred. In short, the models are free, and the API is inexpensive.
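To put the per-minute rate in perspective, the short sketch below estimates API cost from audio duration. It uses the $0.001 per minute figure quoted above and assumes billing is strictly linear in duration with no rounding up to whole minutes and no minimum charge; that billing granularity is an assumption, and `estimate_cost_usd` is a hypothetical helper.

```python
PRICE_PER_MINUTE_USD = 0.001  # API rate quoted in the article


def estimate_cost_usd(
    audio_seconds: float, price_per_minute: float = PRICE_PER_MINUTE_USD
) -> float:
    """Estimate API cost in USD, assuming linear per-minute billing
    with no rounding or minimum charge (an assumption)."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be >= 0")
    return audio_seconds / 60 * price_per_minute


# 1,000 hours of audio is 60,000 minutes.
hours = 1000
print(f"${estimate_cost_usd(hours * 3600):.2f}")
```

At this rate, 1,000 hours of audio comes to roughly $60, before any infrastructure or engineering costs on the caller's side.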
Conclusion
Voxtral combines strong transcription accuracy, language comprehension, and voice-triggered actions in one open-source toolkit. Introduced by Mistral AI in July 2025, it handles long-form audio, supports multiple languages, and allows function calling.
Users can self-host at no licensing cost or use an economical API. With leading results on the benchmarks Mistral reports and a clear, open license, Voxtral lowers the cost barrier of proprietary solutions and addresses the inconsistency of existing open-source options.
For organizations ready to adopt voice AI without lock-in or escalating expenses, Voxtral provides a powerful, flexible, and budget-friendly platform.













