Voxtral is Mistral AI’s first open-source audio AI model, launched in July 2025. It supports long-form transcription, multilingual understanding, summarisation, and speech-triggered functions. Available in two sizes—Small (24B) and Mini (3B)—it’s designed for both cloud and local use. Voxtral processes up to 40 minutes of audio and is free to download under Apache 2.0. With an API priced at just $0.001 per minute, it offers a powerful, affordable alternative to closed speech AI systems.

Mistral AI has just released Voxtral, an open-source audio AI model designed to power businesses in transcription, comprehension, summarization, and voice-driven command execution.
Launched on July 15, 2025, Voxtral comes in two configurations: Voxtral Small, featuring 24 billion parameters for cloud-scale applications, and Voxtral Mini, with 3 billion parameters for on-device and edge deployments.
Both variants have a 32K-token context window, enough for up to 30 minutes of audio in transcription mode or up to 40 minutes for comprehension tasks.
Built atop Mistral’s Small 3.1 language model, Voxtral supports question answering and summarisation over audio, handles languages including English, Hindi, French, and Spanish, and can trigger function calls directly from spoken input.
It is released under the Apache 2.0 license; users can download the model from Hugging Face or query it through Mistral’s API, with pricing that starts at $0.001 per minute, which Mistral says is less than half the cost of comparable proprietary solutions.
Voxtral AI is an open-source audio intelligence model released by Mistral AI on July 15, 2025. It offers two versions: Voxtral Small, with 24 billion parameters, designed for production-level tasks, and Voxtral Mini, with 3 billion parameters, suitable for edge or local environments.
Both are based on the Mistral Small 3.1 LLM, combining speech processing with deep text understanding. Voxtral can handle 30 minutes of audio transcription or 40 minutes for comprehension, using a large 32K token context window.
It supports multiple languages, automatic language detection, Q&A, summarization, and even function calling directly from speech.
Voxtral pairs an audio encoder with a language model decoder. The audio encoder and an adapter convert speech into a representation the language model can read. Mistral Small 3.1 then processes that representation to understand context, answer questions, summarise, or trigger commands.
The model’s 32K-token context window enables it to process long audio files, allowing for up to 30 minutes of transcription or 40 minutes of comprehension. Built-in Q&A and summarisation features eliminate the need to chain separate ASR and LLM models.
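To make the 30-minute and 40-minute limits concrete, here is a minimal Python sketch of how a longer recording could be split before it is sent to the model. The two limits come from the figures above; the function name plan_chunks and the even-split strategy are illustrative assumptions, not part of Voxtral or Mistral's tooling.

```python
import math

# Limits quoted in the article, in minutes of audio.
MAX_TRANSCRIPTION_MIN = 30
MAX_COMPREHENSION_MIN = 40


def plan_chunks(duration_min: float, limit_min: float) -> list[tuple[float, float]]:
    """Split a recording into equal (start, end) spans, each no longer than limit_min.

    Hypothetical helper: splitting evenly is one possible strategy, chosen
    here so that no chunk ends up much shorter than the others.
    """
    if limit_min <= 0:
        raise ValueError("limit_min must be positive")
    if duration_min <= 0:
        return []
    n_chunks = math.ceil(duration_min / limit_min)
    step = duration_min / n_chunks
    return [
        (i * step, min((i + 1) * step, duration_min))
        for i in range(n_chunks)
    ]


if __name__ == "__main__":
    # A 95-minute meeting needs four transcription chunks of 23.75 minutes each.
    for start, end in plan_chunks(95, MAX_TRANSCRIPTION_MIN):
        print(f"{start:.2f} to {end:.2f} min")
```

A real pipeline would probably cut at pauses in speech rather than at fixed timestamps, so treat the spans as a starting budget rather than final cut points.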
Voxtral supports English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian, and more, and detects the spoken language automatically. Voice-enabled function calling means spoken commands can directly trigger workflows or API calls.
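As a rough illustration of what voice-triggered function calling means on the application side, the sketch below routes a function call to local Python code. It assumes the model's output has already been turned into a JSON object with "name" and "arguments" fields; that shape, the TOOLS registry, and the set_reminder function are all hypothetical and are not taken from Mistral's API.

```python
import json
from typing import Any, Callable

# Hypothetical registry mapping tool names to Python callables.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so that dispatch() can find it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def set_reminder(text: str, minutes: int) -> str:
    # Illustrative tool only; a real one would talk to a calendar or task service.
    return f"Reminder in {minutes} min: {text}"


def dispatch(call_json: str) -> Any:
    """Run the tool named in a call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


if __name__ == "__main__":
    # Imagine the user said: "Remind me to call Sam in ten minutes."
    spoken_call = '{"name": "set_reminder", "arguments": {"text": "call Sam", "minutes": 10}}'
    print(dispatch(spoken_call))  # Reminder in 10 min: call Sam
```

Rejecting unknown tool names before calling anything is a deliberate choice here: spoken input is easy to mishear, so the application should only ever run functions it has explicitly registered.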
Voxtral is a breakthrough in open-source speech AI. Because it is licensed under Apache 2.0, businesses can self-host it with complete control.
Mistral also offers an API at a rate of only $0.001 per audio minute, which Mistral says is less than half the cost of major paid alternatives, such as OpenAI’s Whisper API or ElevenLabs Scribe.
In benchmarks published by Mistral, Voxtral is reported to outperform Whisper Large-v3, GPT-4o Mini Transcribe, Gemini 2.5 Flash, and ElevenLabs Scribe on transcription and comprehension tasks.
This combination of open access, affordability, and top-tier accuracy positions Voxtral as a compelling choice for enterprise speech intelligence.
Voxtral’s core models—Small and Mini—are available for free, fully open-source under the permissive Apache 2.0 license.
Anyone can download the model weights from Hugging Face and run them locally or in the cloud at no cost. If you prefer a managed API, usage is billed at $0.001 per minute, with a simple pay-as-you-go structure.
Mistral also offers a transcription-only API endpoint to minimise cost and latency. A free trial/demo is available for testing before any charges are incurred. In short, the models are free, and the API is highly affordable.
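For a sense of scale, the per-minute price can be turned into a quick cost estimate. The sketch below uses only the $0.001 figure quoted above; the function name api_cost_usd is made up for illustration, and it assumes simple linear pay-as-you-go billing with no rounding rules, minimums, or volume tiers, which may not match how Mistral actually invoices.

```python
from decimal import Decimal

# Per-minute API price quoted in the article, in US dollars.
PRICE_PER_MINUTE = Decimal("0.001")


def api_cost_usd(audio_minutes, price_per_minute: Decimal = PRICE_PER_MINUTE) -> Decimal:
    """Estimate API cost, assuming plain linear billing (an assumption, not a published rule)."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must not be negative")
    # Going through str() keeps float inputs such as 2.5 from picking up binary noise.
    return Decimal(str(audio_minutes)) * price_per_minute


if __name__ == "__main__":
    for label, minutes in [("one 40-minute call", 40), ("10,000 hours of audio", 10_000 * 60)]:
        print(f"{label}: ${api_cost_usd(minutes)}")
```

Under that assumption, a single 40-minute recording costs about four cents and 10,000 hours of audio comes to roughly $600.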
Voxtral transforms voice intelligence by merging top-tier transcription accuracy, deep language comprehension, and voice-triggered actions in one open-source toolkit. Introduced by Mistral AI in July 2025, it handles long-form audio, supports multiple languages, and allows function calling.
Users can self-host at zero licensing cost or access an economical API. With strong reported benchmark results and clear, open licensing, Voxtral lowers the cost barrier of proprietary solutions and addresses the inconsistency of existing open-source options.
For organisations ready to adopt voice AI without lock-in or escalating expenses, Voxtral provides a powerful, flexible, and budget-friendly platform.
This post was last modified on July 20, 2025 12:43 pm