Voxtral: Open Source AI Audio Model—Capabilities, Features, and How to Access

Voxtral is Mistral AI’s first open-source audio AI model, launched in July 2025. It supports long-form transcription, multilingual understanding, summarisation and function calling triggered by speech. It comes in two sizes, Small (24B parameters) and Mini (3B), and is designed for both cloud and local use. Voxtral can process up to 40 minutes of audio, and its weights are free to download under the Apache 2.0 license. With API pricing starting at $0.001 per minute, it offers a capable, affordable alternative to closed speech AI systems.

Mistral AI has released Voxtral, an open-source audio AI model designed to help businesses with transcription, comprehension, summarisation and voice-driven command execution.

Launched on July 15, 2025, Voxtral comes in two configurations: Voxtral Small, featuring 24 billion parameters for cloud-scale applications, and Voxtral Mini, with 3 billion parameters for on-device and edge deployments.
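The gap between the two sizes is easiest to see as memory. The sketch below is a back-of-the-envelope estimate, not an official requirement: it assumes 16-bit (bf16) weights at 2 bytes per parameter and ignores activations, the KV cache and runtime overhead, so real deployments need more.

```python
# Rough weight-memory estimate.
# ASSUMPTION: 2 bytes per parameter (bf16/fp16). Activations, KV cache and
# framework overhead are NOT included, so actual requirements are higher.
BYTES_PER_PARAM_BF16 = 2


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Return the memory (GB, where 1 GB = 1e9 bytes) needed just to store the weights."""
    return num_params * bytes_per_param / 1e9


# Nominal parameter counts from the article.
variants = {"Voxtral Small": 24e9, "Voxtral Mini": 3e9}

for name, params in variants.items():
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights in bf16")
# Voxtral Small: ~48 GB of weights in bf16
# Voxtral Mini: ~6 GB of weights in bf16
```

Under these assumptions the Mini's weights fit on a single consumer GPU or a well-equipped laptop, while the Small typically calls for data-centre hardware; quantising to fewer bytes per parameter shrinks both figures proportionally.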

Both variants support a 32K-token context window, which allows up to 30 minutes of audio for transcription and up to 40 minutes for comprehension tasks.
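The link between the 32K-token window and the 30- and 40-minute limits is simple arithmetic once you know how many tokens a second of audio costs. This sketch assumes roughly 12.5 audio tokens per second (the encoder-plus-adapter rate described in Mistral's Voxtral technical report; treat it as an assumption) and a window of exactly 32,768 tokens.

```python
# Back-of-the-envelope context budget for audio input.
# ASSUMPTIONS: ~12.5 audio tokens per second and a 32,768-token window.
AUDIO_TOKENS_PER_SECOND = 12.5
CONTEXT_WINDOW = 32_768


def audio_tokens(minutes: float, rate: float = AUDIO_TOKENS_PER_SECOND) -> int:
    """Approximate number of context tokens consumed by `minutes` of audio."""
    return round(minutes * 60 * rate)


def tokens_left_for_text(minutes: float) -> int:
    """Tokens remaining for the text prompt and the model's output."""
    return CONTEXT_WINDOW - audio_tokens(minutes)


for minutes in (30, 40):
    print(minutes, "min ->", audio_tokens(minutes), "audio tokens,",
          tokens_left_for_text(minutes), "left for text")
# 30 min -> 22500 audio tokens, 10268 left for text
# 40 min -> 30000 audio tokens, 2768 left for text
```

Under these assumptions, 40 minutes of audio still leaves room for a question and a short answer, but not for a full transcript of that much speech, which plausibly explains why the transcription ceiling is lower.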

Voxtral Small is built on Mistral Small 3.1 and Voxtral Mini on the smaller Ministral 3B. On top of transcription, Voxtral offers built-in Q&A, multilingual summarisation, support for languages such as English, Hindi, French and Spanish, and the ability to call functions directly from spoken input.

It is released under the Apache 2.0 license: users can download the weights from Hugging Face or query the model through Mistral’s API, with pricing starting at $0.001 per minute, which Mistral says is less than half the cost of comparable proprietary solutions.

What Is Voxtral AI?

Voxtral AI is an open-source audio intelligence model released by Mistral AI on July 15, 2025. It offers two versions: Voxtral Small, with 24 billion parameters, designed for production-level tasks, and Voxtral Mini, with 3 billion parameters, suitable for edge or local environments.

Voxtral Small is built on the Mistral Small 3.1 LLM and Voxtral Mini on Ministral 3B, so each combines speech processing with the text understanding of its language backbone. Thanks to its 32K-token context window, Voxtral can handle up to 30 minutes of audio for transcription or 40 minutes for comprehension.

It supports multiple languages with automatic language detection, along with Q&A, summarisation and even function calling directly from speech.

How Voxtral Works: Core Technology and Features

Voxtral pairs an audio encoder with a language model decoder. The audio encoder and an adapter layer turn speech into a compact sequence of audio embeddings that the decoder can read directly. The language model (Mistral Small 3.1 in Voxtral Small) then processes those embeddings together with any text prompt to transcribe, understand context, answer questions, summarise or trigger commands.
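The encoder, adapter and decoder data flow can be pictured with a toy pipeline. This is not Mistral's code: the 16 kHz input, the 320-samples-per-frame encoder, the 4-to-1 frame grouping and all function names are illustrative assumptions. The point is the shape of the data: audio becomes feature frames, the adapter shortens that sequence, and the decoder reads the resulting embeddings in the same sequence as the text prompt.

```python
from typing import List

Vector = List[float]


def toy_encoder(num_samples: int, samples_per_frame: int = 320, dim: int = 4) -> List[Vector]:
    """Stand-in for the audio encoder: one feature vector per fixed-size chunk of samples."""
    num_frames = num_samples // samples_per_frame
    return [[float(i)] * dim for i in range(num_frames)]


def toy_adapter(frames: List[Vector], group: int = 4) -> List[Vector]:
    """Downsample by concatenating each run of `group` consecutive frames into one vector.

    A real adapter would also apply a learned projection to the decoder's hidden size;
    here a trailing partial group is simply dropped.
    """
    usable = len(frames) - len(frames) % group
    return [
        [x for frame in frames[i:i + group] for x in frame]
        for i in range(0, usable, group)
    ]


def build_decoder_input(audio_embeddings: List[Vector], prompt_tokens: List[str]) -> list:
    """Place audio embeddings and the text prompt in one sequence for the decoder.

    In a real model the prompt tokens would be embedded as vectors too.
    """
    return list(audio_embeddings) + list(prompt_tokens)


# 2 seconds of 16 kHz audio -> 100 encoder frames -> 25 audio embeddings
frames = toy_encoder(32_000)
embeddings = toy_adapter(frames)
sequence = build_decoder_input(embeddings, ["Summarise", "this", "call"])
print(len(frames), len(embeddings), len(sequence))  # 100 25 28
```

Shortening the audio sequence before it reaches the decoder is what lets long recordings fit in the language model's context window.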

The model’s 32K-token context window lets it process long audio files: up to 30 minutes for transcription or 40 minutes for comprehension. Because Q&A and summarisation are built in, there is no need to chain a separate automatic speech recognition (ASR) model and an LLM.

Voxtral supports English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian and more, and detects the spoken language automatically. Voice-enabled function calling means spoken commands can directly trigger workflows or API calls.
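Voice-triggered function calling follows the same pattern as text-based tool use: the application describes the functions it exposes, the model turns the spoken request into a function name plus JSON arguments, and the application executes the call. The dispatcher below is a hypothetical sketch; `create_ticket`, the tool schema and the `{"name": ..., "arguments": ...}` shape of the model's output are illustrative assumptions modelled on common JSON tool-calling formats, not Mistral's exact API.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical business function that a spoken command should trigger.
def create_ticket(title: str, priority: str = "normal") -> str:
    return f"Ticket created: {title} [{priority}]"


# Tool description sent to the model alongside the audio (JSON-schema style; illustrative).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "create_ticket",
        "description": "Open a support ticket",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "priority": {"type": "string", "enum": ["low", "normal", "high"]},
            },
            "required": ["title"],
        },
    },
}]

# Only functions registered here can ever be executed.
REGISTRY: Dict[str, Callable[..., Any]] = {"create_ticket": create_ticket}


def dispatch(tool_call: Dict[str, Any]) -> Any:
    """Run the local function named in a model tool call.

    ASSUMPTION: `tool_call` looks like {"name": ..., "arguments": "<JSON string>"}.
    """
    name = tool_call["name"]
    if name not in REGISTRY:
        raise ValueError(f"Model requested unknown function: {name}")
    kwargs = json.loads(tool_call["arguments"])
    return REGISTRY[name](**kwargs)


# What the model might return after hearing
# "Open a high-priority ticket saying the printer is down."
simulated_call = {"name": "create_ticket",
                  "arguments": '{"title": "Printer is down", "priority": "high"}'}
print(dispatch(simulated_call))  # Ticket created: Printer is down [high]
```

Keeping an explicit registry means the model can only trigger functions you have deliberately exposed; any other name is rejected before anything runs.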

Why Voxtral Stands Out: Open, Scalable, and Cost-Effective

Voxtral is a breakthrough in open-source speech AI. Because it is licensed under Apache 2.0, businesses can self-host it with complete control over their deployment.

Mistral also offers an API starting at $0.001 per audio minute, which it claims is less than half the cost of major proprietary alternatives such as OpenAI Whisper or ElevenLabs Scribe.
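To put the per-minute rate in perspective, here is the arithmetic for a hypothetical workload. The $0.001 figure is the entry price quoted in this article; other Voxtral models or endpoints may be billed differently, so the rate is kept as a parameter.

```python
def api_cost_usd(hours_of_audio: float, usd_per_minute: float = 0.001) -> float:
    """Estimated API bill for processing `hours_of_audio` at a flat per-minute rate."""
    return hours_of_audio * 60 * usd_per_minute


# Hypothetical workload: a call centre recording 1,000 hours of audio a month.
monthly = api_cost_usd(1_000)
print(f"${monthly:,.2f} per month")  # $60.00 per month
```

Because billing is linear, doubling the audio doubles the bill; self-hosting swaps this charge for the cost of running your own hardware.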

In benchmarks published by Mistral, Voxtral outperforms Whisper Large-v3, GPT-4o Mini Transcribe, Gemini 2.5 Flash and ElevenLabs Scribe on transcription and comprehension tasks.

This combination of open access, affordability, and top-tier accuracy positions Voxtral as a compelling choice for enterprise speech intelligence.

Is Voxtral Paid or Free?

Voxtral’s core models, Small and Mini, are free and fully open source under the permissive Apache 2.0 license.

Anyone can download the model weights from Hugging Face and run them locally or in the cloud without license fees; you pay only for your own hardware. If you prefer a managed API, usage is billed on a simple pay-as-you-go basis, starting at $0.001 per minute.
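Fetching the weights for local use can be done with the `huggingface_hub` library's `snapshot_download` function. The repository IDs below are the ones Mistral published at launch and should be double-checked on the Hub; actually running the model afterwards needs an inference framework that supports Voxtral (see the model card for the recommended one), which is outside this sketch.

```python
# ASSUMED Hugging Face repository IDs for the July 2025 release; verify before use.
VOXTRAL_REPOS = {
    "mini": "mistralai/Voxtral-Mini-3B-2507",
    "small": "mistralai/Voxtral-Small-24B-2507",
}


def repo_id_for(variant: str) -> str:
    """Map a short variant name ("mini" or "small") to its repository ID."""
    try:
        return VOXTRAL_REPOS[variant.lower()]
    except KeyError:
        raise ValueError(f"Unknown Voxtral variant: {variant!r}") from None


def download_weights(variant: str, local_dir: str) -> str:
    """Download every file of the chosen variant and return the local folder path."""
    # Imported lazily so repo_id_for works even without the package installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    return snapshot_download(repo_id=repo_id_for(variant), local_dir=local_dir)


print(repo_id_for("Mini"))  # mistralai/Voxtral-Mini-3B-2507
# download_weights("mini", "./voxtral-mini")  # uncomment to fetch several GB of weights
```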

Mistral also offers a transcription-only API endpoint to minimise cost and latency, and a free trial or demo lets you test the service before any charges are incurred. In short, the models are free and the API is highly affordable.
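A call to a transcription endpoint is essentially a single multipart upload of the audio file plus a model name. The sketch below builds (but does not send) such a request with Python's standard library. The URL, the `voxtral-mini-latest` model alias and Bearer-token authentication are assumptions based on Mistral's public API conventions; confirm all three against the current API reference before relying on them.

```python
import urllib.request
import uuid

# ASSUMED endpoint and model alias; verify in Mistral's API reference.
API_URL = "https://api.mistral.ai/v1/audio/transcriptions"
DEFAULT_MODEL = "voxtral-mini-latest"


def build_transcription_request(
    audio: bytes, filename: str, api_key: str, model: str = DEFAULT_MODEL
) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the model name and the audio file."""
    boundary = uuid.uuid4().hex
    # Text field: which model to use.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    # File field header; the raw audio bytes follow it.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = model_part + file_header + audio + closing
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_transcription_request(b"\x00\x01fake-audio", "meeting.mp3", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
# POST https://api.mistral.ai/v1/audio/transcriptions
```

To actually send it, pass the request to `urllib.request.urlopen` and parse the JSON response; the transcript is expected in a `text` field, but check the API reference for the exact response schema.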

Conclusion

Voxtral transforms voice intelligence by combining top-tier transcription accuracy, deep language comprehension and voice-triggered actions in one open-source toolkit. Introduced by Mistral AI in July 2025, it handles long-form audio, supports multiple languages and can call functions directly from speech.

Users can self-host it without license fees or use an economical API. With leading results on Mistral’s published benchmarks and clear, open licensing, Voxtral removes both the cost barrier of proprietary solutions and the inconsistency of existing open-source options.

For organisations ready to adopt voice AI without lock-in or escalating expenses, Voxtral provides a powerful, flexible, and budget-friendly platform.

This post was last modified on July 20, 2025 12:43 pm

Winny

