Mistral AI, the customer-focused AI platform, has recently launched two new research models: Codestral Mamba and Mathstral. Its mission is to make frontier AI ubiquitous and to provide every builder with tailor-made AI. This requires fierce independence, a strong commitment to open, portable, and customisable solutions, and an extreme focus on shipping the most advanced technology quickly.
Read on to learn about Mistral AI's two new research models, Codestral Mamba and Mathstral.
What are Codestral Mamba and Mathstral?
Codestral Mamba: Named as a tribute to Cleopatra, Codestral Mamba is a Mamba2 language model that specialises in code generation and is released under the Apache 2.0 licence. It is free to use, modify, and distribute, and it aims to open new perspectives in architecture research. The model is trained with advanced code and reasoning capabilities, enabling it to perform on par with state-of-the-art (SOTA) transformer-based models, and it has been tested on in-context retrieval capabilities up to 256k tokens.
Unlike transformer models, Mamba models offer linear-time inference and can theoretically model sequences of unbounded length. This allows users to engage with the model extensively and receive quick responses irrespective of input length, an efficiency that is particularly important for code-productivity use cases.
Mathstral: Released as a tribute to Archimedes on his 2311th anniversary, Mathstral is a specialised 7B model designed for mathematical reasoning and scientific discovery. It has a 32k context window and is published under the Apache 2.0 licence. Mathstral stands on the shoulders of Mistral 7B and specialises in STEM subjects, achieving state-of-the-art reasoning capabilities in its size category across various industry-standard benchmarks. In particular, it scores 56.6% on MATH and 63.47% on MMLU, with notable per-subject MMLU gains over Mistral 7B.
Mathstral is another example of the excellent performance/speed trade-offs achievable when building models for specific purposes, particularly given its fine-tuning capabilities.
How do Codestral and Mathstral work?
Mathstral is an instruction-tuned model; you can use it directly or fine-tune it as such, following Mistral's documentation. It achieves significantly better results with more inference-time computation: Mathstral 7B scores 68.37% on MATH with majority voting and 74.59% with a strong reward model among 64 candidates.
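The majority-voting technique mentioned above can be sketched in a few lines: sample many candidate solutions from the model, extract each final answer, and pick the most frequent one. The candidate list below is a hypothetical stand-in for answers extracted from real model samples.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common final answer among sampled candidates.

    This is the core of majority voting (self-consistency): the model is
    sampled many times, a final answer is extracted from each sample, and
    the most frequent answer is selected.
    """
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical final answers extracted from 8 sampled solutions
candidates = ["12", "12", "15", "12", "9", "12", "15", "12"]
print(majority_vote(candidates))  # → "12"
```

In practice the 64 candidates would come from temperature sampling the model on the same problem; the reward-model variant replaces the frequency count with a learned scorer that ranks the candidates.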
Codestral Mamba can be deployed using the mistral-inference SDK, which relies on reference implementations from Mamba’s GitHub repository. The model can also be deployed through TensorRT-LLM.
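A minimal local-deployment sketch with the mistral-inference SDK might look like the following. The package names and CLI flags are assumptions based on the mistral-inference repository, and the model path is illustrative; verify both against the official documentation before use.

```shell
# Sketch: install mistral-inference with Mamba support.
# mamba-ssm and causal-conv1d are assumed extra dependencies for
# Mamba-architecture models; check the official install instructions.
pip install mistral_inference mamba-ssm causal-conv1d

# Start an interactive session against locally downloaded weights.
# The path is a placeholder; the --instruct and --max_tokens flags
# are taken from the mistral-inference CLI.
mistral-chat /path/to/codestral-mamba --instruct --max_tokens 256
```

For production serving, the TensorRT-LLM route mentioned above trades this simple CLI for an optimised inference engine on NVIDIA GPUs.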
In conclusion, Codestral Mamba and Mathstral offer distinct but complementary strengths. Codestral Mamba excels at code generation, streamlining coding workflows with fast inference even on very long inputs. Mathstral shines in mathematical and scientific reasoning, delivering state-of-the-art results in its size category. Together, these openly licensed models let builders tackle complex challenges with efficiency and accuracy, making them valuable assets in today's technologically driven world.