Just a day after Meta released its latest large language model, Llama 3.1 405B, calling it the “largest-ever open-source artificial intelligence (AI) model,” Paris-based AI startup Mistral announced Mistral Large 2, the latest iteration of its flagship model. Mistral Large 2 outperforms its predecessor in code generation, mathematics, reasoning, multilingual support, and function calling.
Mistral claims Large 2 “performs on par with leading models such as GPT-4o, Claude 3, and Llama 3 405B.” The company says a major focus of the training process was minimizing the model’s tendency to “hallucinate,” that is, to generate answers that sound plausible but are wrong or irrelevant. This was addressed by tuning the model to be more cautious and discerning in its replies, so that it delivers reliable and accurate outputs.
Key Features
Here are some of the prominent features of the newly released Mistral Large 2:
- Context Window: Mistral Large 2 has an impressive 128k token context window, which allows it to handle complex and lengthy interactions.
- Multilingual Support: The model supports a large number of languages, including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean.
- Coding Languages: Mistral Large 2 is proficient in more than 80 coding languages, including Python, Java, C, C++, JavaScript, and Bash, significantly increasing its utility for software development and engineering tasks.
- Single-Node Inference: Mistral Large 2’s 123 billion parameters are sized to allow inference on a single node, ensuring high throughput and efficiency for large-scale AI applications (sketched below).
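To illustrate what single-node inference could look like in practice, here is a minimal sketch using the open-source vLLM library. The Hugging Face repository name and the eight-GPU tensor-parallel setup are assumptions for this example, not part of the announcement; the 123B weights still demand a node with very large GPU memory (roughly 250 GB in 16-bit precision).

```python
# Minimal single-node inference sketch with vLLM.
# Assumptions: the Hugging Face repo name below and an 8-GPU node
# with enough combined memory to hold the 123B weights.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-Large-Instruct-2407",  # assumed HF repo name
    tensor_parallel_size=8,  # shard the weights across 8 GPUs on one node
)

params = SamplingParams(temperature=0.3, max_tokens=256)
outputs = llm.generate(
    ["Write a Bash one-liner that counts the lines in every *.py file."],
    params,
)
print(outputs[0].outputs[0].text)
```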
Performance and Benchmarks
Mistral Large 2 claims to set new standards across multiple performance evaluations. On the MMLU benchmark, the pretrained version of the model attains an accuracy of 84.0%, giving it a strong position on the performance/cost frontier of open models.
Its capabilities in code generation, reasoning, and multilingual tasks have been benchmarked extensively. The model clearly surpasses its predecessor and performs on par with leading models such as GPT-4o, Claude 3 Opus, and Llama 3 405B.
For code generation and reasoning, Mistral Large 2 is a strong choice: it was trained to reduce hallucinations and to produce precise, reliable outputs. Its problem-solving performance on popular mathematical benchmarks is likewise improved. The detailed benchmarks also show strong instruction following and conversational ability, making the model well suited to complex, multi-turn discussions.
Performance accuracy on code generation benchmarks | Source: Mistral AI
| Model | Average | Python | C++ | Bash | Java | TypeScript | PHP | C# |
|---|---|---|---|---|---|---|---|---|
| Mistral Large 2 (2407) | 76.9% | 92.1% | 84.5% | 51.9% | 84.2% | 86.8% | 77.6% | 61.4% |
| Mistral Large 1 (2402) | 60.4% | 70.1% | 67.1% | 36.1% | 70.3% | 71.7% | 61.5% | 46.2% |
| Llama 3.1 405B (measured) | 74.9% | 84.1% | 82.0% | 58.2% | 82.9% | 83.6% | 73.9% | 59.5% |
| Llama 3.1 405B (paper) | 75.8% | 89.0% | 82.0% | 57.6% | 80.4% | 81.1% | 76.4% | 64.4% |
| Llama 3.1 70B | 68.5% | 78.7% | 70.2% | 51.3% | 74.7% | 76.7% | 73.3% | 54.4% |
| GPT-4o | 77.9% | 93.3% | 85.7% | 54.4% | 82.9% | 89.3% | 79.5% | 60.1% |
Performance accuracy on GSM8K | Source: Mistral AI
Here are some of the key takeaways regarding the model’s performance and benchmarks:
- Accuracy: It achieved 84.0% on MMLU.
- Code Generation: Outperforms previous Mistral Large and matches leading models like GPT-4o, Claude 3 Opus, and Llama 3 405B.
- Reasoning: Enhanced to minimize “hallucinations” and provide accurate outputs. Improved performance on mathematical benchmarks like GSM8K and MATH.
- Instruction Following & Alignment: Better at following precise instructions and handling multi-turn conversations, with strong results on benchmarks such as MT-Bench, Wild Bench, and Arena Hard.
- Language Diversity: Excels in English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, and Hindi.
Pricing
Mistral Large 2 is released under two licensing options: the Mistral Research License, which covers research and non-commercial use, and the Mistral Commercial License for business use. To obtain the latter, you have to contact Mistral AI directly.
This dual licensing model ensures that both academic researchers and commercial businesses can use the LLM.
Accessibility
Mistral Large 2 is available on la Plateforme, under the API name mistral-large-2407. You can test the model on le Chat, and the instruct model weights are hosted on Hugging Face.
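As an illustration, here is a minimal sketch of querying the model through la Plateforme’s chat completions API. The endpoint URL, request shape, and the MISTRAL_API_KEY environment variable are assumptions based on Mistral’s standard API conventions; only the model name mistral-large-2407 comes from the announcement.

```python
# Minimal sketch: querying mistral-large-2407 over Mistral's HTTP API.
# Assumes the standard chat-completions endpoint and an API key in MISTRAL_API_KEY.
import os
import requests

API_URL = "https://api.mistral.ai/v1/chat/completions"  # assumed endpoint
headers = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}

payload = {
    "model": "mistral-large-2407",
    "messages": [
        # Multi-turn conversation: the full history is resent with each request.
        {"role": "user", "content": "Summarise the Collatz conjecture in two sentences."},
        {"role": "assistant", "content": "It conjectures that repeatedly halving even numbers and applying 3n+1 to odd numbers always reaches 1."},
        {"role": "user", "content": "Now write a Python function that returns the Collatz sequence for n."},
    ],
    "temperature": 0.3,
    "max_tokens": 512,
}

response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```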
Mistral AI has also partnered with major cloud providers, including Google Cloud Platform, Microsoft’s Azure AI Studio, Amazon Bedrock on Amazon Web Services (AWS), and IBM watsonx.ai, making Mistral Large 2 broadly available for deployment and fine-tuning.
The Bottom Line
Though Mistral AI has been in the AI game for a relatively short time compared to giants like OpenAI, Meta, and Google, its products have already shown great promise. The newly released Mistral Large 2 sets new standards in performance, cost-efficiency, and versatility for large language models.