The world of large language models (LLMs) just experienced a significant shakeup with Mistral’s unconventional release of its latest model, the 8X22B. This massive, open-source LLM boasts a larger parameter size than its predecessor, the 8X7B, and is narrowing the performance gap with closed-source models from tech giants like OpenAI and Google, according to early benchmarks.
The 8X22B model simply appeared on the company’s official X (formerly Twitter) account as a downloadable torrent magnet link. This unconventional release method, while lacking fanfare, makes the model readily accessible to anyone with the necessary resources.
Before delving into benchmarks, it is essential to understand the different types of LLMs.
The Mistral 8X22B model falls under the autocomplete (base) category, which excels at continuing text from a given prompt. Other types include instruct models like Meta’s Code Llama, designed to follow developer instructions, and chat models like OpenAI’s ChatGPT and Google’s Gemini, which are tuned to understand natural language and respond to contextual queries conversationally.
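To make the distinction concrete, the snippet below shows how an autocomplete (base) model is typically used: it is handed raw text and asked to continue it, with no chat template or system prompt. This is only a minimal sketch, assuming the weights are mirrored on the Hugging Face Hub under the repository id shown (a placeholder rather than an official release name) and that sufficient GPU memory is available.

```python
# Minimal sketch: prompting a base ("autocomplete") model to continue raw text.
# The repo id below is a placeholder/community mirror, not an official release
# name, and loading a model of this size needs substantial GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "mistral-community/Mixtral-8x22B-v0.1"  # assumed mirror location

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# A base model simply continues the text; there is no chat template or system prompt.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```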
Although Mistral has not released official benchmarks, the Hugging Face community has stepped in to evaluate the model’s performance. Early benchmark scores posted by the community indicate substantial improvements over Mistral’s previous models. In the Hellaswag benchmark, the 8X22B scored an impressive 88.9, placing it close behind industry-leading models like GPT-4 (95.3), Claude 3 Opus (95.4), and Gemini 1.5 Pro (92.5). This performance surpasses established names like GPT-3.5 (85.5) and Gemini 1.0 Ultra (87.8).
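For context on what a HellaSwag score measures: each item provides a context and several candidate endings, and the model is credited when it assigns the highest likelihood to the correct ending. The sketch below illustrates that log-likelihood scoring in a simplified form, using a smaller Mistral checkpoint as a stand-in; it is not the exact harness the Hugging Face community used, and it glosses over tokenization and length-normalization details that real evaluations handle.

```python
# Simplified sketch of HellaSwag-style scoring: pick the ending the model finds
# most likely given the context. Real harnesses (e.g. lm-evaluation-harness)
# add normalization and batching; this only shows the core idea.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "mistralai/Mistral-7B-v0.1"  # smaller stand-in used for illustration
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
model.eval()

def ending_logprob(context: str, ending: str) -> float:
    """Sum of log-probabilities the model assigns to `ending` given `context`."""
    # Simplification: assumes the context tokens are a prefix of the full tokenization.
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids.to(model.device)
    full_ids = tokenizer(context + ending, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Each token is predicted by the position immediately before it.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    ending_positions = range(ctx_ids.shape[1] - 1, full_ids.shape[1] - 1)
    ending_tokens = full_ids[0, ctx_ids.shape[1]:]
    return sum(log_probs[pos, tok].item()
               for pos, tok in zip(ending_positions, ending_tokens))

context = "A man is standing at the free-throw line. He"
endings = [" dribbles the ball and takes the shot.",
           " dives into the swimming pool."]
best = max(endings, key=lambda e: ending_logprob(context, e))
print(best)
```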
Let’s take a look at how the newer model fares in comparison to the older ones.
Mixtral 8x22B vs 8x7B vs Mistral 7B
With Mistral AI keeping some details close to its chest, let’s compare its three intriguing language models: Mixtral 8x22B, Mixtral 8x7B, and Mistral 7B.
- Parameter Size: A language model’s parameter size refers to the number of trainable variables within it and often correlates with performance.
- Mixtral 8x22B: The largest of the three. Its name suggests eight expert networks of roughly 22 billion parameters each, so the total parameter count is well beyond 22 billion (see the rough estimate below).
- Mixtral 8x7B: An intermediate size, with the name implying eight experts of roughly 7 billion parameters each.
- Mistral 7B: The smallest of the three, a single dense model of approximately 7 billion parameters, hence the missing “8x” prefix.
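For a sense of scale, here is a back-of-envelope estimate based purely on the “8xN” naming convention. The per-expert size and top-2 routing are assumptions carried over from how Mixtral 8x7B works, not figures published for the 8x22B, and the true total is lower than the naive product because attention layers and embeddings are shared across experts.

```python
# Back-of-envelope parameter estimates based only on the "8xN" naming convention.
# These are assumptions, not figures published by Mistral: an "8x22B" SMoE does
# not contain eight fully independent 22B models, because attention layers and
# embeddings are shared across experts, so the true total is lower than 8 * 22B.
num_experts = 8
per_expert_billion = 22          # assumed from the "8x22B" name
experts_active_per_token = 2     # typical SMoE routing choice (as in Mixtral 8x7B)

naive_total = num_experts * per_expert_billion                 # upper bound: 176B
naive_active = experts_active_per_token * per_expert_billion   # rough active size: 44B

print(f"Naive upper bound on total parameters: {naive_total}B")
print(f"Rough parameters touched per token:    {naive_active}B")
```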
- Architecture: Mistral AI ships open-weight models, allowing for customization, and the Mixtral 8x22B uses a confirmed Sparse Mixture-of-Experts (SMoE) architecture. SMoE designs boost efficiency by activating only a subset of expert networks for each token processed (a toy routing layer illustrating the idea follows this list).
- Mixtral 8x22B: The confirmed SMoE architecture promises efficiency gains relative to a dense model of similar total size.
- Mixtral 8x7B: Also an SMoE model, which Mistral describes as routing each token to two of its eight experts.
- Mistral 7B: A standard dense transformer rather than a mixture of experts, using grouped-query and sliding-window attention.
- Performance Potential: With no official benchmarks, it is challenging to pinpoint precise performance metrics.
- Mixtral 8x22B: Its colossal parameter size and SMoE design potentially position this model as the top performer among the three.
- Mixtral 8x7B: Offers a balance of performance and efficiency thanks to its SMoE design, but falls behind the 8x22B in raw capacity.
- Mistral 7B: Suitable for tasks demanding less computational resources, though it might struggle with complex tasks compared to its larger counterparts.
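To make the SMoE idea concrete, the toy layer below routes each token to a small number of expert feed-forward networks, so only a fraction of the layer’s weights are touched for any given token. The dimensions and top-2 routing are illustrative assumptions, not Mistral’s actual configuration.

```python
# Toy sparse Mixture-of-Experts layer: a router picks the top-k experts per
# token, so only a fraction of the weights are used for any given token.
# Sizes and top-2 routing are illustrative, not Mistral's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, d_model)
        gate_logits = self.router(x)           # (tokens, num_experts)
        weights, chosen = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # mixing weights for the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e    # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)                   # 10 token embeddings
print(ToySMoE()(tokens).shape)                 # torch.Size([10, 64])
```

In a full model, a layer like this replaces the dense feed-forward block in each transformer layer, which is why total parameter count and per-token compute can diverge so sharply.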
- Choosing Your Champion: Since all three models are open-weight and free to download, licensing cost is not a deciding factor. Here are the key factors to consider when selecting a model:
- Task Requirements: If your tasks demand the strongest raw capability, such as long-form text generation or translation, the larger 8x22B is likely to deliver the best results.
- Computational Resources: Running massive models like the 8x22B requires considerable computing power (GPUs or TPUs). If resources are limited, the 8x7B or the Mistral 7B may be more practical.
- Customizability Needs: All three models are open-weight, allowing for fine-tuning to meet specific task requirements (see the sketch below).
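As one example of that customizability, parameter-efficient fine-tuning with LoRA adapters is a common way to adapt open-weight models on modest hardware. The sketch below uses the Hugging Face peft library with a smaller Mistral checkpoint as a stand-in; the target modules and hyperparameters are illustrative assumptions, not recommended settings.

```python
# Minimal LoRA fine-tuning setup using the peft library. The checkpoint,
# target modules, and hyperparameters are illustrative assumptions only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

repo_id = "mistralai/Mistral-7B-v0.1"  # smaller stand-in; swap for a Mixtral repo if resources allow
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; an assumed choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```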
The Bottom Line
Mistral’s unconventional release of the 8X22B model has generated excitement within the LLM community. Its strong performance in early benchmarks, combined with its full open-source nature, challenges established players in the industry. This development could lead to faster innovation and more democratized access to powerful AI tools. As the LLM landscape continues to evolve, it remains to be seen how Mistral’s commitment to open-source practices and its focus on different LLM types like autocomplete models will shape the future of this rapidly developing field.