
How Is Meta Llama 3 Better Than Claude 3 Sonnet & Gemini Pro 1.5? Check Here

Meta has published benchmark results that compare Llama 3 with other leading AI models. Read this article to compare Llama 3’s strengths and weaknesses against other LLMs and understand its capabilities.

Meta is in the news because of its recent launch, Llama 3. Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. It is intended for commercial and research use in English. Also, the instruction-tuned models are intended for assistant-like chat, whereas pre-trained models can be adapted for a variety of natural language generation tasks.
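To make "auto-regressive" concrete, here is a minimal Python sketch of the generation loop: the model predicts one token, appends it to the sequence, and repeats. This is a toy illustration, not Llama 3's implementation. The `NEXT_TOKEN_PROBS` table and the `generate` function are hypothetical stand-ins for the transformer, and unlike a real model, the toy looks only at the most recent token rather than the whole preceding sequence.

```python
# Toy illustration of auto-regressive generation (NOT Llama 3 itself).
# A hypothetical lookup table stands in for the transformer: it maps the
# most recent token to a probability distribution over the next token.
# A real model conditions on every previous token, not just the last one.
NEXT_TOKEN_PROBS = {
    "<s>": {"llama": 0.7, "the": 0.3},
    "llama": {"models": 0.6, "is": 0.4},
    "models": {"generate": 0.8, "</s>": 0.2},
    "generate": {"text": 0.9, "</s>": 0.1},
    "text": {"</s>": 1.0},
    "the": {"llama": 1.0},
    "is": {"</s>": 1.0},
}


def generate(max_new_tokens=10):
    """Greedy decoding: repeatedly append the most likely next token."""
    tokens = ["<s>"]  # start-of-sequence marker
    for _ in range(max_new_tokens):
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = max(probs, key=probs.get)  # pick the highest probability
        if next_token == "</s>":  # end-of-sequence marker stops generation
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the start marker


print(generate())  # ['llama', 'models', 'generate', 'text']
```

Each new token depends on what was generated before it, which is why output is produced one token at a time.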


Read on to see how Meta Llama 3 compares with Claude 3 Sonnet and Gemini Pro 1.5 on published benchmarks.

Comparison of Large Language Models (LLMs)

| Feature | Llama 3 (70B) | Claude 3 Sonnet | Gemini 1.5 Pro |
| --- | --- | --- | --- |
| Developer | Meta | Anthropic | Google |
| Release Date | April 2024 | March 2024 | February 2024 (limited preview) |
| Parameters | 70 billion | Not disclosed (smaller than Opus) | Not disclosed |
| Open Source | Yes (open weights) | No | No |
| Strongest Benchmarks | MMLU, HumanEval, and GSM-8K | Needle in a Haystack (NIAH) with a large context window | MATH |
| Weaker Benchmarks | MATH (compared to Gemini 1.5 Pro) | MMLU, GPQA, HumanEval, and GSM-8K | Not publicly available |
| Multimodal Capabilities (text & image) | No (text-only currently) | Yes (accepts image input) | Yes |
| Availability | Open weights for research and commercial use | Hosted access only | Hosted access only |

NOTE: All three models are still under development, along with the benchmarks, so these results may change over time.

How is Meta Llama 3 better than Claude 3 Sonnet and Gemini Pro 1.5?

Meta developed and released the Meta Llama 3 family of large language models (LLMs) in 8B and 70B sizes. The instruction-tuned Llama 3 models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. In particular, Meta reports that the Llama 3 70B model surpasses closed models like Gemini Pro 1.5 and Claude 3 Sonnet on several benchmarks, though not all of them (it trails Gemini Pro 1.5 on MATH). The tasks evaluated include question answering, summarization, instruction following, and few-shot learning.

First Evaluation

In its official blog post, Meta claims both sizes of Llama 3 beat similarly sized models like Google’s Gemma and Gemini, Mistral 7B, and Anthropic’s Claude 3 in certain benchmarking tests. On the MMLU benchmark, which measures general knowledge, Llama 3 8B performed significantly better than both Gemma 7B and Mistral 7B, while Llama 3 70B slightly edged out Gemini Pro 1.5.

Second Evaluation

According to Meta, human evaluators rated Llama 3 higher than OpenAI’s GPT-3.5 and other models. Meta built a new evaluation set that human evaluators used to compare Llama 3 with GPT-3.5 and other AI models currently in use. “This evaluation set contains 1,800 prompts that cover 12 key use cases: asking for advice, brainstorming, classification, closed question answering, coding, creative writing, extraction, inhabiting a character/persona, open question answering, reasoning, rewriting, and summarization,” Meta says in its blog post.
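As a rough illustration of how human preference judgments like these can be summarized, the Python sketch below tallies win, tie, and loss rates for one model against another. The article does not describe Meta's actual aggregation method, so this is an assumption. The `preference_rates` helper is hypothetical, and the `sample` judgments are made-up placeholders, not Meta's data.

```python
from collections import Counter


def preference_rates(judgments):
    """Return the share of 'win', 'tie', and 'loss' labels in a list.

    Each label records how a human evaluator rated one model's answer
    against a competitor's answer for the same prompt.
    """
    counts = Counter(judgments)
    total = len(judgments)
    return {label: counts[label] / total for label in ("win", "tie", "loss")}


# Placeholder judgments for illustration only (NOT real evaluation data).
sample = ["win", "win", "tie", "loss", "win", "loss", "win", "tie"]
rates = preference_rates(sample)
print(rates)  # {'win': 0.5, 'tie': 0.25, 'loss': 0.25}
```

With a real evaluation set, the same tally could be computed per use case to show where one model is preferred over another.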

Third Evaluation

The last evaluation covers the pre-trained (base) models, which Meta says establish a new state of the art for LLMs at the 8B and 70B scales.

Larger model sizes and multimodal capabilities, such as responding to ‘Generate an image’ or ‘Transcribe an audio file’, are planned for future Llama 3 releases rather than included in the current text-only models. The largest model, with over 400B parameters, is expected to capture more intricate patterns than the smaller models. According to Meta, these larger versions are still in training, but preliminary evaluations indicate that they perform well on many of the benchmarks.


This post was last modified on April 22, 2024 4:52 pm

Winny

Winny is a fervent tech writer with a flair for simplifying complex concepts into layman’s language. Highly skilled in crafting content and translating tech jargon, she delivers articles, guides and document information to educate and empower. Get into the world of technology with the best chauffeur, bridging the gap between you and industrial science with clarity and precision.
