How Is Meta Llama 3 Better Than Claude 3 Sonnet & Gemini Pro 1.5? Check Here

Meta's published Llama 3 benchmarks make it possible to compare the model's performance with that of other AI platforms. Read this article to compare Llama 3’s strengths and weaknesses against other LLMs and understand its capabilities.

Meta is in the news because of its recent launch, Llama 3. Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety. It is intended for commercial and research use in English. The instruction-tuned models are intended for assistant-like chat, whereas the pre-trained models can be adapted for a variety of natural language generation tasks.
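To make "auto-regressive" concrete: the model produces one token at a time, and each new token is chosen based on the tokens generated so far. The toy sketch below illustrates only that loop. The lookup table NEXT_TOKEN and the function generate are hypothetical stand-ins invented for this example, not Llama 3's transformer or any Meta API.

```python
# Toy illustration of auto-regressive generation.
# NEXT_TOKEN is a hypothetical hand-written table standing in for a real model.
NEXT_TOKEN = {
    "<start>": "llama",
    "llama": "3",
    "3": "is",
    "is": "open",
    "open": "<end>",
}


def generate(max_tokens=10):
    """Build a sequence one token at a time, feeding each output back in."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        # A real LLM conditions on the whole sequence so far;
        # this toy version only looks at the most recent token.
        next_token = NEXT_TOKEN.get(tokens[-1], "<end>")
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker


print(" ".join(generate()))
```

A real model replaces the table lookup with a probability distribution over its whole vocabulary, but the generate-append-repeat loop is the same idea.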

Read on to see how Meta Llama 3 compares with Claude 3 Sonnet and Gemini Pro 1.5 on published benchmarks, and where it comes out ahead.

Comparison of Large Language Models (LLMs)

Feature | Llama 3 (70B) | Claude 3 Sonnet | Gemini 1.5 Pro
Developer | Meta | Anthropic | Google
Release Date | April 2024 | March 2024 | February 2024 (limited preview)
Parameters | 70 billion | Not disclosed | Not disclosed
Open Source | Yes (open weights, Meta Llama 3 Community License) | No | No
Strongest Benchmarks | MMLU, HumanEval, and GSM-8K | Needle in a Haystack (NIAH) with a large context window | MATH
Weaker Benchmarks | MATH (compared to Gemini 1.5 Pro) | MMLU, GPQA, HumanEval, and GSM-8K | Not publicly available
Multimodal Capabilities (text & image) | No (text-only currently) | Yes (text and image input) | Yes (text, image, audio, and video input)
Availability | Open download (license acceptance required) | claude.ai and API | Preview (Google AI Studio, Vertex AI)

NOTE: All three models are still under development, along with the benchmarks, so these results may change over time.

How is Meta Llama 3 better than Claude 3 Sonnet and Gemini Pro 1.5?

Meta developed and released the Meta Llama 3 family of large language models (LLMs) in 8B and 70B parameter sizes. The instruction-tuned Llama 3 models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. In particular, according to Meta's published results, the Llama 3 70B model surpasses closed models like Gemini Pro 1.5 and Claude 3 Sonnet on several benchmarks, though not all: it trails Gemini Pro 1.5 on MATH. The tasks covered include question answering, summarization, instruction following, and few-shot learning.

First Evaluation

In its official blog post, Meta claims that both sizes of Llama 3 beat similarly sized models, including Google’s Gemma and Gemini, Mistral 7B, and Anthropic’s Claude 3 Sonnet, on certain benchmarks. On MMLU, which measures general knowledge, Llama 3 8B performed significantly better than both Gemma 7B and Mistral 7B, while Llama 3 70B slightly edged out Gemini Pro 1.5.

Second Evaluation

According to Meta, human evaluators rated Llama 3 higher than OpenAI’s GPT-3.5 and other models. To support this comparison, Meta built a new evaluation set that human evaluators used to compare the responses of Llama 3, GPT-3.5, and other AI models currently in use. “This evaluation set contains 1,800 prompts that cover 12 key use cases: asking for advice, brainstorming, classification, closed question answering, coding, creative writing, extraction, inhabiting a character/persona, open question answering, reasoning, rewriting, and summarization,” Meta says in its blog post.
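A human preference evaluation of this kind is usually summarized as win, tie, and loss rates for one model against another. The sketch below shows one simple way such a tally could be computed. The judgments list is made-up illustrative data, and preference_rates is a hypothetical helper; neither reflects Meta's actual evaluation set, tooling, or results.

```python
from collections import Counter

# Hypothetical judgments: (use_case, verdict), where the verdict records
# how model A fared against model B on one prompt. Not real results.
judgments = [
    ("coding", "win"), ("coding", "loss"), ("coding", "win"),
    ("summarization", "tie"), ("summarization", "win"),
    ("reasoning", "loss"),
]


def preference_rates(rows):
    """Return the share of wins, ties, and losses across all judgments."""
    if not rows:
        raise ValueError("at least one judgment is required")
    counts = Counter(verdict for _, verdict in rows)
    total = sum(counts.values())
    return {verdict: counts[verdict] / total for verdict in ("win", "tie", "loss")}


rates = preference_rates(judgments)
for verdict, share in rates.items():
    print(f"{verdict}: {share:.1%}")
```

Filtering the rows by use case before calling the helper would give a per-category breakdown, which is useful when a model is strong at coding but weaker at, say, reasoning.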

Third Evaluation

The last evaluation covers the pre-trained models. According to Meta, they establish a new state of the art for LLMs at the 8B and 70B scales.

Larger model sizes and multimodal capabilities, such as responding to requests like ‘Generate an image’ or ‘Transcribe an audio file’, are the main additions planned for future Llama 3 releases. The largest model, with over 400B parameters, should be able to capture more intricate patterns than the smaller models. According to Meta, these larger versions are still being trained, but preliminary performance evaluations indicate that they can already handle a significant share of the benchmark questions.

This post was last modified on April 22, 2024 4:52 pm

Winny

