How Is Meta Llama 3 Better Than Claude 3 Sonnet & Gemini Pro 1.5? Check Here

Meta has published benchmark results comparing Llama 3 with other leading AI models. Read this article to compare Llama 3’s strengths and weaknesses against other LLMs and understand its capabilities.

Meta is in the news for its latest release, Llama 3. Llama 3 is an auto-regressive language model built on an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. The models are intended for commercial and research use in English: the instruction-tuned models are designed for assistant-like chat, while the pre-trained models can be adapted to a variety of natural language generation tasks.


Read on to see how Meta Llama 3 outperforms Claude 3 Sonnet and Gemini Pro 1.5 on several benchmarks.

Comparison of Large Language Models (LLMs)

Feature	Llama 3 (70B)	Claude 3 Sonnet	Gemini 1.5 Pro
Developer	Meta	Anthropic	Google AI
Release Date	April 2024	March 2024	February 2024 (limited preview)
Parameters	70 billion	Not disclosed (smaller than Opus)	Not disclosed
Open Source	Yes (open weights under the Meta Llama 3 Community License)	No	No
Strongest Benchmarks	MMLU, HumanEval, and GSM-8K	Needle in a Haystack (NIAH), thanks to its large context window	MATH and GPQA
Weaker Benchmarks	MATH and GPQA (compared to Gemini 1.5 Pro)	MMLU, GPQA, HumanEval, and GSM-8K	HumanEval and GSM-8K (compared to Llama 3 70B)
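Multimodal Capabilities (text & image)	No (text-only currently)	Yes (text and image input)	Yes (text, image, audio, and video input)
Availability	Open weights (commercial and research use)	Generally available (claude.ai and API)	Public preview (Google AI Studio and Vertex AI)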

NOTE: All three models, and the benchmarks used to evaluate them, are still evolving, so these results may change over time.

How is Meta Llama 3 better than Claude 3 Sonnet and Gemini Pro 1.5?

Meta developed and released the Meta Llama 3 family of large language models (LLMs) in 8B and 70B sizes. The instruction-tuned Llama 3 models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. In particular, according to Meta, the Llama 3 70B model beats closed models such as Gemini Pro 1.5 and Claude 3 Sonnet on several benchmarks. The evaluations cover tasks such as question answering, summarization, instruction following, and few-shot learning.

First Evaluation

In its official blog post, Meta claims that both sizes of Llama 3 beat comparable models such as Google’s Gemma and Gemini, Mistral 7B, and Anthropic’s Claude 3 on certain benchmarks. On MMLU, which measures general knowledge, Llama 3 8B performed significantly better than both Gemma 7B and Mistral 7B, while Llama 3 70B narrowly edged out Gemini Pro 1.5.

Second Evaluation

According to Meta, human evaluators rated Llama 3 higher than OpenAI’s GPT-3.5 and other competing models, including Claude Sonnet. For this comparison, Meta built a new human evaluation set designed to reflect real-world use cases. “This evaluation set contains 1,800 prompts that cover 12 key use cases: asking for advice, brainstorming, classification, closed question answering, coding, creative writing, extraction, inhabiting a character/persona, open question answering, reasoning, rewriting, and summarization,” Meta says in its blog post.

Third Evaluation

The third evaluation covers the pre-trained (base) models, which Meta says establish a new state of the art for LLMs at those scales.

Looking ahead, Meta plans to add larger model sizes and multimodal capabilities to Llama 3, such as responding to requests like ‘Generate an image’ or ‘Transcribe an audio file’. The largest model, with over 400B parameters, is expected to capture more intricate patterns than the smaller models. According to Meta, these larger versions are still in training, but preliminary evaluations indicate that they can already answer a significant share of the benchmark questions.


This post was last modified on April 22, 2024 4:52 pm

Winny

Winny is a fervent tech writer with a flair for translating complex concepts into plain language. Skilled at crafting content and decoding tech jargon, she writes articles, guides, and documentation that educate and empower readers, bridging the gap between you and the world of technology with clarity and precision.