Microsoft’s Machine Learning Foundations team unveiled Phi-2, the latest addition to their suite of small language models (SLMs). While Phi-1 and Phi-1.5 showcased remarkable achievements, Phi-2, with 2.7 billion parameters, stands out by surpassing models 25 times its size in reasoning and language understanding.
Phi-2 is a 2.7 billion-parameter language model that demonstrates outstanding reasoning and language understanding capabilities, showcasing state-of-the-art performance among base language models with fewer than 13 billion parameters. On comparable benchmarks, Phi-2 matches or outperforms models up to 25x larger, thanks to new innovations in model scaling and training data curation.
What are the key insights behind Phi-2?
Phi-2 aims to answer this question by training SLMs that achieve performance on par with much larger models (yet still far from the frontier models). The team applied innovative techniques to scale up, starting from the 1.3 billion-parameter Phi-1.5 and embedding its knowledge within the 2.7 billion-parameter Phi-2. This scaled knowledge transfer not only accelerates training convergence but also delivers a clear boost in Phi-2's benchmark scores.
The key to Phi-2’s success lies in two main insights. Firstly, the team emphasises the critical role of training data quality, focusing on “textbook-quality” data and synthetic datasets tailored to teach the model common-sense reasoning and general knowledge. Secondly, innovative scaling techniques, building on the knowledge embedded in the previous 1.3 billion parameter model Phi-1.5, contribute to Phi-2’s outstanding performance.
Phi-2 is a Transformer-based model trained with a next-word prediction objective. Training took 14 days on 96 A100 GPUs, utilising 1.4 trillion tokens from synthetic and web datasets. Notably, it has not undergone alignment through reinforcement learning from human feedback or instruction fine-tuning, yet it exhibits better behaviour regarding toxicity and bias than existing models that went through such processes.
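To make the next-word prediction objective concrete, here is a minimal sketch using a toy bigram counter in place of a Transformer. The corpus, the counter, and the `predict_next` helper are all illustrative assumptions, not part of Phi-2's actual training pipeline; the point is only that the model learns, from data alone, which token is most likely to follow a given context.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the "textbook-quality" training data.
corpus = (
    "the model predicts the next word . "
    "the model learns from textbook quality data . "
    "the data teaches the model common sense reasoning ."
).split()

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the continuation most frequently seen after `word`."""
    return bigram_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # → model
```

A real next-word objective replaces the counts with a Transformer's learned probabilities and minimises cross-entropy over every position in the sequence, but the supervision signal is the same: predict the token that actually comes next.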
In terms of benchmarks, Phi-2 excels in various categories, including commonsense reasoning, language understanding, math, and coding. With only 2.7 billion parameters, Phi-2 surpasses the performance of the Mistral and Llama-2 models at 7B and 13B parameters on various aggregated benchmarks. Impressively, Phi-2 matches or outperforms the larger Google Gemini Nano 2 model on certain tasks.
Acknowledging challenges in model evaluation, Microsoft underscores the importance of testing on concrete use cases. Internal proprietary datasets and tasks at Microsoft reaffirm Phi-2’s superiority over Mistral-7B, which, in turn, outperforms Llama-2 models.
Beyond benchmarks, extensive testing on research community prompts aligns with expectations set by benchmark results. For instance, Phi-2 demonstrates prowess in solving physics problems, showcasing its versatility.
Microsoft has made Phi-2 available in the Azure AI Machine Learning Studio, encouraging researchers to explore its potential for mechanistic interpretability, safety improvements, and fine-tuning experiments. The release of Phi-2 represents a significant stride in demonstrating that superior language model capabilities can be achieved at a smaller scale through strategic training choices and data curation.