Considerable research has gone into improving and extending training methods for vision-language models (VLMs). However, with the number of benchmarks steadily growing, researchers face two difficult challenges: implementing each benchmark, which carries a significant computational cost, and figuring out how all of these benchmarks map onto meaningful axes of progress.
To enable systematic evaluation of VLM progress, Meta introduces UniBench: a unified implementation of more than 50 VLM benchmarks covering a wide range of carefully categorized capabilities, from object recognition to spatial awareness, counting, and much more.
To demonstrate UniBench's usefulness for tracking progress, researchers at Meta evaluate nearly 60 publicly available vision-language models trained on up to 12.8 billion samples.
The Meta AI Research team finds that while scaling up model size or training data improves many VLM capabilities, scaling offers little benefit for relational understanding or reasoning. Surprisingly, they also find that today's top VLMs struggle with simple digit-recognition and counting tasks such as MNIST, which far simpler networks can solve.
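For a sense of what "far simpler networks" means here, below is a minimal sketch of a small convolutional classifier of the kind that handles MNIST-style digit recognition. It is illustrative only, not part of UniBench or Meta's evaluation code, and the architecture choices are arbitrary.

```python
# Illustrative sketch: a small CNN of the kind that solves MNIST digit
# recognition, a task the article notes leading VLMs still struggle with.
# Not part of UniBench; architecture details are arbitrary assumptions.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Sanity check on a dummy batch of MNIST-shaped inputs (1x28x28).
model = SmallCNN()
logits = model(torch.randn(8, 1, 28, 28))
print(logits.shape)  # torch.Size([8, 10])
```

A network on this scale has only a few hundred thousand parameters, orders of magnitude smaller than the VLMs under evaluation.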
Where scale falls short, the researchers find that more targeted interventions, such as improving data quality or using tailored learning objectives, hold greater promise. They also offer practitioners guidance on choosing the best VLM for a particular application.
Finally, Meta AI has released the UniBench codebase, an easy-to-use implementation of all 50+ benchmarks with comparisons across 59 models. It also includes a streamlined, representative subset of benchmarks that runs in about 5 minutes on a single GPU.
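For illustration, here is a minimal sketch of what running the suite might look like. The import path and method names (`Evaluator`, `evaluate`) are assumptions based on the repository's description; consult the UniBench README for the actual interface.

```python
# Hypothetical usage sketch for the UniBench evaluation suite.
# The names below are assumptions, not a confirmed API; see
# github.com/facebookresearch/unibench for the real interface.
from unibench import Evaluator

evaluator = Evaluator()  # load the registered benchmarks and models
evaluator.evaluate()     # run the evaluations and aggregate the results
```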
In doing so, they expose the limits of scaling for reasoning and relational understanding, highlight the promise of high-quality data and tailored learning objectives, and distill recommendations for VLM practitioners. By guarding against blind spots in VLM evaluation, Meta believes UniBench helps researchers assess progress thoroughly and efficiently.