Google is allegedly using Anthropic's Claude to help improve the output of its own AI model, Gemini.
In response to a user prompt, both Gemini and Claude generate responses, which are displayed to contractors working for the tech giant. According to a TechCrunch report, the contractors then have up to 30 minutes per prompt to evaluate each model's output against criteria such as truthfulness and verbosity.
Google's contractors use an internal platform to compare Gemini's responses with those of other AI models. Recently, they began noticing references such as "I am Claude, created by Anthropic" in some of the outputs shown to them.
In internal discussions, the contractors noted that, based on their evaluations, "Claude's safety settings are the strictest" among the AI models compared, including Gemini. According to the report, when they submitted unsafe prompts, Claude declined to respond, while Gemini's output was flagged as a "huge safety violation" for containing "nudity and bondage."
Tech companies typically rely on industry benchmarks to assess how well their AI models perform. Anthropic's terms of service prohibit customers from using Claude "to build a competing product or service" or to "train competing AI models" without the startup's consent. It is unclear whether this restriction also applies to investors such as Google, which backs Anthropic.
Shira McNamara, a spokesperson for Google DeepMind, said that comparing model outputs for evaluation purposes is in line with standard industry practice. However, she was quoted as saying, "Any suggestion that we have used Anthropic models to train Gemini is inaccurate."
When asked whether it has Anthropic's permission to use Claude in testing against Gemini, Google declined to answer.
As tech companies race to develop better models, they frequently compare their AI models' performance against that of competitors. This is usually done by running the models through industry benchmarks rather than by having contractors painstakingly evaluate rivals' AI responses.