OpenAI's latest API enhancements, revealed during Developer Day, include new tools for speech applications, prompt caching, model distillation, and vision fine-tuning. These upgrades will help developers reduce costs, improve model performance, and create more efficient AI-powered apps.
OpenAI
OpenAI revealed several enhancements to its API offerings during a developer day event in San Francisco. With these upgrades, developers will be able to lower the cost of repetitive prompts, create new speech-based applications, further modify models, and improve the performance of smaller models.
During the event, OpenAI revealed four significant API updates: prompt caching, vision fine-tuning, model distillation, and the launch of a new API service called Realtime. For those who are unaware, software developers can include functionality from an external application into their creations using an API (application programming interface).
Also Read: OpenAI to Raise $6.5B at $150B Valuation, SoftBank Plans $500M Contribution
Model distillation is a novel technique that the business devised to improve the performance of smaller models, such as the GPT-4o mini, by fine-tuning them with the outputs of larger models. The business stated in a blog post that “developers had to manually orchestrate multiple operations across disconnected tools, from generating datasets to fine-tuning models and measuring performance improvements, until now, distillation has been a multi-step, error-prone process.”
OpenAI developed a model distillation suite within its API platform to streamline the procedure. With the use of sophisticated models like GPT-4o and o1-preview, developers can create their datasets by generating high-quality replies, refining a smaller model to follow those responses, and then creating and executing bespoke evaluations to gauge the model’s performance on particular tasks.
To assist developers in getting started with distillation, OpenAI claims to be giving away 2 million free training tokens per day on GPT-4o mini and 1 million free training tokens per day on GPT-4o until October 31. (Tokens are bits of information that artificial intelligence models analyze to comprehend requests.) A distilled model can be trained and operated at the same price as OpenAI’s regular fine-tuning fees.
Also Read: Sam Altman May Receive 7% Stake as OpenAI Explores Shift to For-Profit Public Benefit Corporation
Prompt caching is a new feature that allows developers to reuse frequently recurring prompts without having to pay full price each time. OpenAI has been laser-focused on bringing down the cost of its API services.
Long prefixes that specify how the model should behave when accomplishing a certain goal are frequently included in front of prompts in applications that use OpenAI’s models. Examples of such prefixes include telling the model to always format responses in bullet points or to react to all requests in a cheerful manner. Longer prefixes usually result in a better model and more consistent responses, but they also raise the price of each API call.
Long prefixes will now be automatically saved, or “cached,” by OpenAI for up to an hour. The API will automatically reduce the input cost by 50% if it finds a new prompt with the same prefix. The new functionality may result in significant cost savings for developers of AI systems with narrowly targeted use cases. In August, OpenAI competitor Anthropic added rapid caching to its line of models.
Also Read: OpenAI ChatGPT Voice Rolled Out Plus Users, Check How to use it on Mobile
Using images in addition to text, developers can now fine-tune GPT-4o. According to OpenAI, this will improve the model’s comprehension and recognition of images, opening up possibilities for “applications like enhanced visual search functionality, improved object detection for autonomous vehicles or smart cities, and more accurate medical image analysis.”
Developers can improve the model’s picture understanding performance by contributing a dataset of labeled photos to OpenAI’s platform. According to OpenAI, Coframe, a startup developing an AI-powered growth engineering assistant, has enhanced the assistant’s website code generation capabilities through the use of vision fine-tuning. GPT-4’s capacity to produce websites with a consistent visual style and proper layout was enhanced by 26% when hundreds of website images and the corresponding code were provided to it.
To get developers started, OpenAI will give out 1 million free training tokens every day during October. Starting in November, each million tokens will cost $25 for GPT-4o image fine-tuning.
Also Read: What is OpenAI Academy? Check Program Features for Developers and Organizations
As of last week, all ChatGPT users now have access to OpenAI’s human-sounding advanced voice mode. The business is now making its technology available to developers so they may create speech-to-speech applications.
Previously, a developer would have had to transcribe the audio, send the text to a language model like GPT-4 for processing, and then submit the result to a text-to-speech model to construct an AI-powered application that could speak to consumers. This method “often resulted in loss of emotion, emphasis, and accents, plus noticeable latency,” according to OpenAI.
The Realtime API is far quicker, less expensive, and more responsive since it processes audio instantly rather than requiring the user to connect several apps. Additionally, the API allows function calls, which enables applications powered by it to do tasks such as booking an appointment or ordering pizza. Eventually, realtime will be modified to include various types of multimodal experiences, including video.
The API will charge $5 for every million input tokens and $20 for every million output tokens to process text. When processing audio, the API will charge $100 per 1 million input tokens and $200 per 1 million output tokens. This corresponds to “roughly $0.06 per minute of audio input and $0.24 per minute of audio output,” according to OpenAI.
Also Read: Former Apple Designer Jony Ive Teams Up with OpenAI to Create Groundbreaking AI Mobile Hardware
This post was last modified on October 2, 2024 11:21 pm
Rish Gupta is an Indian entrepreneur who serves as the chief executive officer (CEO) of…
Are you looking to advance your engineering career in the field of robotics? Check out…
Artificial intelligence is a topic that has recently made internet users all over the world…
Boost your learning journey with the power of AI communities. The article below highlights the…
Demystify the world of Artificial Intelligence with our comprehensive AI Glossary and Terminologies Cheat Sheet.…
Scott Wu is the co-founder and Chief Executive Officer of Cognition Labs, an artificial intelligence…