OpenAI’s New API Tools: Model Distillation, Prompt Caching, Realtime Speech Apps, and More

OpenAI's latest API enhancements, revealed during Developer Day, include new tools for speech applications, prompt caching, model distillation, and vision fine-tuning. These upgrades will help developers reduce costs, improve model performance, and create more efficient AI-powered apps.

OpenAI revealed several enhancements to its API offerings during a developer day event in San Francisco. With these upgrades, developers will be able to lower the cost of repetitive prompts, create new speech-based applications, further modify models, and improve the performance of smaller models.

During the event, OpenAI revealed four significant API updates: prompt caching, vision fine-tuning, model distillation, and the launch of a new API service called Realtime. For those unfamiliar, an API (application programming interface) lets software developers incorporate functionality from an external application into their own software.

Model Distillation

Model distillation is a new technique the company devised to improve the performance of smaller models, such as GPT-4o mini, by fine-tuning them on the outputs of larger models. Until now, the company noted in a blog post, distillation has been “a multi-step, error-prone process” in which developers “had to manually orchestrate multiple operations across disconnected tools, from generating datasets to fine-tuning models and measuring performance improvements.”

To streamline the procedure, OpenAI built a model distillation suite into its API platform. Developers can use sophisticated models such as GPT-4o and o1-preview to generate high-quality responses for their datasets, fine-tune a smaller model on those responses, and then create and run bespoke evaluations to measure the smaller model’s performance on particular tasks.
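As a rough sketch of the dataset step, each prompt/response pair collected from the larger model becomes one line of a fine-tuning file in the standard chat JSONL format (the helper name and sample pair below are illustrative, not OpenAI code):

```python
import json

def to_training_record(prompt: str, teacher_reply: str) -> str:
    """Format one (prompt, larger-model reply) pair as a chat
    fine-tuning JSONL line for training a smaller student model."""
    return json.dumps({
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": teacher_reply},
        ]
    })

# Each line of the resulting .jsonl file is one training example.
pairs = [("What is an API?", "An interface that lets applications share functionality.")]
with open("distill_train.jsonl", "w") as f:
    for prompt, reply in pairs:
        f.write(to_training_record(prompt, reply) + "\n")
```

The resulting file is what gets uploaded when creating a fine-tuning job for the smaller model.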

To assist developers in getting started with distillation, OpenAI says it is giving away 2 million free training tokens per day on GPT-4o mini and 1 million free training tokens per day on GPT-4o until October 31. (Tokens are bits of information that artificial intelligence models process to understand requests.) After that, a distilled model can be trained and run at OpenAI’s standard fine-tuning rates.

Prompt Caching

Prompt caching is a new feature that allows developers to reuse frequently recurring prompts without having to pay full price each time. OpenAI has been laser-focused on bringing down the cost of its API services. 

Applications that use OpenAI’s models frequently prepend long prefixes to their prompts that specify how the model should behave when accomplishing a certain goal, such as instructing the model to always format responses as bullet points or to answer every request in a cheerful tone. Longer prefixes usually produce better and more consistent responses, but they also raise the price of each API call.

Long prefixes will now be automatically saved, or “cached,” by OpenAI for up to an hour. When the API detects a new prompt that shares a cached prefix, it automatically reduces the input cost of that prefix by 50%. The new functionality may result in significant cost savings for developers of AI systems with narrowly targeted use cases. In August, OpenAI competitor Anthropic added prompt caching to its line of models.
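Under those rules, the savings scale with the prefix length. A back-of-the-envelope cost model (the helper function and the $2.50-per-million input rate are illustrative, not official pricing):

```python
def prompt_cost(prefix_tokens: int, suffix_tokens: int,
                price_per_million: float, cached: bool) -> float:
    """Input cost of one API call in dollars. Cached prefix tokens are
    billed at 50% of the normal input rate; the rest at full price."""
    rate = price_per_million / 1_000_000
    prefix_rate = rate * 0.5 if cached else rate
    return prefix_tokens * prefix_rate + suffix_tokens * rate

# A 2,000-token system prefix plus a 100-token user question:
first_call = prompt_cost(2000, 100, 2.50, cached=False)  # prefix at full price
later_call = prompt_cost(2000, 100, 2.50, cached=True)   # prefix at half price
```

The longer the shared prefix relative to the per-call suffix, the closer the overall input discount approaches the 50% ceiling.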

Vision Fine-Tuning

Using images in addition to text, developers can now fine-tune GPT-4o. According to OpenAI, this will improve the model’s comprehension and recognition of images, opening up possibilities for “applications like enhanced visual search functionality, improved object detection for autonomous vehicles or smart cities, and more accurate medical image analysis.” 

Developers can improve the model’s image-understanding performance by contributing a dataset of labeled images to OpenAI’s platform. According to OpenAI, Coframe, a startup building an AI-powered growth engineering assistant, used vision fine-tuning to enhance the assistant’s website code generation. After being given hundreds of website screenshots and their corresponding code, GPT-4o’s ability to produce websites with a consistent visual style and correct layout improved by 26%.
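A sketch of what one labeled-image training example might look like, assuming the chat-style JSONL format with images supplied as base64 data URLs (the prompt text and helper name are illustrative, not OpenAI code):

```python
import base64
import json

def vision_training_record(image_path: str, label: str) -> str:
    """One labeled-image example as a chat fine-tuning JSONL line;
    the image file is inlined as a base64 data URL."""
    with open(image_path, "rb") as f:
        data_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()
    return json.dumps({
        "messages": [
            {"role": "user", "content": [
                {"type": "text", "text": "Describe the layout of this page."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ]},
            {"role": "assistant", "content": label},
        ]
    })

# Each screenshot/label pair becomes one line of the training .jsonl file.
```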

To get developers started, OpenAI will give out 1 million free training tokens every day during October. Starting in November, each million tokens will cost $25 for GPT-4o image fine-tuning. 

Realtime

As of last week, all ChatGPT users have access to OpenAI’s human-sounding advanced voice mode. The company is now making that technology available to developers so they can create speech-to-speech applications.

Previously, to build an AI-powered application that could speak with users, a developer had to transcribe the audio, send the text to a language model such as GPT-4 for processing, and then feed the result to a text-to-speech model. This method “often resulted in loss of emotion, emphasis, and accents, plus noticeable latency,” according to OpenAI.

Because it processes audio directly rather than chaining several models together, the Realtime API is far quicker, less expensive, and more responsive. The API also supports function calling, which lets applications built on it perform tasks such as booking an appointment or ordering a pizza. Eventually, Realtime will be expanded to support other multimodal experiences, including video.

The API will charge $5 for every million input tokens and $20 for every million output tokens to process text. When processing audio, the API will charge $100 per 1 million input tokens and $200 per 1 million output tokens. This corresponds to “roughly $0.06 per minute of audio input and $0.24 per minute of audio output,” according to OpenAI.
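OpenAI’s quoted per-minute figures follow directly from the per-token audio rates: $0.06 per minute of input implies roughly 600 audio tokens per minute at $100 per million, and $0.24 per minute of output implies roughly 1,200 tokens per minute at $200 per million. A minimal sketch of that arithmetic (the function name and conversation lengths are illustrative):

```python
AUDIO_IN_PER_MIN = 0.06    # $100 per 1M input tokens, ~600 tokens/minute
AUDIO_OUT_PER_MIN = 0.24   # $200 per 1M output tokens, ~1,200 tokens/minute

def realtime_audio_cost(minutes_in: float, minutes_out: float) -> float:
    """Estimated audio cost of a Realtime session in dollars, using
    OpenAI's quoted per-minute equivalents of its per-token rates."""
    return minutes_in * AUDIO_IN_PER_MIN + minutes_out * AUDIO_OUT_PER_MIN

# A ten-minute conversation with equal talk time in each direction:
cost = realtime_audio_cost(5, 5)   # 5 * 0.06 + 5 * 0.24 = $1.50
```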

This post was last modified on October 2, 2024 11:21 pm

Kumud Sahni Pruthi

A postgraduate in Science with an inclination towards education and technology. She always looks for ways to help people improve their lives by putting complex things into simple words through her writing.
