OpenAI launched a voice API and image fine-tuning at its annual DevDay. But it failed to release two highly-anticipated models.
OpenAI launched several tools at its annual DevDay on Tuesday, including a voice chat API and image fine-tuning.
But the organization failed to release a full version of the highly anticipated o1 model or the video-generation model Sora. Nor did it offer any updates on the GPT Store announced last year.
The main DevDay announcements include:
"Realtime API" capable of low-latency AI-generated voice response
Vision fine-tuning
Prompt caching (with discounts)
Model distillation
Here's a breakdown of all of them! Plus a look at the drama in OpenAI's C-suite:
The low-latency speech-to-speech Realtime API was the biggest launch of the day. It gives developers the opportunity to make voice chat apps using six preset chat voices.
The API can’t make its own phone calls, but it does work with calling APIs like Twilio, as demonstrated by Romain Huet, who used it to order 400 chocolate-covered strawberries from a fictional store.
Interestingly, AI disclosures don’t come as standard for the new API. For now at least, the onus is on developers to let users know they’re speaking with an AI voice.
Other features include the ability to place pins on a map during chats, which can help users looking for place-based recommendations.
The Chat Completions API will also be upgraded to enable audio input and output without Realtime’s low-latency benefits.
Plenty of indie hackers are excited about the tech. Here's @SullyOmarr on X:
Other X users pointed out flaws.
Although VC Deedy Das called the strawberry demo “awesome,” he took issue with its speed:
“The response latency is ~2s (cutting-edge is <400ms) and the voice doesn't feel as good as "advanced voice mode", it's still devoid of emotions.”
Price was another concern. X user @Simoarcher pointed out that the API is expensive compared to voice AI options that work by combining older models.
Developers will be able to make up their own minds in the coming days as Realtime API is rolled out via the OpenAI playground.
Developers will be able to fine-tune GPT-4o models using pictures, which should make them better at interpreting images and recognizing objects.
But some images will be off limits: copyrighted images and those that don’t meet OpenAI’s safety rules.
OpenAI gave rideshare and food delivery app Grab the chance to test the feature out with its mapping service GrabMaps. Which worked (obviously): the app's route-mapping saw big improvements.
And this means that some Southeast Asia-based indie hackers may already have benefitted from the new tech.
Model distillation means that devs will be able to use bigger and more expensive models to train smaller ones. Think using GPT-4o to fine-tune GPT-40-mini.
This should improve the quality of a smaller model at a fraction of the cost of training a larger one from scratch. A new evaluation function will allow coders to measure how well a fine-tuned model performs.
The entire process will be managed through an integrated workflow in the OpenAI platform.
From now on, developers will pay less for the prompts they use frequently. According to OpenAI's docs:
“By reusing recently seen input tokens, developers can get a 50% discount and faster prompt processing times.”
Developers using the latest versions of GPT-4o, GPT-4o mini, o1-preview, and o1-mini (or fine-tuned versions of these models) don’t need to do anything to get the discount, which OpenAI will apply automatically to input tokens it’s seen recently.
This is good news for developers using these models, but it may not be enough to win over those who aren’t. As TechCrunch notes, Anthropic already offers a better deal:
“OpenAI says developers can save 50% using this feature, whereas Anthropic promises a 90% discount for it.”
DevDay comes hot on the heels of a major reshuffle at OpenAI, following the departure of three senior executives last week.
Chief Technology Officer Mira Murati, Chief Research Officer Bob McGrew and a vice president of research Barret Zoph had all left OpenAI.
Altman announced vice president of research Mark Chen will become senior VP of research, lead OpenAI’s research organization with Jakub Pachocki. Pachokcki has been made chief scientist.
Altman said his own focus will shift from the non-technical aspects of the organization to the product and technical side.
He seemed to reference the corporate turmoil that’s plagued the organization over the last year on X ahead of DevDay:
Alongside pricing changes and technological progress, OpenAI had shipped “a little bit of drama” since the last devday.
The path to artificial general intelligence, he claimed, “has never felt more clear.”
If a recent Wall Street Journal report is anything to go by, its very likely the organization — which is still losing billions of dollars —will convert from a nonprofit to a fully-blown for-profit company within the next two years.
If it doesn’t, it risks having to pay back investors in a multi-billion-dollar funding round expected to close this week, per the Journal.
I’m excited about the possibilities of voice applications. I want an AI to call utility/internet/banks/airlines for me
I've been obsessed with text-to-speech since way before AI was good (see this 2020 tweet for proof!). So I'm of course super thrilled about the Realtime speech-to-speech API.
In general I don't think people appreciate the value of tiny incremental improvements in this space. like this Deedy Das VC guy that whined about the response latency:
OpenAI’s Realtime API looks great for voice interactions, but I’ve been using Kodexia, a conversational AI platform that not only provides real-time responses but also adapts to customer interactions seamlessly. It's been a huge boost for us, especially in delivering more human-like and emotionally responsive conversations. Looking forward to comparing it with the new API!
That's so amazing. AI will change the world in coming years.
Anyone have it on ChatGPT already? Missing on Playground as well.
Just got it on Playground! What should I ask 😂?
ok, i just tried asking it for restaurant recommendations in Pererenan, Bali ... I got BBQ spots in Paraná and met the rate limit 1 minute in 🙃