The browser-based assistant can perform everyday tasks like booking restaurants automatically.
"Operator" can interface with websites to perform simple tasks
OpenAI will release the model behind the agent via API soon
The company plans to release more agents in the coming weeks
OpenAI has launched its first AI agent, "Operator": an assistant that can automatically perform everyday tasks like booking restaurants and filling out forms.
Operator "sees" and interacts with websites via a new "state-of-the-art" computer use agent model that runs underneath a dedicated cloud-based browser.
At the moment, it still makes "embarrassing" mistakes as it grapples with typing and clicking. But OpenAI expects the agent to improve fast.
CEO Sam Altman, who announced the release in a livestream Thursday morning, said his company had a "long and great history" of developing "early research previews" into "products people really love."
"This is really the beginning of this product," he said.
For now, Operator is only available to Pro users in the U.S., but OpenAI will roll it out to Plus accounts in the coming months. Users based in the European Union may have to wait longer.
Operator combines a new "state-of-the-art" Computer-Using Agent (CUA) model with GPT-4o's vision capabilities. This means it can interpret graphical user interfaces like booking forms, and use "advanced reasoning" skills to fill them out.
It's not yet accessible via the OpenAI API, but the firm plans to release it soon so developers can use it in their own agents.
At the moment, users can ask the agent to perform tasks by inputting prompts into a browser interface, much like ChatGPT. The agent can then search the web and interact with the sites it finds to complete the job.
Although it's prone to errors, OpenAI says the CUA model is best-in-class. It gets web-based tasks right most of the time, according to the WebArena and WebVoyager benchmarks, on which it scores 58.1% and 87%, respectively.
CUA isn't quite so good at full computer tasks, which it performs successfully 38.1% of the time, according to the OSWorld test. But that's better than its closest rival, Anthropic's Claude, which scored up to 22% on its launch in October. Claude does have an edge for developers, however, as it's available via an API.
Both Claude and Operator have been tested by a wide range of high-profile companies to improve their real-world nous. Claude was used by Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company before launch, while OpenAI is working with OpenTable, Etsy, Instacart, StubHub, Uber, Thumbtack, DoorDash, eBay, Target and others.
OpenAI says it's rolling Operator out slowly because of how high the safety stakes are.
It's designed to always "ensure the person is in control" by asking questions and requiring confirmation before it performs tasks. The more sensitive the task, the more confirmation it will require. A prompt injection detection feature adds an extra layer of oversight designed to spot and stop suspicious activity.
But that doesn't mean users will have to wait for more exciting developments. Although OpenAI has been slower than other AI companies to release agents, the company says more are on the way in the coming weeks.
Wow... this is so interesting. It could go a long way.
OpenAI's Operator sounds really exciting!
Thanks, Katie, and keep up the amazing work with EasyFOI. I’m definitely going to use it to gather some data for my own projects. Keep going, girl!
nice
This is very insightful.
Looks scary, but either way it's gonna take a lot of time before people start using it, because after all we need assurance.
Thank you, Katie! I’ve noticed AI agents are popping up everywhere now. I even heard about this tool called Workbeaver which is pretty interesting because it learns your workflow via screen sharing and runs locally on your PC, not in a virtual environment like most AI agents. Definitely curious to see how these tools evolve!
It's great!
Wow!
Nice
As far as we know, ChatGPT is the key AI, and other websites are integrating OpenAI's API. Also, OpenAI's algorithm is so strong that other websites can't compare.
The question is: is anyone gonna use this for something that's important to them? For example, would you trust this agent to find the cheapest flight for you?
It's a valid concern! While it's important to verify the reliability of the tool, if it can demonstrate accurate results and efficiency, it might be worth considering. For finding the cheapest flight, it's always a good idea to cross-check with trusted resources and platforms as well.
I think people would use it for trivial tasks first, but in fact, some users are already sharing flights that Operator booked for them.
“Safety first”? Well, shouldn’t that apply everywhere, to how everything in our society is built? I can’t imagine a company pitching a product and saying the opposite, “Danger first”, right?
I was just thinking today about how I wanted to be able to tell my computer to book a time slot at the gym for me. Good timing!