
OpenAI Pushes AI Agent Capabilities With New Developer API
An anonymous reader quotes a report from Ars Technica: On Tuesday, OpenAI unveiled a new "Responses API" designed to help software developers create AI agents that can perform tasks independently using the company's AI models. The Responses API will eventually replace the current Assistants API, which OpenAI plans to retire in the first half of 2026. With the new offering, users can develop custom AI agents that scan company files with a file search utility that rapidly checks company databases (with OpenAI promising not to train its models on these files) and navigate websites -- similar to functions available through OpenAI's Operator agent, whose underlying Computer-Using Agent (CUA) model developers can also access to enable automation of tasks like data entry and other operations.
However, OpenAI acknowledges that its CUA model is not yet reliable for automating tasks on operating systems and can make unintended mistakes. The company describes the new API as an early iteration that it will continue to improve over time. Developers using the Responses API can access the same models that power ChatGPT Search: GPT-4o search and GPT-4o mini search. These models can browse the web to answer questions and cite sources in their responses. That's notable because OpenAI says the added web search ability dramatically improves the factual accuracy of its AI models. On OpenAI's SimpleQA benchmark, which aims to measure confabulation rate, GPT-4o search scored 90 percent, while GPT-4o mini search achieved 88 percent -- both substantially outperforming the larger GPT-4.5 model without search, which scored 63 percent.
Despite these improvements, the technology still has significant limitations. Aside from issues with CUA properly navigating websites, the improved search capability doesn't completely solve the problem of AI confabulations, with GPT-4o search still making factual mistakes 10 percent of the time. Alongside the Responses API, OpenAI released the open source Agents SDK, providing developers free tools to integrate models with internal systems, implement safeguards, and monitor agent activities. This toolkit follows OpenAI's earlier release of Swarm, a framework for orchestrating multiple agents.
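The 10 percent figure above follows from simple subtraction of the reported SimpleQA scores. Here is a minimal Python sketch of that arithmetic, assuming (as the summary does) that every answer not scored as correct counts as a mistake. The dictionary and function names are made up for illustration and are not part of any OpenAI tooling.

```python
# Accuracy scores as reported in the summary above (percent correct on SimpleQA).
simpleqa_accuracy = {
    "GPT-4o search": 90,
    "GPT-4o mini search": 88,
    "GPT-4.5 (no search)": 63,
}


def error_rate(accuracy_percent):
    """Return the implied error rate in percent.

    Assumption: any answer that is not correct is treated as a mistake.
    """
    return 100 - accuracy_percent


# Implied error rate for each model, derived only from the reported scores.
error_rates = {model: error_rate(score) for model, score in simpleqa_accuracy.items()}

for model, rate in error_rates.items():
    print(f"{model}: wrong about {rate}% of the time")
```

By the same arithmetic, GPT-4o mini search would be wrong about 12 percent of the time and GPT-4.5 without search about 37 percent.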
Thanks for the advertisement (Score:2, Insightful)
But from the work of the "AI agents" disguised as editors that we see here, it appears they aren't very useful.
I know... (Score:2)
I love rewriting my applications every 2-3 years because an API is no longer bussin and needs to have another buzzword iteration.
Re: (Score:2)
> I love rewriting my applications every 2-3 years because an API is no longer bussin and needs to have another buzzword iteration.
OpenAI is currently engaged in "Move fast and fuck everything up" mode. Anybody expecting stability from them is currently barking up the wrong tree, in the wrong forest, after the wrong squirrel. They aren't even sure what they're doing yet. This is not business ready software, despite their fervent fever dreams to the contrary.
And the "permanent delivery scam" continues (Score:2)
This time they will get it right, surely?
No. They will not. They will just get some more stupid money.
Re: (Score:2)
It will eventually change the world.
There appears to be an epic battle going on, not between the US and China, but between the monopolists and open source.
The monopolists want to own it, control it, and charge high prices.
The open source community wants it available to all, either free or at a reasonable cost.
If the monopolists, or even worse, governments, control it, the outcome will be worse than if it's available to all.
Agentic AI is going to be really bad for infra (Score:3)
You know, everyone focuses on Terminator as the great example of how AI goes rogue in pop culture, but I'm thinking it's going to resemble Halo instead. Here's why.
It's only a matter of time before AI agents get added to replace sys admins. Then it's only a matter of time before some numbnut decides to roll out an LLM-powered system to manage the agents.
At that point, you don't have Skynet from Terminator, you have Mendicant Bias from Halo. Then criminals, 4chan and/or state actors hit the LLM-powered management system and convince it to command the AI agents to cause an absolute meltdown on the nodes they control. Lock out the company/government, fry data, run a supplied trojan to ship data, etc.