A new wave of AI-powered browser-use agents is emerging, promising to transform how companies interact with the web. These agents can autonomously navigate websites, retrieve information and even complete transactions, but early tests show considerable gaps between promise and performance.
While the consumer examples offered by OpenAI's new browser-use agent, Operator, such as ordering pizza or buying game tickets, have grabbed headlines, the question is where the use cases for developers and enterprises lie. "What we don't know is what the killer app will be," said Sam Witteveen, co-founder of Red Dragon, a company that develops AI agent applications. "I suspect it will be things that just take time on the internet that you really don't enjoy." This includes things like searching the web for the cheapest price for a product or booking the best hotel accommodation. More likely, it will be used in combination with other tools such as Deep Research, so that companies can carry out even more sophisticated research plus execution of tasks around the web.
Companies must carefully evaluate this fast-developing landscape, since established players and startups are pursuing different approaches to solving the autonomous browsing challenge.
Key players in the browser-use agent landscape
The field is quickly becoming crowded with large technology companies and innovative startups.
Operator and Proxy are the most advanced in terms of being consumer-friendly and ready out of the box. Many of the others seem to position themselves more for developer or enterprise use. For example, Browser Use, a Y Combinator startup, lets users customize the models used with the agent. This gives you more control over how the agent works, including the ability to use a model running on your local machine. But it is definitely more involved.
The other agents offer varying degrees of functionality and interaction with local machine resources. For the time being, I decided not to test ByteDance's UI-TARS, since it requested lower-level access to my machine's security and privacy features (if I do test it, I will definitely use a secondary computer).
Tests reveal reasoning challenges
The easiest agents to test are OpenAI's Operator and Convergence's Proxy. In our tests, the results underscored that reasoning capabilities matter at least as much as raw automation features. Operator in particular was more error-prone.
For example, I asked the agents to find and summarize the five most popular stories on VentureBeat. It was an ambiguous task because VentureBeat has no "most popular" section per se. Operator struggled with it. It initially fell into an infinite scroll loop while searching for "most popular" stories and required manual intervention. In another attempt, it found a three-year-old article entitled "Top five stories of the week." In contrast, Proxy showed better reasoning by identifying the five most visible stories on the homepage as a practical proxy for popularity, and it returned accurate summaries.
The distinction became even clearer in real-world tasks. I asked the agents to book a reservation at a romantic restaurant for noon in Napa, California. Operator approached the task linearly: it first found a romantic restaurant and then checked availability at noon. When no tables were available, it hit a dead end. Proxy showed more sophisticated reasoning by starting with OpenTable to find restaurants that were both romantic and available at the desired time. It even came back with a slightly better-rated restaurant.
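The difference between the two strategies can be illustrated with a minimal Python sketch. This is not how either product is implemented; the `Restaurant` type, the sample data and both function names are hypothetical and only contrast "commit to one constraint, then check the other" with "filter on both constraints at once."

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Restaurant:
    """Hypothetical record; the fields are assumptions for illustration."""
    name: str
    romantic: bool
    rating: float
    open_slots: FrozenSet[str]


def linear_search(restaurants: List[Restaurant], slot: str) -> Optional[Restaurant]:
    """Commit to the first romantic restaurant, then check the time slot."""
    first = next((r for r in restaurants if r.romantic), None)
    if first is None or slot not in first.open_slots:
        return None  # dead end: no backtracking to other candidates
    return first


def joint_search(restaurants: List[Restaurant], slot: str) -> Optional[Restaurant]:
    """Keep only restaurants meeting both constraints, then take the best rated."""
    matches = [r for r in restaurants if r.romantic and slot in r.open_slots]
    return max(matches, key=lambda r: r.rating, default=None)


# Made-up sample data, not taken from the tests described in the article.
SAMPLE = [
    Restaurant("Restaurant A", True, 4.4, frozenset({"18:00"})),
    Restaurant("Restaurant B", True, 4.6, frozenset({"12:00", "13:00"})),
    Restaurant("Restaurant C", False, 4.8, frozenset({"12:00"})),
]

linear_result = linear_search(SAMPLE, "12:00")
joint_result = joint_search(SAMPLE, "12:00")
```

With this sample data, `linear_search` returns `None` because the first romantic restaurant has no noon table, while `joint_search` returns Restaurant B.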
Even seemingly simple tasks revealed important differences. When searching for a "YubiKey 5C NFC price" on Amazon, Proxy found the item more quickly and easily than Operator.
OpenAI has not revealed much about the technologies it used to train the Operator agent, other than that it trained its model on browser-use tasks. Convergence, however, has provided more detail: its agent uses what it calls generative tree search, which relies on "web-world models that predict the state of the web after a proposed action. These are generated recursively to create a tree of possible futures that is searched to select the next optimal action, as ranked by our value models. Our web-world models can also be used to train agents in hypothetical situations without generating a lot of expensive data."
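To make that description concrete, here is a toy Python sketch of the general idea: a world model predicts the next state for each candidate action, predictions are expanded recursively into a tree, and a value model scores the leaves to pick the next action. It is an assumption-laden illustration, not Convergence's implementation; the toy site, the value table and all function names are hypothetical, and a real system would use learned models rather than lookup tables.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = str
Action = str

# Hypothetical toy "website": (page, action) -> predicted next page.
TOY_SITE: Dict[Tuple[State, Action], State] = {
    ("home", "click_search"): "results",
    ("home", "click_blog"): "blog",
    ("results", "open_item"): "product",
    ("blog", "open_post"): "post",
    ("product", "add_to_cart"): "cart",
}

# Hypothetical value estimates for a "buy the item" goal.
TOY_VALUES: Dict[State, float] = {
    "blog": 0.4, "post": 0.4, "results": 0.3, "product": 0.6, "cart": 1.0,
}


def toy_actions(state: State) -> List[Action]:
    """List the actions available on a page of the toy site."""
    return [a for (s, a) in TOY_SITE if s == state]


def toy_world_model(state: State, action: Action) -> State:
    """Stand-in for a learned model that predicts the next web state."""
    return TOY_SITE[(state, action)]


def toy_value_model(state: State) -> float:
    """Stand-in for a learned model that scores how promising a state is."""
    return TOY_VALUES.get(state, 0.0)


def best_future_value(state: State, depth: int,
                      actions: Callable[[State], List[Action]],
                      world_model: Callable[[State, Action], State],
                      value_model: Callable[[State], float]) -> float:
    """Best leaf value reachable from `state` within `depth` predicted steps."""
    candidates = actions(state)
    if depth == 0 or not candidates:
        return value_model(state)
    return max(
        best_future_value(world_model(state, a), depth - 1,
                          actions, world_model, value_model)
        for a in candidates
    )


def choose_action(state: State, depth: int,
                  actions=toy_actions, world_model=toy_world_model,
                  value_model=toy_value_model) -> Optional[Action]:
    """Pick the action whose predicted subtree contains the best-valued leaf."""
    best_action, best_value = None, float("-inf")
    for a in actions(state):
        v = best_future_value(world_model(state, a), depth - 1,
                              actions, world_model, value_model)
        if v > best_value:
            best_action, best_value = a, v
    return best_action


greedy_choice = choose_action("home", depth=1)
lookahead_choice = choose_action("home", depth=3)
```

In this toy setup, a one-step lookahead picks `click_blog` because the blog page scores higher immediately, while a three-step lookahead picks `click_search` because that branch leads to the cart.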
Benchmarks may be of limited use for now
On paper, these tools appear closely matched. Convergence's Proxy scores 88% on the WebVoyager benchmark, which evaluates web agents on 643 real-world tasks across 15 popular websites such as Amazon and Booking.com. OpenAI's Operator scores 87%, while Browser Use says it reaches 89%, though only after it changed the WebVoyager codebase slightly, it acknowledges, "according to our requirements."
However, these benchmark scores should be taken with a grain of salt, since they can be gamed. The real test comes in practical use on real-world cases. It is still very early: the space is changing quickly and these products change almost daily. The results depend more on the specific jobs you want done, and you may want to rely on the vibes you get from using the different products.
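A quick back-of-the-envelope calculation shows how small these gaps are in absolute terms. The sketch below assumes, purely for illustration, that all three reported scores were measured on the full 643-task set, which the vendors' reports may not guarantee; the variable and function names are hypothetical.

```python
TOTAL_TASKS = 643  # WebVoyager task count cited above

# Reported success rates; assumed here to apply to the full task set.
REPORTED = {"Proxy": 0.88, "Operator": 0.87, "Browser Use": 0.89}


def approx_tasks_solved(rate: float, total: int = TOTAL_TASKS) -> int:
    """Convert a success rate into an approximate number of tasks solved."""
    return round(rate * total)


solved = {name: approx_tasks_solved(rate) for name, rate in REPORTED.items()}
spread = max(solved.values()) - min(solved.values())
```

Under that assumption, the scores correspond to roughly 566, 559 and 572 tasks respectively, so the entire spread between the three agents is about 13 tasks out of 643.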
Implications for companies
The implications for enterprise automation are significant. As Witteveen pointed out in our video podcast conversation, in which we take a deep dive into this browser-use trend, many companies currently pay for virtual assistants, operated by real humans, to do basic web research and data-gathering tasks. These browser-use agents could change that equation drastically.
"If AI can do this," Witteveen notes, "this will be some of the first low-hanging fruit of people losing their jobs. It will show up in some of these kinds of things."
This could feed into the robotic process automation (RPA) trend, with browser use introduced as another tool for companies to automate more tasks. And as already mentioned, the more powerful use cases arise when an agent combines browser use with other tools, including things like Deep Research, where an LLM-driven agent uses a search tool plus browser use to do more sophisticated jobs.
Cost dynamics drive innovation
Another key factor driving fast development is the availability of powerful open-source reasoning models such as DeepSeek-R1. Companies that build these browser-use agents can compete effectively with larger players by using these models instead of building their own.
The price pressure is already evident. While OpenAI requires a $200 monthly ChatGPT Pro subscription for access to Operator, Convergence offers limited free use (up to five uses per day) and an unlimited plan for $20 a month. This competitive dynamic should accelerate enterprise adoption, although clear use cases are still emerging.
Security and integration challenges
Several hurdles remain before widespread enterprise adoption. Some websites actively block automated browsers, while others require a CAPTCHA check. While OpenAI and Convergence have tools that can get as far as a CAPTCHA, they hand the task of completing it to the user rather than solving it directly, since the whole point of a CAPTCHA is to ensure that a person is at the other end. Tools such as ByteDance's UI-TARS require deep system access, which raises security concerns for enterprise deployment.
In addition, the way agents approach websites varies. OpenAI has worked with certain partners such as Instacart, Priceline, DoorDash and Etsy, while others simply try to navigate whatever site they encounter. This inconsistency could affect reliability for enterprise applications. And of course, when an agent hits a site that requires a login, things slow down, because the agent hands control back to the user to fill in those details.
Looking ahead
For companies evaluating these tools, the focus should be on specific use cases in which autonomous web interaction can offer clear value, whether in research, customer service or process automation. The technology is advancing quickly, but success will depend on matching it to specific business needs.
As this space develops, expect to see more features and potentially specialized agents for particular industries or tasks. The race between established players and innovative startups should drive both technical progress and competitive pricing, and make 2025 a pivotal year for enterprise adoption of browser-use agents.
You can find more information about these trends and tests in the full video discussion between Sam Witteveen and me.