Researchers improved the performance of AI agents on unfamiliar tasks using Dungeons and Dragons.



Organizations looking to deploy AI agents must first fine-tune them, especially for workflows that often feel rote. While some organizations want agents that perform only one type of task in a single workflow, agents sometimes need to be introduced to new environments in the hope that they will adapt.

Researchers at the Beijing University of Posts and Telecommunications have introduced a new method, AgentRefine, that teaches agents to self-correct, resulting in more general and adaptable AI agents.

The researchers said current agent-tuning methods limit agents to the same tasks as their training dataset, or "held-in" tasks, and that they don't perform as well in "held-out," or new, environments. Agents trained with these frameworks only follow the rules laid out in the training data, so they have difficulty "learning" from their mistakes and cannot be made into general agents or brought into new workflows.

To address this limitation, AgentRefine aims to create more generalized agent training datasets that allow a model to learn from mistakes and fit into new workflows. In a new paper, the researchers write that the goal of AgentRefine is to "develop generalized agent-tuning data and establish the correlation between agent generalization and self-refinement." When agents self-correct, they do not perpetuate the errors they have learned, and they do not carry those same mistakes into other environments in which they are deployed.

“We find that optimizing agents using self-refinement data enables the agent to explore more feasible actions while navigating bad situations, leading to better generalization to new agent environments,” the researchers write.
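To make that idea concrete, a self-refinement training sample pairs a mistaken step with the feedback that exposes it and the corrected step that follows. The sketch below is illustrative only; the field names, environment, and task are hypothetical and do not reproduce the paper's actual data format.

```python
# Illustrative sketch of a self-refinement training sample.
# Field names and values are hypothetical, not taken from the AgentRefine paper.
refinement_sample = {
    "environment": "text-based kitchen simulation",
    "task": "heat the mug of water and place it on the table",
    "trajectory": [
        {"thought": "Something in the kitchen should heat the mug.",
         "action": "put mug in fridge",          # erroneous step
         "observation": "The water is now colder."},
        {"thought": "That was wrong; the fridge cools things. "
                    "I should use the microwave instead.",
         "action": "put mug in microwave",       # self-corrected step
         "observation": "The water is now hot."},
    ],
}
```

Training on trajectories like this one, rather than only on flawless demonstrations, is what lets the model practice recovering from bad situations instead of memorizing a single correct path.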

D&D-inspired AI agent training

Drawing on the tabletop role-playing game Dungeons & Dragons, the researchers created personas, scripts for the agent to follow, and challenges. And yes, there is a Dungeon Master (DM).

They divided data construction for AgentRefine into three areas: script generation, trajectory generation, and verification.

In script generation, the model creates a script, or guide, with information about the environment, the tasks, and the actions that personas can perform. (The researchers tested AgentRefine with Llama-3-8B-Instruct, Llama-3-70B-Instruct, Mistral-7B-Instruct-v0.3, GPT-4o-mini and GPT-4o.)

In the trajectory phase, the model acts as both DM and player and generates agent data that includes errors: it evaluates the actions it can take and then checks whether they contain mistakes. The final phase, verification, checks the script and trajectory, ensuring that the agents trained on the data are able to make self-corrections.
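A minimal sketch of how those three phases could be chained is shown below. It assumes a generic `llm()` text-completion callable and heavily simplified prompts; the actual prompts, data formats, and checks used in AgentRefine are more elaborate and are described in the paper.

```python
from typing import Callable, Optional

def build_refinement_data(llm: Callable[[str], str], seed_persona: str) -> Optional[dict]:
    """Hypothetical three-phase pipeline: script generation, trajectory
    generation (the model plays both DM and player), and verification."""
    # 1) Script generation: environment, task, and allowed actions for a persona.
    script = llm(
        f"Write a D&D-style scenario for the persona '{seed_persona}'. "
        "Describe the environment, the task, and the list of allowed actions."
    )

    # 2) Trajectory generation: the model acts as DM and player, deliberately
    #    producing a wrong step followed by a self-correction.
    trajectory = llm(
        "Act as both Dungeon Master and player for this scenario. "
        "Produce a turn-by-turn trajectory that contains one erroneous action "
        "and the player's correction of it.\n\n" + script
    )

    # 3) Verification: check that the trajectory stays within the script's
    #    allowed actions and that the correction actually fixes the error.
    verdict = llm(
        "Does this trajectory follow the scenario's rules, and does the "
        "correction resolve the earlier mistake? Answer YES or NO.\n\n"
        f"Scenario:\n{script}\n\nTrajectory:\n{trajectory}"
    )
    if "YES" not in verdict.upper():
        return None  # discard samples that fail verification
    return {"script": script, "trajectory": trajectory}
```

Samples that survive verification become the self-refinement tuning data; discarding the rest keeps deliberately erroneous trajectories from teaching the agent bad habits without the accompanying correction.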

Better and more diverse task skills

The researchers found that agents trained with the AgentRefine method and dataset performed better on diverse tasks and adapted to new scenarios. These agents self-correct more readily, realigning their actions and decisions to avoid errors, and thereby become more robust.

In particular, AgentRefine improved the performance of all of the models on held-out tasks.

Companies need agents that can adapt to tasks so that they don't merely repeat what they have learned but become better decision-makers. Orchestrating agents involves not only "routing" traffic among multiple agents but also determining whether agents have completed tasks based on user requests.
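As a rough illustration of that distinction, an orchestrator does more than dispatch a request: it also checks whether the routed agent actually satisfied it. The agent registry, routing function, and completion check below are hypothetical stand-ins, not the API of any particular orchestration framework.

```python
from typing import Callable, Dict

# Hypothetical agent registry: each agent is a callable from request text to a result.
Agent = Callable[[str], str]

def orchestrate(request: str,
                agents: Dict[str, Agent],
                route: Callable[[str], str],
                is_complete: Callable[[str, str], bool]) -> str:
    """Route a request to an agent, then verify the task was completed;
    fall back to another agent if it was not. A sketch, not a real framework."""
    agent_name = route(request)            # routing: pick an agent for this request
    result = agents[agent_name](request)
    if not is_complete(request, result):   # completion check, beyond mere routing
        result = agents["fallback"](request)
    return result
```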

OpenAI's o3 offers "program synthesis," which could improve adaptability to tasks. Other orchestration and training frameworks, like Microsoft's Magentic-One, set actions for supervisor agents to learn when to move tasks to other agents.


