Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors
Abstract
A framework called Fission-GRPO improves multi-turn tool execution in large language models by converting execution errors into corrective supervision within the reinforcement learning training loop.
Large language models (LLMs) can call tools effectively, yet they remain brittle in multi-turn execution: after a tool-call error, smaller models often degenerate into repetitive invalid re-invocations, failing to interpret error feedback and self-correct. This brittleness hinders reliable real-world deployment, where execution errors are inevitable during tool interaction. We identify a key limitation of current approaches: standard reinforcement learning (RL) treats errors as sparse negative rewards, providing no guidance on how to recover, while pre-collected synthetic error-correction datasets suffer from distribution mismatch with the model's on-policy error modes. To bridge this gap, we propose Fission-GRPO, a framework that converts execution errors into corrective supervision within the RL training loop. Our core mechanism fissions each failed trajectory into a new training instance by augmenting it with diagnostic feedback from a finetuned Error Simulator, then resampling recovery rollouts on-policy. The model thus learns from the precise errors it makes during exploration, rather than from static, pre-collected error cases. On the BFCL v4 Multi-Turn benchmark, Fission-GRPO improves the error recovery rate of Qwen3-8B by 5.7 points absolute and, crucially, yields a 4-point overall accuracy gain (42.75% to 46.75%) over GRPO, outperforming specialized tool-use agents.
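The fission mechanism described above can be made concrete with a short sketch. The snippet below illustrates one plausible reading of the loop: sample a GRPO group per prompt, and for every failed rollout, append diagnostic feedback from the Error Simulator and resample a recovery group on-policy. All names here (`Trajectory`, `policy.rollout`, `error_simulator.diagnose`, the chat-message layout, the group size) are illustrative assumptions, not the paper's actual interface.

```python
# A minimal sketch of the "fission" step, assuming a chat-style rollout API.
# Failed rollouts are augmented with simulated diagnostic feedback and
# resampled on-policy, so recovery attempts enter the GRPO batch alongside
# ordinary rollouts. Advantage estimation and policy updates then proceed
# as in standard GRPO.

from dataclasses import dataclass

@dataclass
class Trajectory:
    messages: list[dict]   # chat-format turns, including tool calls and results
    reward: float = 0.0
    failed: bool = False   # True if a tool call raised an execution error

def fission_grpo_batch(policy, error_simulator, prompts, group_size=8):
    """Collect a GRPO batch, then fission failed rollouts into recovery instances."""
    batch: list[Trajectory] = []
    for prompt in prompts:
        # Standard GRPO: sample a group of rollouts per prompt.
        group = [policy.rollout(prompt) for _ in range(group_size)]
        batch.extend(group)

        # Fission step: each failed trajectory spawns a new training instance.
        for traj in group:
            if not traj.failed:
                continue
            # The finetuned Error Simulator turns the raw execution error
            # into diagnostic feedback the model can act on.
            diagnosis = error_simulator.diagnose(traj.messages)
            recovery_prompt = traj.messages + [
                {"role": "tool", "content": diagnosis}
            ]
            # Resample recovery rollouts on-policy from the augmented context,
            # so the model learns to correct the exact errors it just made.
            recovery_group = [policy.rollout(recovery_prompt)
                              for _ in range(group_size)]
            batch.extend(recovery_group)
    return batch
```

The key design point this sketch captures is that recovery contexts are built from the model's own fresh failures and resampled with the current policy, which is what avoids the distribution mismatch the abstract attributes to static error-correction datasets.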
Community
The following similar papers were recommended by the Semantic Scholar API:
- CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning (2026)
- From Failure to Mastery: Generating Hard Samples for Tool-use Agents (2026)
- Teaching LLMs to Learn Tool Trialing and Execution through Environment Interaction (2026)
- Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing (2025)
- AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent (2025)
- AWPO: Enhancing Tool-Use of Large Language Models through Explicit Integration of Reasoning Rewards (2025)
- MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching (2026)