Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
Abstract
Agent-World is a self-evolving training framework that advances general agent intelligence through autonomous environment discovery and continuous learning across diverse real-world scenarios.
Large language models are increasingly expected to serve as general-purpose agents that interact with external, stateful tool environments. The Model Context Protocol (MCP) and broader agent skills offer a unified interface for connecting agents with scalable real-world services, but training robust agents remains limited by the lack of realistic environments and principled mechanisms for lifelong learning. In this paper, we present Agent-World, a self-evolving training arena for advancing general agent intelligence through scalable environments. Agent-World has two main components: (1) Agentic Environment-Task Discovery, which autonomously explores topic-aligned databases and executable tool ecosystems from thousands of real-world environment themes and synthesizes verifiable tasks with controllable difficulty; and (2) Continuous Self-Evolving Agent Training, which combines multi-environment reinforcement learning with a self-evolving agent arena that automatically identifies capability gaps through dynamic task synthesis and drives targeted learning, enabling the co-evolution of agent policies and environments. Across 23 challenging agent benchmarks, Agent-World-8B and -14B consistently outperform strong proprietary models and environment-scaling baselines. Further analyses reveal scaling trends in relation to environment diversity and self-evolution rounds, offering insights for building general agent intelligence.
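To ground the two components, the sketch below illustrates in plain Python what a synthesized environment could look like: a topic-aligned state database, MCP-style executable tools, and a verifiable task whose success is checked programmatically against the final environment state. All names here (`Tool`, `Environment`, `Task`, the toy booking theme) are hypothetical illustrations, not the paper's released schema.

```python
# Minimal sketch of a synthesized environment plus a verifiable task.
# Class and field names are illustrative assumptions, not the paper's API.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Tool:
    """An executable tool exposed to the agent (MCP-style)."""
    name: str
    description: str
    run: Callable[..., Any]          # operates on the environment state

@dataclass
class Environment:
    """A topic-aligned environment: a stateful database plus its tools."""
    theme: str                       # e.g. "hotel booking", "git hosting"
    state: dict                      # the environment database
    tools: list[Tool] = field(default_factory=list)

@dataclass
class Task:
    """A verifiable task: instruction, difficulty knob, success checker."""
    instruction: str
    difficulty: int                  # controllable difficulty
    verify: Callable[[dict], bool]   # checks the final environment state

# Toy example: one booking tool and one programmatically checkable task.
def book_room(state: dict, guest: str, room: str) -> str:
    state["bookings"][room] = guest
    return f"booked {room} for {guest}"

env = Environment(
    theme="hotel booking",
    state={"bookings": {}},
    tools=[Tool("book_room", "Reserve a room for a guest",
                lambda guest, room: book_room(env.state, guest, room))],
)
task = Task(
    instruction="Book room 101 for Alice.",
    difficulty=1,
    verify=lambda s: s["bookings"].get("101") == "Alice",
)

env.tools[0].run(guest="Alice", room="101")   # an agent's tool call
assert task.verify(env.state)                 # automatic reward signal
```

Because `verify` inspects only the resulting state, rollouts can be scored automatically, which is what lets synthesized tasks serve as reward signals during reinforcement learning.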
Community
We introduce Agent-World, a general-purpose agent training arena that couples real-world environment synthesis with continuous self-evolving training, forming a closed loop in which agents and environments co-evolve.
It consists of two parts:
(1) Agentic environment–task discovery. A deep-search agent, anchored on real-world environment themes, autonomously mines environment databases from the web, generates executable tools, and synthesizes verifiable tasks.
(2) Continuous self-evolving training. Agents are trained with multi-environment reinforcement learning, while the synthesized environments serve as a training arena that automatically diagnoses capability gaps and targets environment/task expansion, enabling sustained self-evolution (see the sketch below).
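Under the assumption that evaluation, task synthesis, and policy updates are available as black-box stages, this co-evolution loop can be sketched as follows; `evaluate`, `synthesize_tasks`, and `rl_update` are placeholder stubs, not the paper's implementation.

```python
# Minimal sketch of the self-evolving training loop described above.
# The three stubs stand in for evaluation, task synthesis, and RL stages.
import random

def evaluate(agent, env) -> float:
    """Placeholder: success rate of `agent` on tasks sampled from `env`."""
    return random.random()

def synthesize_tasks(env, difficulty: str) -> list:
    """Placeholder: synthesize new verifiable tasks for a weak theme."""
    return [f"{env['theme']}::{difficulty}-task"]

def rl_update(agent, environments, tasks):
    """Placeholder: one multi-environment RL round over the task pool."""
    return agent

def self_evolve(agent, environments, rounds: int, gap_threshold: float = 0.5):
    for _ in range(rounds):
        # 1. Evaluate the current policy on every environment theme.
        scores = {env["theme"]: evaluate(agent, env) for env in environments}
        # 2. Diagnose capability gaps: themes with low success rates.
        gaps = [theme for theme, s in scores.items() if s < gap_threshold]
        # 3. Targeted expansion: synthesize harder tasks for weak themes.
        new_tasks = [task
                     for env in environments if env["theme"] in gaps
                     for task in synthesize_tasks(env, difficulty="harder")]
        # 4. Multi-environment RL so the policy and the arena co-evolve.
        agent = rl_update(agent, environments, new_tasks)
    return agent

agent = self_evolve(agent={"policy": "init"},
                    environments=[{"theme": "hotel booking"},
                                  {"theme": "git hosting"}],
                    rounds=3)
```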
In total, Agent-World builds 1,978 environments and 19,822 tools, with synthesized tasks averaging more than 15 interaction turns.
Across 23 challenging benchmarks (including $\tau^2$-Bench, BFCL V4, MCP-Mark, ClawEval, SkillsBench, etc.), Agent-World-8B/14B consistently outperform existing environment-scaling methods and strong open-source foundation models. Further analyses reveal a clear scaling relationship among environment diversity, self-evolution rounds, and agent performance.
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- MagicAgent: Towards Generalized Agent Planning (2026)
- Agentic Tool Use in Large Language Models (2026)
- Benchmark Test-Time Scaling of General LLM Agents (2026)
- EigenData: A Self-Evolving Multi-Agent Platform for Function-Calling Data Synthesis, Auditing, and Repair (2026)
- The World Won't Stay Still: Programmable Evolution for Agent Benchmarks (2026)
- SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents (2026)
- Safe and Scalable Web Agent Learning via Recreated Websites (2026)