General Task Planning for Multi-Robot Teams in Open-World Environments

System Integration

System integration overview

Abstract

Task planning is a key enabler for general multi-robot systems. However, existing approaches are often constrained to specific tasks and exhibit limited reliability, making them inadequate for open-world scenarios involving diverse tasks, contingencies, and human-robot interaction. In this work, we propose a general multi-robot task planning framework with three layers: a task planning layer, a world model layer, and an execution platform layer. At the task planning layer, we introduce GT-Planner, a general planner that integrates the generalization capabilities of large language models (LLMs) with the reliability and optimality of optimization-based methods. We design a generic prompting scheme and fine-tune a small-parameter LLM to balance LLM generality with improved accuracy. We further propose the TANGO algorithm for task allocation, which supports a broader class of constraints, including pre-specified robot execution. At the world model layer, we maintain a dynamic world model that assesses task progress and translates upper-layer plans for the lower layer. At the execution platform layer, we present the CyberCity simulation platform: CyberCity-Unreal enables high-fidelity task simulation, while CyberCity-Semantic supports large-scale evaluation and LLM fine-tuning. We also develop a human-robot interaction interface that accommodates multiple interaction modalities. Finally, we release CyberBench, the first open-world multi-robot task-planning benchmark, featuring five heterogeneous robot types, 29+ environmental element types, 10 core goal tasks, 19 contingency templates, and 105 tasks.

GT-Planner, TANGO, and the three-layer architecture

Method overview

As presented in the paper, GT-Planner interprets natural-language instructions and constructs a task dependency graph, TANGO performs task allocation under richer constraints, the world model tracks task progress and feedback, and CyberCity executes the resulting atomic skills in simulation.
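To make the data flow above concrete, here is a minimal Python sketch of how a task dependency graph might be handed to an allocator and then tracked by a world model. It is an illustration under stated assumptions, not the paper's implementation. The names `Task`, `Robot`, `allocate`, and `WorldModel`, and the example robots and tasks, are all hypothetical. The allocator is a simple greedy load balancer standing in for TANGO, whose actual algorithm is not described on this page. The only constraint modeled beyond skill matching is a pre-specified robot, which the abstract names as one supported constraint type.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Optional


@dataclass(frozen=True)
class Task:
    name: str
    skill: str                            # capability the task requires
    depends_on: tuple = ()                # names of prerequisite tasks
    assigned_robot: Optional[str] = None  # pre-specified robot, if any


@dataclass(frozen=True)
class Robot:
    name: str
    skills: frozenset


def allocate(tasks, robots):
    """Greedy stand-in for the allocation step (hypothetical, NOT TANGO).

    Visits tasks in dependency order and gives each one to the capable
    robot with the fewest tasks so far, honoring any pre-specified robot.
    """
    by_name = {t.name: t for t in tasks}
    graph = {t.name: set(t.depends_on) for t in tasks}
    load = {r.name: 0 for r in robots}
    plan = []
    for name in TopologicalSorter(graph).static_order():
        task = by_name[name]
        candidates = [r for r in robots if task.skill in r.skills]
        if task.assigned_robot is not None:
            candidates = [r for r in candidates if r.name == task.assigned_robot]
        if not candidates:
            raise ValueError(f"no capable robot for task {name!r}")
        # Break load ties by robot name so the result is deterministic.
        chosen = min(candidates, key=lambda r: (load[r.name], r.name))
        load[chosen.name] += 1
        plan.append((name, chosen.name))
    return plan


class WorldModel:
    """Toy progress tracker: records which planned tasks have finished."""

    def __init__(self, plan):
        self.done = {task: False for task, _ in plan}

    def report_done(self, task):
        if task not in self.done:
            raise KeyError(task)
        self.done[task] = True

    def progress(self):
        # Fraction of planned tasks completed; an empty plan counts as done.
        return sum(self.done.values()) / len(self.done) if self.done else 1.0


# Hypothetical example: one survey task gates a delivery and an inspection.
tasks = [
    Task("survey_area", "fly"),
    Task("deliver_kit", "carry", depends_on=("survey_area",), assigned_robot="ugv_1"),
    Task("inspect_site", "fly", depends_on=("survey_area",)),
]
robots = [
    Robot("uav_1", frozenset({"fly"})),
    Robot("uav_2", frozenset({"fly"})),
    Robot("ugv_1", frozenset({"carry"})),
    Robot("ugv_2", frozenset({"carry"})),
]

plan = allocate(tasks, robots)
world = WorldModel(plan)
world.report_done("survey_area")
print(plan, world.progress())
```

In this sketch the dependency graph only fixes the order in which tasks are considered. A real allocator would also optimize cost terms such as time and energy, and a real world model would feed execution failures back to the planner for replanning.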

Benchmark Setting

CyberBench figure

CyberBench covers five heterogeneous robot types, 29+ environmental element types, 10 core goal tasks, 19 contingency templates, and 105 evaluation tasks.

Quantitative Comparison

General Planning

General task planning statistics

These results compare overall success rate, efficiency, planning quality, and energy consumption across methods. The paper reports GT-Planner as the strongest overall method for general task planning.

Dynamic Replanning

Dynamic replanning statistics

These results evaluate replanning across difficulty levels and show that GT-Planner maintains stronger robustness and lower performance degradation as contingencies increase.

Fine-Tuning Placeholder

Fine-tuning placeholder figure

The paper describes a two-stage fine-tuning strategy: supervised fine-tuning for initialization, followed by reinforcement learning (RL) based fine-tuning for generalization. This slot currently reuses the framework figure and can be replaced with a dedicated fine-tuning result later.

Citation
@misc{general_task_planning_multi_robot_2026,
  title={General Task Planning for Multi-Robot Teams in Open-World Environments},
  author={Anonymous submission},
  year={2026},
  note={Project page and code release aligned with the current anonymous draft},
  url={https://miangchen.github.io/MultiAgent-Unreal/}
}