ArXiv

MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems

2026-05-11 · 17:44 UTC · Marco Coscoy, Zewei Zhou, Seth Z. Zhao...

Authors: Marco Coscoy, Zewei Zhou, Seth Z. Zhao...
Categories: cs.RO
arXiv: https://arxiv.org/abs/2605.10904v1
PDF: https://arxiv.org/pdf/2605.10904v1

Brief

MDrive introduces a reproducible closed-loop benchmark of 225 cooperative driving scenarios. It targets two gaps in existing V2X benchmarks: open-loop evaluations miss the closed-loop nature of driving, and prior closed-loop tests lack behavioral and interactive diversity. The authors find that multi-agent methods generally beat single-agent baselines, yet perception sharing and negotiation have mixed effects on downstream planning. The benchmark also ships with an open-source toolbox for scenario generation, Real2Sim conversion, and human-in-the-loop simulation. This summary is based on the abstract only; the full paper was not reviewed.

Why it matters

MDrive, released 2026-05-11, is a closed-loop cooperative driving benchmark with 225 scenarios grounded in NHTSA pre-crash typologies and real-world V2X datasets. It comes with an open-source toolbox (scenario generation, Real2Sim conversion, human-in-the-loop simulation) at https://mdrive-challenge.github.io/, which the authors present as a reproducible foundation for evaluating and improving the generalization and robustness of cooperative driving systems.

Key details

Benchmark results on MDrive show that multi-agent systems generally outperform their single-agent counterparts, but they also expose two open challenges: (a) perception sharing improves perception, yet the gain does not always translate into better planning, and (b) negotiation improves planning performance overall but harms it in complex, dense traffic scenarios.

Source evidence

Abstract

Vehicle-to-Everything (V2X) communication has emerged as a promising paradigm for autonomous driving, enabling connected agents to share complementary perception information and negotiate with each other to benefit the final planning. Existing V2X benchmarks, however, fall short in two ways: (i) open-loop evaluations fail to capture the inherently closed-loop nature of driving, leading to evaluation gaps, and (ii) current closed-loop evaluations lack behavioral and interactive diversity to reflect real-world driving. Thus, it is still unclear the extent of benefits of multi-agent systems for closed-loop driving. In this paper, we introduce MDrive, a closed-loop cooperative driving benchmark comprising 225 scenarios grounded in both NHTSA pre-crash typologies and real-world V2X datasets. Our benchmark results demonstrate that multi-agent systems are generally better than single-agent counterparts. However, current multi-agent systems still face two important challenges: (i) perception sharing enhances perceptions, but doesn't always translate to better planning; (ii) negotiation improves planning performance but harms it in complex and dense traffic scenarios. MDrive further provides an open-source toolbox for scenario generation, Real2Sim conversion, and human-in-the-loop simulation. Together, MDrive establishes a reproducible foundation for evaluating and improving the generalization and robustness of cooperative driving systems.

Comment: website: https://mdrive-challenge.github.io/