ArXiv

123D: Unifying Multi-Modal Autonomous Driving Data at Scale

Authors
Daniel Dauner, Valentin Charraut, Bastian Berle...
Categories
cs.RO, cs.CV
arXiv
https://arxiv.org/abs/2605.08084v1
PDF
https://arxiv.org/pdf/2605.08084v1

Brief

123D unifies multi-modal driving datasets by representing each modality (for example a camera, lidar, or annotation source) as an independent timestamped event stream, allowing flexible synchronization across heterogeneous formats. The authors consolidate eight real-world datasets (3,300 hours, 90,000 km) together with a synthetic dataset that has configurable collection scripts, systematically compare annotation statistics and pose/calibration accuracy, and demonstrate cross-dataset 3D object detection transfer and reinforcement learning for planning. The framework and tools are released as open source.

Why it matters

123D is an open-source framework that unifies multi-modal autonomous-driving data under a single API. It stores each modality as an independent timestamped event stream with no prescribed rate, which enables synchronous or asynchronous access across heterogeneous datasets. It consolidates eight real-world datasets totaling 3,300 hours and 90,000 kilometers, plus a synthetic dataset with configurable collection scripts.
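
The event-stream idea can be illustrated with a minimal sketch. This is not the py123d API: `EventStream`, `latest_at`, `synchronize`, and `max_age_us` are hypothetical names chosen for illustration, and the "most recent event at or before the query time" matching rule is an assumption, not something the abstract specifies. Each modality keeps its own timestamps at its own rate; asynchronous access reads one stream directly, while synchronous access samples the other streams at the timestamps of a chosen reference stream.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class EventStream:
    """Hypothetical container: one modality stored as (timestamp_us, payload) events."""

    name: str
    timestamps_us: List[int] = field(default_factory=list)
    payloads: List[Any] = field(default_factory=list)

    def append(self, timestamp_us: int, payload: Any) -> None:
        # Assumption: events are appended in non-decreasing time order,
        # so the timestamp list stays sorted and can be binary-searched.
        if self.timestamps_us and timestamp_us < self.timestamps_us[-1]:
            raise ValueError("events must be appended in time order")
        self.timestamps_us.append(timestamp_us)
        self.payloads.append(payload)

    def latest_at(self, t_us: int) -> Optional[Tuple[int, Any]]:
        """Return the most recent (timestamp, payload) at or before t_us, or None."""
        i = bisect_right(self.timestamps_us, t_us)
        if i == 0:
            return None
        return self.timestamps_us[i - 1], self.payloads[i - 1]


def synchronize(
    reference: EventStream, others: List[EventStream], max_age_us: int
) -> List[Dict[str, Any]]:
    """Build one frame per reference event, sampling every other stream at that time."""
    frames = []
    for t_us, payload in zip(reference.timestamps_us, reference.payloads):
        frame: Dict[str, Any] = {"timestamp_us": t_us, reference.name: payload}
        for stream in others:
            hit = stream.latest_at(t_us)
            # Keep a match only if it is no older than max_age_us; otherwise record None.
            if hit is not None and t_us - hit[0] <= max_age_us:
                frame[stream.name] = hit[1]
            else:
                frame[stream.name] = None
        frames.append(frame)
    return frames


# Two streams with different, unrelated rates (made-up timestamps in microseconds).
lidar = EventStream("lidar")
for t in (0, 100_000, 200_000):
    lidar.append(t, f"sweep@{t}")

camera = EventStream("camera")
for t in (10_000, 43_000, 76_000, 109_000, 142_000, 175_000):
    camera.append(t, f"image@{t}")

# Synchronous access: one frame per lidar sweep, with the latest camera image attached.
frames = synchronize(lidar, [camera], max_age_us=50_000)
for frame in frames:
    print(frame)
```

In this sketch the first lidar sweep has no camera match because no image precedes it, which shows why a rate-free representation leaves the synchronization policy (reference stream, tolerance, interpolation) to the consumer instead of fixing it at storage time.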

Key details

  • The authors use 123D to systematically compare annotation statistics and assess each dataset's pose and calibration accuracy. They also showcase two applications enabled by the framework: cross-dataset 3D object detection transfer and reinforcement learning for planning. Code and documentation are available at https://github.com/kesai-labs/py123d.

Source evidence

Abstract

The pursuit of autonomous driving has produced one of the richest sensor data collections in all of robotics. However, its scale and diversity remain largely untapped. Each dataset adopts different 2D and 3D modalities, such as cameras, lidar, ego states, annotations, traffic lights, and HD maps, with different rates and synchronization schemes. They come in fragmented formats requiring complex dependencies that cannot natively coexist in the same development environment. Further, major inconsistencies in annotation conventions prevent training or measuring generalization across multiple datasets. We present 123D, an open-source framework that unifies such multi-modal driving data through a single API. To handle synchronization, we store each modality as an independent timestamped event stream with no prescribed rate, enabling synchronous or asynchronous access across arbitrary datasets. Using 123D, we consolidate eight real-world driving datasets spanning 3,300 hours and 90,000 kilometers, together with a synthetic dataset with configurable collection scripts, and provide tools for data analysis and visualization. We conduct a systematic study comparing annotation statistics and assessing each dataset's pose and calibration accuracy. Further, we showcase two applications 123D enables: cross-dataset 3D object detection transfer and reinforcement learning for planning, and offer recommendations for future directions. Code and documentation are available at https://github.com/kesai-labs/py123d.