UMI-Bench 1.0

UMI-Bench 1.0: An Open and Reproducible Real-World Benchmark for Tabletop Robotic Manipulation with UMI Data

Shi Jin^1,2†, Yuntian Wang^1,2†, Yuhui Duan^1,2†, Di Wu^1,2†, Gaoqi Dong^2†, Xiaohang Liu^2§, Xiaotong Li^2§, Hongfei Jia^2§, Zehao Zhang^2§, Tianyu Wang^3, Zhongjie Jia^4, Yuanqi Yao^7, Chenjia Bai^5, Zhaxizhuoma^4,6,2‡, Siao Liu^1‡, Nieqing Cao^8, Jin Wang^1‡, Chao Yu^2, Yan Ding^2,3,8

¹Soochow University · ²Lumos Robotics · ³Fudan University · ⁴Shanghai Jiao Tong University
⁵Shanghai TeleAI · ⁶Shanghai AI Laboratory · ⁷INSAIT · ⁸Xi'an Jiaotong-Liverpool University

^† Equal contribution · ^§ Equal contribution · ^‡ Corresponding authors

Abstract

Real-robot evaluation is essential for understanding whether learned manipulation policies can operate reliably outside curated demonstrations. This need is particularly pressing for Universal Manipulation Interface (UMI)-style policies, whose performance depends on the coupling between wrist-view observations, action representation, data collection, and physical deployment.

Existing real-world benchmarks have made important progress, but they are not designed around this UMI data-to-deployment setting. We present UMI-Bench 1.0, a local-first real-robot benchmark for standardized evaluation of UMI-style manipulation policies. To the best of our knowledge, this is the first benchmark dedicated to real-world evaluation of UMI-based manipulation models.

UMI-Bench aligns data collection, scene reset, policy execution, result logging, and task-factor analysis within a unified protocol. By making the full evaluation process reproducible and auditable, UMI-Bench provides a practical testbed for measuring how UMI-trained policies generalize to real physical manipulation.
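As a rough illustration of the result-logging and task-factor-analysis steps, the sketch below groups logged rollouts into per-task, per-condition success rates. The `EpisodeRecord` schema, its field names, and the condition labels are hypothetical and are not UMI-Bench's actual log format. The 95% Wilson score interval is our own addition, included because real-robot evaluations usually run only a few trials per condition, so a bare success rate can be misleading.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeRecord:
    """One logged real-robot rollout (hypothetical schema, not UMI-Bench's format)."""
    task: str
    factor: str    # task-factor label, e.g. "seen" or "unseen" (assumed)
    success: bool


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval (95% for z=1.96) for a binomial success rate."""
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def success_rate_by_factor(records):
    """Group rollouts by (task, factor); report successes, trials, rate, and interval."""
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, trials]
    for rec in records:
        entry = counts[(rec.task, rec.factor)]
        entry[0] += int(rec.success)
        entry[1] += 1
    return {
        key: {"successes": s, "trials": n, "rate": s / n, "ci95": wilson_interval(s, n)}
        for key, (s, n) in counts.items()
    }


if __name__ == "__main__":
    logs = [
        EpisodeRecord("pick_cup", "seen", True),
        EpisodeRecord("pick_cup", "seen", False),
        EpisodeRecord("pick_cup", "unseen", True),
    ]
    for key, stats in sorted(success_rate_by_factor(logs).items()):
        lo, hi = stats["ci95"]
        print(key, f'{stats["successes"]}/{stats["trials"]}', f"[{lo:.2f}, {hi:.2f}]")
```

Keying results by task factor, rather than reporting only one pooled number, keeps seen and unseen conditions separate, and the interval width makes it visible when a difference rests on too few trials to be meaningful.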

UMI-Bench overview showing data collection, data samples, and real-world evaluation workstations

Data-to-evaluation Pipeline

UMI-Bench connects demonstration capture, UMI data samples, task setup, real-world evaluation workstations, and rollout logging into one reproducible benchmark workflow.

UMI-Bench main figure summarizing standardized data collection, task suite, training repository, baselines, and grid-based real-world evaluation
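The figure refers to grid-based real-world evaluation. One plausible reading, which is our assumption and not a description of the benchmark's actual protocol, is that object start positions come from a discretized tabletop grid and each cell is tested under every condition. The sketch below builds such a trial schedule. The function `make_trial_schedule`, its parameters, and the fixed-seed shuffle are hypothetical, chosen so that any rerun produces exactly the same trial order.

```python
import itertools
import random


def make_trial_schedule(rows, cols, conditions, repeats, seed=0):
    """Hypothetical schedule: every (grid cell, condition) pair, `repeats` times each.

    The order is shuffled with a fixed seed, so reruns reproduce it exactly
    while avoiding systematic ordering effects (e.g. always testing one cell first).
    """
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    trials = [
        {"cell": cell, "condition": cond, "repeat": k}
        for cell, cond, k in itertools.product(cells, conditions, range(repeats))
    ]
    random.Random(seed).shuffle(trials)  # local RNG: does not touch global random state
    for trial_id, trial in enumerate(trials):
        trial["trial_id"] = trial_id
    return trials


if __name__ == "__main__":
    schedule = make_trial_schedule(rows=3, cols=3, conditions=["seen", "unseen"], repeats=2)
    print(len(schedule))  # 3 * 3 * 2 * 2 = 36
    for trial in schedule[:3]:
        print(trial)
```

Fixing the schedule before any rollout runs means two labs, or two policies, can be evaluated on an identical sequence of placements and conditions.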

Unseen condition, 2x playback

BibTeX

@misc{jin2026umibench,
  title         = {{UMI-Bench} 1.0: An Open and Reproducible Real-World Benchmark for Tabletop Robotic Manipulation with {UMI} Data},
  author        = {Shi Jin and Yuntian Wang and Yuhui Duan and Di Wu and Gaoqi Dong and Xiaohang Liu and Xiaotong Li and Hongfei Jia and Zehao Zhang and Tianyu Wang and Zhongjie Jia and Yuanqi Yao and Chenjia Bai and Zhaxizhuoma and Siao Liu and Nieqing Cao and Jin Wang and Chao Yu and Yan Ding},
  year          = {2026},
  eprint        = {2606.10382},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2606.10382}
}