arXiv

SI-Diff: A Framework for Learning Search and High-Precision Insertion with a Force-Domain Diffusion Policy

2026-05-12 · 15:14 UTC · Yibo Liu, Stanko Oparnica, Simon Shewchun-Jakaitis...

Authors: Yibo Liu, Stanko Oparnica, Simon Shewchun-Jakaitis...
Categories: cs.RO
arXiv: https://arxiv.org/abs/2605.12247v1
PDF: https://arxiv.org/pdf/2605.12247v1

Brief

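SI-Diff addresses contact-rich assembly (peg-in-hole) by learning both search and high-precision insertion with a single force-domain diffusion policy. A new mode-conditioning mechanism lets that one policy capture both action patterns, and a search teacher policy generates diverse, successful demonstrations to train on. From these, the policy learns to map tactile and end-effector velocity observations to actions. Experiments report that SI-Diff extends x–y misalignment tolerance from 2 mm to 5 mm relative to the state-of-the-art baseline TacDiffusion and transfers zero-shot to unseen shapes. This summary is based on the abstract only.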
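The abstract does not say how the search teacher policy produces its diverse trajectories. For orientation only, a common hand-designed search strategy in peg-in-hole work is an Archimedean spiral in the x–y plane; the sketch below assumes that strategy and randomizes its pitch, starting phase and winding direction so that repeated calls yield different demonstrations. The spiral, the function names `spiral_waypoints` and `sample_teacher_trajectory`, and all parameter ranges are hypothetical choices, not SI-Diff's actual teacher.

```python
import math
import random
from typing import List, Tuple

Waypoint = Tuple[float, float]  # (x, y) offset from the nominal hole position, in mm


def spiral_waypoints(pitch_mm: float, max_radius_mm: float, step_mm: float,
                     phase_rad: float = 0.0, clockwise: bool = False) -> List[Waypoint]:
    """Sample an Archimedean spiral r = b * theta at roughly constant arc-length spacing."""
    b = pitch_mm / (2.0 * math.pi)  # radius gained per radian of rotation
    sign = -1.0 if clockwise else 1.0
    theta = 0.0
    points: List[Waypoint] = []
    while b * theta <= max_radius_mm:
        r = b * theta
        angle = phase_rad + sign * theta
        points.append((r * math.cos(angle), r * math.sin(angle)))
        # Arc-length element ds = sqrt(r^2 + b^2) dtheta, so dtheta = ds / sqrt(r^2 + b^2).
        theta += step_mm / math.hypot(r, b)
    return points


def sample_teacher_trajectory(rng: random.Random, max_radius_mm: float = 5.0) -> List[Waypoint]:
    """Hypothetical teacher: randomize spiral parameters so repeated calls differ."""
    pitch = rng.uniform(0.5, 1.5)            # hypothetical range, mm per revolution
    phase = rng.uniform(0.0, 2.0 * math.pi)  # random starting direction
    clockwise = rng.random() < 0.5           # random winding direction
    return spiral_waypoints(pitch, max_radius_mm, step_mm=0.2,
                            phase_rad=phase, clockwise=clockwise)


if __name__ == "__main__":
    rng = random.Random(0)
    for demo in (sample_teacher_trajectory(rng) for _ in range(3)):
        x, y = demo[-1]
        print(f"{len(demo)} waypoints, final radius {math.hypot(x, y):.2f} mm")
```

Seeding the generator makes a demonstration set reproducible, while each call still draws a different spiral; in a real pipeline only the searches that actually found the hole would be kept as training demonstrations.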

Why it matters

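Existing approaches typically handle search and high-precision insertion separately, because the two tasks call for distinct action patterns. SI-Diff covers both with a single force-domain diffusion policy, using a new mode-conditioning mechanism and demonstrations from a search teacher policy, and trains on tactile and end-effector velocity observations. An assembly system therefore does not need to switch models or weights between the search and insertion phases.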
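The abstract does not describe the network architecture or how the mode signal enters the policy. The sketch below only illustrates the general idea under stated assumptions: a standard DDPM sampler with a linear noise schedule whose noise predictor receives the observation (contact force plus end-effector velocity) and a discrete mode label as extra inputs, so the same weights can yield either a search action or an insertion action. To stay runnable without a trained model, the denoiser is a hypothetical oracle that knows a toy target action per mode; a real policy would replace it with a learned network. The names `SEARCH`, `INSERT`, `toy_target`, the 3-D action and its values are illustrative assumptions, not SI-Diff's design.

```python
import math
import random
from typing import Callable, List, Sequence

SEARCH, INSERT = 0, 1  # hypothetical mode labels fed to the policy as conditioning

Vec = List[float]
# Denoiser signature: (noisy action a_t, step t, observation, mode) -> predicted noise
Denoiser = Callable[[Vec, int, Sequence[float], int], Vec]


def make_schedule(T: int, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Linear DDPM noise schedule: betas and their cumulative products alpha_bar."""
    betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]
    alpha_bar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bar.append(prod)
    return betas, alpha_bar


def toy_target(obs: Sequence[float], mode: int) -> Vec:
    """Hypothetical clean action per mode (stand-in for learned behavior).

    obs = (fx, fy, fz, vx, vy, vz): contact force and end-effector velocity.
    """
    fx, fy = obs[0], obs[1]
    if mode == SEARCH:
        return [2.0, 0.0, -5.0]           # sweep sideways while pressing lightly
    return [-0.5 * fx, -0.5 * fy, -10.0]  # yield to lateral force, press down


def make_oracle_denoiser(alpha_bar: Vec) -> Denoiser:
    """Exact noise predictor when the clean action is toy_target(obs, mode)."""
    def eps(a_t, t, obs, mode):
        a0 = toy_target(obs, mode)
        s, n = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
        return [(x - s * y) / n for x, y in zip(a_t, a0)]
    return eps


def sample_action(denoiser: Denoiser, obs, mode, betas, alpha_bar,
                  rng: random.Random, dim: int = 3) -> Vec:
    """DDPM ancestral sampling conditioned on (obs, mode)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # a_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        eps = denoiser(a, t, obs, mode)
        coef = betas[t] / math.sqrt(1.0 - alpha_bar[t])
        mean = [(x - coef * e) / math.sqrt(1.0 - betas[t]) for x, e in zip(a, eps)]
        if t > 0:
            a = [m + math.sqrt(betas[t]) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            a = mean  # no noise is added at the final step
    return a


if __name__ == "__main__":
    betas, alpha_bar = make_schedule(T=50)
    policy = make_oracle_denoiser(alpha_bar)
    obs = [1.0, -0.4, 3.0, 0.0, 0.0, 0.0]
    rng = random.Random(0)
    print(sample_action(policy, obs, SEARCH, betas, alpha_bar, rng))  # close to [2.0, 0.0, -5.0]
    print(sample_action(policy, obs, INSERT, betas, alpha_bar, rng))  # close to [-0.5, 0.2, -10.0]
```

Because each toy target is a single point, the final denoising step recovers it up to rounding error. With a learned denoiser, the sample would instead be drawn from the distribution of demonstrated actions for the given mode and observation.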

Key details

In peg-in-hole experiments, the model extends x–y misalignment tolerance from 2 mm to 5 mm compared with TacDiffusion and shows strong zero-shot transfer to unseen shapes (paper posted 2026-05-12).

Source evidence

Abstract

Contact-rich assembly is fundamental in robotics but poses significant challenges due to uncertainties in relative poses, such as misalignments and small clearances in peg-in-hole tasks. Existing approaches typically address search and high-precision insertion separately, because these tasks involve distinct action patterns. However, supporting both tasks within a single model, without switching models or weights, is desirable for intelligent assembly systems. In this work, we propose SI-Diff, a framework that learns both search and high-precision insertion through a force-domain diffusion policy. To this end, we introduce a new mode-conditioning mechanism that enables the policy to capture distinct action behaviors under a single framework. Moreover, we develop a new search teacher policy that can generate diverse trajectories. By training on successful and efficient demonstrations provided by the teacher policy, the model learns the mapping from tactile and end-effector velocity observations to effective action behaviors. We conduct thorough experiments to show that SI-Diff extends the tolerance to x-y misalignments from 2 mm to 5 mm compared to the state-of-the-art baseline, TacDiffusion, while also demonstrating strong zero-shot transferability to unseen shapes.

Comment: 9 pages, 8 figures