Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

Source

Raw Markdown: paper_diffusion-policy-2023.md
PDF: paper_diffusion-policy-2023.pdf
Preprint: arXiv 2303.04137

Core Claim

Diffusion Policy models a distribution over future action trajectories by denoising action chunks conditioned on recent observations. It is one of the clearest sources for treating robot motor control as conditional generation over continuous control-input trajectories.

Method Notes

Starting from Gaussian noise, the model iteratively denoises a sequence of future actions into an executable action chunk.
Inference is used in a receding-horizon loop: generate a chunk, execute part of it, observe again, then regenerate.
The source compares CNN-based and Transformer-based diffusion policies; diffusion is the action distribution model, while attention is one possible denoising-network architecture.
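
The three notes above can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not the paper's implementation: the 1-D task, the GOAL constant, the noise schedule, and toy_noise_predictor are all hypothetical. The predictor stands in for a trained network by assuming the demonstration distribution is a single chunk that heads straight for the goal, which makes the ideal noise prediction closed-form. The sampler uses a standard DDPM-style ancestral update.

```python
from collections import deque

import numpy as np

GOAL = 1.0  # hypothetical 1-D target position for the toy task


def toy_noise_predictor(obs_hist, noisy_chunk, k, alpha_bars):
    """Stand-in for a trained noise-prediction network (hypothetical).

    Assumes the only demonstrated behaviour is a chunk of equal steps that
    would reach GOAL from the latest observation, so the ideal noise
    estimate can be computed in closed form instead of learned.
    """
    horizon = noisy_chunk.shape[0]
    clean_chunk = np.full(horizon, (GOAL - obs_hist[-1]) / horizon)
    scaled_clean = np.sqrt(alpha_bars[k]) * clean_chunk
    return (noisy_chunk - scaled_clean) / np.sqrt(1.0 - alpha_bars[k])


def sample_action_chunk(obs_hist, horizon, betas, rng):
    """DDPM-style ancestral sampling conditioned on recent observations."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    chunk = rng.standard_normal(horizon)  # start from pure Gaussian noise
    for k in reversed(range(len(betas))):
        eps = toy_noise_predictor(obs_hist, chunk, k, alpha_bars)
        mean = (chunk - betas[k] / np.sqrt(1.0 - alpha_bars[k]) * eps) / np.sqrt(alphas[k])
        # Fresh noise is injected at every step except the final one.
        noise = rng.standard_normal(horizon) if k > 0 else 0.0
        chunk = mean + np.sqrt(betas[k]) * noise
    return chunk


def run_receding_horizon(n_replans=10, horizon=8, n_exec=4, n_obs=2, seed=0):
    """Generate a chunk, execute part of it, observe again, regenerate."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.2, 20)  # assumed linear noise schedule
    pos = 0.0
    obs_hist = deque([pos] * n_obs, maxlen=n_obs)
    for _ in range(n_replans):
        chunk = sample_action_chunk(np.array(obs_hist), horizon, betas, rng)
        for action in chunk[:n_exec]:  # execute only the first n_exec actions
            pos += action              # toy dynamics: action is a position delta
            obs_hist.append(pos)
    return pos


if __name__ == "__main__":
    print(run_receding_horizon())
```

Because each chunk is planned from the observation at replanning time and only half of it is executed, every replan in this toy closes half of the remaining distance to the goal. A learned policy would replace toy_noise_predictor with a network trained on demonstrations.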

Evidence And Limitations

The paper reports consistent improvement over prior imitation-learning baselines across simulation and real-world tasks (its abstract cites an average improvement of 46.9%), including robustness to some visual and physical perturbations. It also notes the central tradeoff: iterative denoising improves the modeling of multimodal continuous action distributions but raises inference latency relative to one-pass regression policies.
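
The latency side of that tradeoff can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes that replanning cost scales linearly with the number of denoising steps and that the next chunk is computed while the current one executes; both function names and the example numbers are hypothetical, not measurements from the paper.

```python
def denoiser_calls_per_action(denoise_steps, executed_actions):
    """Network forward passes amortized over each executed action."""
    if denoise_steps < 1 or executed_actions < 1:
        raise ValueError("both arguments must be at least 1")
    return denoise_steps / executed_actions


def max_control_rate_hz(denoise_steps, executed_actions, forward_pass_s):
    """Highest action rate at which one replan finishes before the
    previously generated actions run out (assumes pipelined replanning)."""
    if forward_pass_s <= 0:
        raise ValueError("forward_pass_s must be positive")
    replan_latency_s = denoise_steps * forward_pass_s
    return executed_actions / replan_latency_s


if __name__ == "__main__":
    # Hypothetical numbers: 10 denoising steps, 8 executed actions, 10 ms per pass.
    print(denoiser_calls_per_action(10, 8))   # forward passes per executed action
    print(max_control_rate_hz(10, 8, 0.01))   # sustainable action rate in Hz
    # A one-pass regression policy with the same per-pass cost:
    print(max_control_rate_hz(1, 8, 0.01))
```

Under these assumptions, executing more actions per chunk amortizes the denoising cost but lengthens the open-loop stretch between observations, so the choice of executed-chunk length trades latency against reactivity.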

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Multi-modal future distributions	partially closes	Denoises future action chunks and explicitly targets multi-modal continuous action distributions.	The modeled distribution is over actions, not future observations or latent system states.
Control and counterfactuals	partially closes	Runs in a receding-horizon closed loop: observe, generate an action sequence, execute part of it, then replan.	It is imitation policy learning, not a learned world model that compares candidate intervention consequences.
Dynamic compute allocation	warning	Iterative denoising supports expressive action generation but adds latency relative to one-pass policies.	Needs acceleration or hybrid heads for high-rate control loops and digital operational systems.

Links Into The Wiki

Open Questions

Which latency-reduction methods preserve closed-loop robustness for high-rate contact tasks?
Should time-series foundation models borrow diffusion over future observation blocks, future control chunks, or both?
