Sensorimotor World Models

Summary

Sensorimotor World Models (SMWM) is an inverse-dynamics-regularized JEPA world-model method introduced in the paper "Sensorimotor World Models: Perception for Action via Inverse Dynamics". It trains a pixel encoder and a latent forward dynamics model end-to-end on offline, reward-free action trajectories, using an inverse dynamics head as the sole anti-collapse regularizer.

The key idea is that the representation should preserve the state variables that explain action-conditioned transitions. Partial collapse of action-irrelevant variation is therefore a design feature for a single agent with a fixed action repertoire, but a caveat for broader foundation-model use, where the discarded variation may still matter.

Method Contract

Input data: transition tuples (o_t, a_t, o_{t+1}) with image observations and continuous actions/control inputs.
State representation: an encoder maps each observation into a compact latent state.
Forward model: predicts the next latent state from the current latent state and action.
Inverse model: predicts the executed action from consecutive latent states.
Anti-collapse signal: action recovery from latent transitions forces the encoder to preserve action-relevant information.
Planning hook: frozen latent states and dynamics can be used with CEM/MPC to compare candidate action sequences.
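
The contract above can be read as one loss computation per batch of transitions. The following is a minimal NumPy sketch under loud assumptions: observations are flattened vectors, the encoder, forward model, and inverse model are plain linear maps, both terms are mean-squared errors, and the weight `lam` is hypothetical. None of these choices are taken from the paper or the official code; the sketch only illustrates why action recovery penalizes a collapsed encoder.

```python
import numpy as np

def smwm_loss(params, obs_t, act_t, obs_next, lam=1.0):
    """Forward-prediction loss plus inverse-dynamics loss on one batch.

    Linear models, MSE terms, and `lam` are illustrative assumptions.
    """
    W_enc, W_fwd, W_inv = params
    z_t = obs_t @ W_enc        # encoder: o_t -> latent state
    z_next = obs_next @ W_enc  # same encoder applied to o_{t+1}
    # Forward model: (z_t, a_t) -> predicted z_{t+1}
    z_pred = np.concatenate([z_t, act_t], axis=1) @ W_fwd
    # Inverse model: (z_t, z_{t+1}) -> predicted a_t
    a_pred = np.concatenate([z_t, z_next], axis=1) @ W_inv
    pred_loss = np.mean((z_pred - z_next) ** 2)
    inv_loss = np.mean((a_pred - act_t) ** 2)
    return pred_loss + lam * inv_loss, pred_loss, inv_loss

rng = np.random.default_rng(0)
obs_dim, act_dim, lat_dim, batch = 16, 2, 4, 32
obs_t = rng.normal(size=(batch, obs_dim))
act_t = rng.normal(size=(batch, act_dim))
obs_next = rng.normal(size=(batch, obs_dim))

# A fully collapsed encoder (all latents zero) with a zero forward model:
# latent prediction is trivially perfect, but actions cannot be recovered.
collapsed = (
    np.zeros((obs_dim, lat_dim)),
    np.zeros((lat_dim + act_dim, lat_dim)),
    rng.normal(size=(2 * lat_dim, act_dim)),
)
total, pred_loss, inv_loss = smwm_loss(collapsed, obs_t, act_t, obs_next)
print(pred_loss, inv_loss)
```

In this toy setting the collapsed solution drives the prediction term to zero while the inverse-dynamics term stays strictly positive, which is the sense in which action recovery acts as the anti-collapse signal.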

flowchart LR
  Data["offline reward-free trajectories"]
  Enc["encoder"]
  Fwd["latent forward dynamics"]
  Inv["inverse dynamics regularizer"]
  Plan["CEM / MPC latent planner"]

  Data --> Enc --> Fwd --> Plan
  Enc --> Inv --> Enc
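
The planning hook at the end of the diagram can be sketched with a basic cross-entropy method (CEM) loop. This is a generic CEM sketch, not the planner from the official repository: `step_fn` is a hypothetical stand-in for the frozen latent forward model, the cost is squared distance to a goal latent, and the population size, elite count, and horizon are arbitrary illustrative values.

```python
import numpy as np

def cem_plan(z0, z_goal, step_fn, horizon=5, act_dim=2,
             pop=64, n_elite=8, iters=5, seed=0):
    """Return a mean action sequence of shape (horizon, act_dim)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        # Sample candidate action sequences: (pop, horizon, act_dim)
        actions = mean + std * rng.normal(size=(pop, horizon, act_dim))
        # Roll every candidate forward in latent space
        z = np.repeat(z0[None, :], pop, axis=0)
        for t in range(horizon):
            z = step_fn(z, actions[:, t])
        # Score by squared distance of the final latent to the goal latent
        cost = np.sum((z - z_goal) ** 2, axis=1)
        elite = actions[np.argsort(cost)[:n_elite]]
        # Refit the sampling distribution to the lowest-cost candidates
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

def toy_step(z, a):
    # Hypothetical stand-in for a frozen latent dynamics model
    return z + 0.1 * a

z0 = np.zeros(2)
z_goal = np.array([0.5, -0.3])
plan = cem_plan(z0, z_goal, toy_step)

# Execute the planned sequence in the toy dynamics and compare costs
z = z0
for a in plan:
    z = toy_step(z, a)
start_cost = float(np.sum((z0 - z_goal) ** 2))
final_cost = float(np.sum((z - z_goal) ** 2))
print(start_cost, final_cost)
```

In an MPC setting only the first action of the returned sequence would typically be executed before replanning from the next encoded observation.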

Official Artifacts

Preprint: arXiv 2606.20104
Official project page: petr-ivashkov.github.io/sensorimotor-world-model.github.io
Official code: petr-ivashkov/sensorimotor-world-model
Co-author X thread: Randall Balestriero thread
Local code README snapshot: papers/sensorimotor-world-models-2026/github_readme_snapshot.md
Local X thread summary: papers/sensorimotor-world-models-2026/x_thread_randall_balestr_2072111835000590573_summary.json

The official repository includes a toy dot-world subproject and a planning subproject. The planning reproduction path requires Linux/CUDA dependencies and derives from LeWorldModel.

Relevance To This Wiki

SMWM belongs on the JEPA, Representation Collapse, Latent-Space Predictive Learning, and World Models branches. Its main contribution to the wiki is not a claim that inverse dynamics is universally better than SIGReg. Rather, it adds a different regularization principle: use the action channel itself to decide which state variables a latent world model should keep.

For time-series and digital-world robot work, the transfer question is whether an analogous action/intervention-recovery objective can make multivariate models preserve controllable state, intervention effects, and useful latent geometry without erasing variables needed for safety, diagnostics, or downstream tasks.

Caveats

The evidence comes from visual/robotic control tasks, not from numeric time-series data.
Action recoverability from observations is an assumption, not a guarantee.
Partial collapse can be beneficial for one action repertoire but harmful for a general foundation model.
Weak or incomplete action labels may require additional regularizers such as SIGReg, reconstruction grounding, or broader state probes.
Offline dataset coverage and long-horizon rollout error remain ordinary world-model risks.
