Sensorimotor World Models
Summary
Sensorimotor World Models (SMWM) is the inverse-dynamics-regularized JEPA world-model method introduced in the paper "Sensorimotor World Models: Perception for Action via Inverse Dynamics". It trains a pixel encoder and a latent forward dynamics model end-to-end on offline, reward-free action trajectories, using an inverse dynamics head as the sole anti-collapse regularizer.
The key idea is that the representation should preserve the state variables that explain action-conditioned transitions. Partial collapse of action-irrelevant variation is therefore a design feature for a single agent with a fixed action repertoire, but a caveat for broader foundation-model use.
Method Contract
- Input data: transition tuples (o_t, a_t, o_{t+1}) with image observations and continuous actions/control inputs.
- State representation: an encoder maps each observation into a compact latent state.
- Forward model: predicts the next latent state from the current latent state and action.
- Inverse model: predicts the executed action from consecutive latent states.
- Anti-collapse signal: action recovery from latent transitions forces the encoder to preserve action-relevant information.
- Planning hook: frozen latent states and dynamics can be used with CEM/MPC to compare candidate action sequences.
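The contract above can be sketched as a two-term objective. The following NumPy example is a minimal illustration, not the paper's implementation: the linear encoder, the concatenation-based forward and inverse heads, the mean-squared-error losses, the `inv_weight` coefficient, and the names `smwm_style_losses` and `init_params` are all assumptions made for this page.

```python
import numpy as np


def init_params(obs_dim, latent_dim, act_dim, seed=0):
    """Random linear maps standing in for encoder, forward, and inverse models."""
    rng = np.random.default_rng(seed)
    return {
        "enc": rng.normal(scale=0.1, size=(obs_dim, latent_dim)),
        "fwd": rng.normal(scale=0.1, size=(latent_dim + act_dim, latent_dim)),
        "inv": rng.normal(scale=0.1, size=(2 * latent_dim, act_dim)),
    }


def smwm_style_losses(obs_t, act_t, obs_next, params, inv_weight=1.0):
    """Forward-prediction and inverse-dynamics losses for a batch of transitions.

    Illustrative assumption: linear models and MSE losses, chosen for brevity.
    """
    # Encoder: flatten each observation and map it to a latent state.
    z_t = obs_t.reshape(len(obs_t), -1) @ params["enc"]
    z_next = obs_next.reshape(len(obs_next), -1) @ params["enc"]

    # Forward model: predict the next latent from (current latent, action).
    z_pred = np.concatenate([z_t, act_t], axis=1) @ params["fwd"]
    forward_loss = np.mean((z_pred - z_next) ** 2)

    # Inverse model: recover the executed action from consecutive latents.
    a_pred = np.concatenate([z_t, z_next], axis=1) @ params["inv"]
    inverse_loss = np.mean((a_pred - act_t) ** 2)

    total = forward_loss + inv_weight * inverse_loss
    return total, forward_loss, inverse_loss
```

In this sketch, a fully collapsed encoder (all-zero weights, with a zero forward model) drives the forward loss to exactly zero, while the inverse loss stays at the mean squared action magnitude. That residual is the anti-collapse pressure the inverse head is meant to supply.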
flowchart LR
  Data["offline reward-free trajectories"]
  Enc["encoder"]
  Fwd["latent forward dynamics"]
  Inv["inverse dynamics regularizer"]
  Plan["CEM / MPC latent planner"]
  Data --> Enc --> Fwd --> Plan
  Enc --> Inv --> Enc
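The planning hook can be illustrated with a small cross-entropy method (CEM) loop over a frozen latent dynamics function. This is a hedged sketch: the goal-latent squared-distance cost, the hyperparameters, and the `cem_plan` and `dynamics` names are assumptions for illustration, since this page does not record the planning cost or settings used in the official code.

```python
import numpy as np


def cem_plan(z0, z_goal, dynamics, horizon=5, act_dim=2,
             pop=64, n_elite=8, iters=5, seed=0):
    """Return a (horizon, act_dim) action sequence found by CEM in latent space.

    `dynamics(z, a)` is a hypothetical frozen model mapping batched latents
    and actions to next latents.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        # Sample candidate action sequences with shape (pop, horizon, act_dim).
        candidates = mean + std * rng.standard_normal((pop, horizon, act_dim))
        # Roll every candidate forward through the latent dynamics.
        z = np.repeat(z0[None, :], pop, axis=0)
        for t in range(horizon):
            z = dynamics(z, candidates[:, t])
        # Assumed cost: squared distance between final latent and goal latent.
        costs = np.sum((z - z_goal) ** 2, axis=1)
        # Refit the sampling distribution to the lowest-cost candidates.
        elite = candidates[np.argsort(costs)[:n_elite]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean
```

In an MPC loop, only the first action of the returned sequence would be executed before re-encoding the new observation and replanning.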
Official Artifacts
- Preprint: arXiv 2606.20104
- Official project page: petr-ivashkov.github.io/sensorimotor-world-model.github.io
- Official code: petr-ivashkov/sensorimotor-world-model
- Co-author X thread: Randall Balestriero thread
- Local code README snapshot: papers/sensorimotor-world-models-2026/github_readme_snapshot.md
- Local X thread summary: papers/sensorimotor-world-models-2026/x_thread_randall_balestr_2072111835000590573_summary.json
The official repository includes a toy dot-world subproject and a planning subproject. The planning reproduction path requires Linux/CUDA dependencies and derives from LeWorldModel.
Relevance To This Wiki
SMWM belongs on the JEPA, Representation Collapse, Latent-Space Predictive Learning, and World Models branches. Its main contribution to the wiki is not that inverse dynamics is universally better than SIGReg. It adds a different regularization principle: use the action channel itself to decide which state variables a latent world model should keep.
For time-series and digital-world robot work, the transfer question is whether an analogous action/intervention-recovery objective can make multivariate models preserve controllable state, intervention effects, and useful latent geometry without erasing variables needed for safety, diagnostics, or downstream tasks.
Caveats
- Evidence is visual/robotic control evidence, not numeric time-series evidence.
- Action recoverability from observations is an assumption, not a guarantee.
- Partial collapse can be beneficial for one action repertoire but harmful for a general foundation model.
- Weak or incomplete action labels may require additional regularizers such as SIGReg, reconstruction grounding, or broader state probes.
- Offline dataset coverage and long-horizon rollout error remain ordinary world-model risks.
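One way to test the action-recoverability caveat on a new dataset is a held-out linear probe from consecutive latent pairs to actions. The sketch below is an assumption of this page rather than a procedure from the paper: the `action_recovery_r2` name, the linear least-squares probe, and the unshuffled train/test split (rows are assumed to be already shuffled) are illustrative choices.

```python
import numpy as np


def action_recovery_r2(z_t, z_next, actions, train_frac=0.8):
    """Held-out R^2 of a linear probe predicting actions from latent pairs."""
    # Probe features: current latent, next latent, and a bias column.
    feats = np.concatenate([z_t, z_next, np.ones((len(z_t), 1))], axis=1)
    n_train = int(train_frac * len(feats))
    # Fit the probe by ordinary least squares on the training rows.
    weights, *_ = np.linalg.lstsq(feats[:n_train], actions[:n_train], rcond=None)
    # Score it on the held-out rows.
    pred = feats[n_train:] @ weights
    resid = np.sum((actions[n_train:] - pred) ** 2)
    total = np.sum((actions[n_train:] - actions[n_train:].mean(axis=0)) ** 2)
    return 1.0 - resid / total
```

A score near 1 suggests that actions are linearly recoverable from the latent transition. A score near or below 0 suggests either that the encoder discarded action-relevant state or that the actions are not identifiable from the observations in the first place. A linear probe cannot distinguish these two cases, and it can also miss nonlinearly encoded action information.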