VLA-JEPA

Summary

VLA-JEPA is a vision-language-action (VLA) model that pretrains latent-action representations with leakage-free, JEPA-style latent state prediction, then decodes continuous robot control trajectories with a flow-matching action head.
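To make the flow-matching step concrete, the following is a minimal sketch of how a flow-matching head can turn noise into a continuous action chunk by Euler integration of a velocity field. It is an illustration under stated assumptions, not VLA-JEPA's implementation: the function names, the chunk shape (`horizon` by `action_dim`), the step count, and the convention that noise sits at t=0 and actions at t=1 are all hypothetical, and the toy velocity function stands in for a learned network conditioned on observations and language.

```python
import random


def sample_action_chunk(velocity_fn, horizon, action_dim, num_steps=10, seed=0):
    """Euler-integrate da/dt = v(a, t) from t=0 (Gaussian noise) to t=1 (actions)."""
    rng = random.Random(seed)
    # Start from noise shaped like one action chunk: horizon x action_dim.
    actions = [[rng.gauss(0.0, 1.0) for _ in range(action_dim)]
               for _ in range(horizon)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the toy field never divides by zero
        velocity = velocity_fn(actions, t)
        actions = [
            [a + dt * v for a, v in zip(a_row, v_row)]
            for a_row, v_row in zip(actions, velocity)
        ]
    return actions


def make_toy_velocity(target):
    """Hypothetical stand-in for a learned, conditioned velocity network."""
    def velocity_fn(actions, t):
        # On the straight path a_t = (1 - t) * noise + t * target, the constant
        # velocity (target - noise) equals (target - a_t) / (1 - t).
        return [
            [(g - a) / (1.0 - t) for a, g in zip(a_row, g_row)]
            for a_row, g_row in zip(actions, target)
        ]
    return velocity_fn


# Usage: integrate toward a known 4-step, 3-dimensional target trajectory.
target = [[0.1 * (i + j) for j in range(3)] for i in range(4)]
chunk = sample_action_chunk(make_toy_velocity(target), horizon=4, action_dim=3)
```

Because the toy field follows the straight noise-to-target path exactly, the integrated chunk lands on `target`. A trained head would instead approximate this field from data, so the number of integration steps becomes a trade-off between accuracy and latency.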

Role In The Wiki

VLA-JEPA anchors the robotics branch where JEPA-style latent prediction is tied directly to VLA policy pretraining. It provides stronger robotics evidence than VL-JEPA for latent-action-conditioned VLA pretraining, but it is not evidence for a planner-style, action-conditioned simulator with typed interventions.

Evidence

Official Artifacts

Relation To Foundation TSFM Agenda

Use the source-level agenda mapping in vla-jepa-2026 rather than duplicating verdict rows here.

At the entity level, VLA-JEPA is the named model object for JEPA-style VLA pretraining with latent-action tokens, a V-JEPA2 target encoder, and a flow-matching action head. This page should stay as the object card; source pages carry slot-level verdicts, evidence, and missing pieces.
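As a rough orientation for how the named components relate, the sketch below shows a JEPA-style latent prediction loss in which a student branch sees only the current frame and latent-action tokens, while a frozen target encoder embeds the future frame as the regression target. Everything here is an assumption for illustration: the toy encoder, the additive predictor, the mean-squared-error loss, and the reading of "leakage-free" as "future frames reach only the target branch" are hypothetical stand-ins, not details confirmed by this page.

```python
def mse(pred, target):
    """Mean squared error between two equal-length latent vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def toy_encoder(frame, scale=1.0):
    # Hypothetical stand-in for a visual encoder: a fixed elementwise scaling.
    return [scale * x for x in frame]


def additive_predictor(context, latent_action):
    # Toy predictor: shifts the context latent by the latent-action vector.
    return [c + a for c, a in zip(context, latent_action)]


def jepa_style_loss(current_frame, future_frame, latent_action, predictor):
    # Student branch: sees only the current frame plus latent-action tokens.
    context = toy_encoder(current_frame)
    predicted_future = predictor(context, latent_action)
    # Target branch: a frozen encoder embeds the future frame. Its output is a
    # constant target; in an autodiff framework no gradient would flow through it.
    target_future = toy_encoder(future_frame)
    return mse(predicted_future, target_future)


# Usage: a latent action that explains the frame change gives zero loss.
current = [0.0, 1.0, 2.0]
future = [0.5, 1.5, 2.5]
loss_good = jepa_style_loss(current, future, [0.5, 0.5, 0.5], additive_predictor)
loss_null = jepa_style_loss(current, future, [0.0, 0.0, 0.0], additive_predictor)
```

In this toy setup the loss falls only when the latent action carries the information needed to predict the future latent, which is the intuition behind using latent state prediction to shape latent-action representations.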