VLA-JEPA
Summary
VLA-JEPA is a vision-language-action (VLA) model that pretrains latent-action representations with leakage-free, JEPA-style latent state prediction, then uses a flow-matching action head to generate continuous robot control-input trajectories.
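The flow-matching action head can be illustrated with a generic conditional flow-matching objective. This is a sketch under stated assumptions, not VLA-JEPA's implementation. It assumes straight-line interpolation between Gaussian noise and a target action chunk and plain Euler integration at inference. A hypothetical `velocity_fn(x_t, t, cond)` stands in for the learned head, and the array shapes (batch, horizon, action dimension) are also hypothetical.

```python
import numpy as np


def flow_matching_loss(velocity_fn, actions, cond, rng):
    """Conditional flow-matching loss for a batch of action chunks.

    actions: array of shape (batch, horizon, action_dim), the target trajectories.
    velocity_fn(x_t, t, cond): hypothetical learned head returning a velocity
    with the same shape as x_t.
    """
    noise = rng.standard_normal(actions.shape)                # x_0 ~ N(0, I)
    t = rng.uniform(0.0, 1.0, size=(actions.shape[0], 1, 1))  # one time per chunk, in [0, 1)
    x_t = (1.0 - t) * noise + t * actions                     # point on the straight noise-to-action path
    target_velocity = actions - noise                         # constant velocity along that path
    pred_velocity = velocity_fn(x_t, t, cond)
    return float(np.mean((pred_velocity - target_velocity) ** 2))


def sample_actions(velocity_fn, shape, cond, rng, steps=10):
    """Euler-integrate dx/dt = velocity_fn(x, t, cond) from noise at t=0 to actions at t=1."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full((shape[0], 1, 1), k * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x


# Sanity check with an "oracle" head that already knows the target chunk
# (passed in as cond). A learned head would have to predict this velocity
# from its conditioning context instead of reading the answer off cond.
rng = np.random.default_rng(0)
target = rng.standard_normal((4, 8, 7))  # hypothetical: 4 chunks, horizon 8, 7-D actions


def oracle_velocity(x_t, t, cond):
    # Exact velocity toward cond along a straight path that ends at t = 1.
    return (cond - x_t) / (1.0 - t)


loss = flow_matching_loss(oracle_velocity, target, target, rng)
sample = sample_actions(oracle_velocity, target.shape, target, rng)
```

With the oracle head the loss is numerically zero and Euler sampling lands on `target`. That result only checks the plumbing. Straight-line paths are a common choice because the regression target `actions - noise` does not depend on `t`.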
Role In The Wiki
VLA-JEPA anchors the robotics branch of the wiki, where JEPA-style latent prediction is tied directly to VLA policy pretraining. It provides stronger robotics evidence than VL-JEPA for latent-action-conditioned VLA pretraining. However, it is not evidence for a planner-style, action-conditioned simulator with typed interventions.
Evidence
Official Artifacts
Relation To Foundation TSFM Agenda
Use the source-level agenda mapping in vla-jepa-2026 rather than duplicating verdict rows here.
At the entity level, VLA-JEPA is the named model object for JEPA-style VLA pretraining with latent-action tokens, a V-JEPA2 target encoder, and a flow-matching action head. This page should remain the object card. The source pages carry the slot-level verdicts, evidence, and missing pieces.
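The latent-prediction idea can be sketched as a loss computed entirely in the embedding space of a frozen target encoder. Everything here is hypothetical. A fixed random linear map stands in for the V-JEPA2 target encoder, and a linear `predictor` maps the current latent plus a latent-action token to the next latent. The L1 distance is an assumed choice. The sketch does not reproduce VLA-JEPA's architecture, masking, or leakage-free setup.

```python
import numpy as np

OBS_DIM, LATENT_DIM, ACTION_TOKEN_DIM = 32, 16, 4  # hypothetical sizes

rng = np.random.default_rng(0)
# Stand-in for a frozen pretrained target encoder: a fixed linear map that is
# never updated. In this sketch it also encodes the current frame (a simplification).
W_TARGET = rng.standard_normal((OBS_DIM, LATENT_DIM)) / np.sqrt(OBS_DIM)


def target_encoder(frames):
    """Map observations of shape (batch, OBS_DIM) to latents (batch, LATENT_DIM)."""
    return frames @ W_TARGET


def predictor(z_t, latent_action, params):
    """Hypothetical predictor: (current latent, latent-action token) -> next latent."""
    return np.concatenate([z_t, latent_action], axis=-1) @ params


def latent_prediction_loss(frames_t, frames_next, latent_action, params):
    z_t = target_encoder(frames_t)
    z_next = target_encoder(frames_next)  # regression target; with autodiff it would sit behind stop-gradient
    z_pred = predictor(z_t, latent_action, params)
    return float(np.mean(np.abs(z_pred - z_next)))  # L1 distance in latent space, not pixel space


# Usage: a static scene (next frame == current frame) is predicted perfectly
# by a predictor that copies z_t and ignores the latent-action token.
batch = 5
frames = rng.standard_normal((batch, OBS_DIM))
action_tokens = rng.standard_normal((batch, ACTION_TOKEN_DIM))
copy_params = np.vstack([np.eye(LATENT_DIM), np.zeros((ACTION_TOKEN_DIM, LATENT_DIM))])
static_loss = latent_prediction_loss(frames, frames, action_tokens, copy_params)

# A changed scene is no longer explained by copying the current latent.
moved = frames + rng.standard_normal((batch, OBS_DIM))
moving_loss = latent_prediction_loss(frames, moved, action_tokens, copy_params)
```

The sketch illustrates where the loss lives. Predictions are compared against target-encoder embeddings rather than reconstructed pixels, so the latent-action token only has to carry what the latent transition needs.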