VL-JEPA

Summary

VL-JEPA is a vision-language model that predicts continuous target-text embeddings instead of autoregressively generating text tokens.
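
The difference from token generation can be illustrated with a minimal sketch. It assumes the model outputs one vector per example and is scored against a target text embedding. The cosine-distance loss and the nearest-candidate readout are illustrative choices, not the objective or decoder stated by the source. The function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def embedding_prediction_loss(predicted, target):
    """Mean cosine distance between predicted and target text embeddings.

    Assumption: cosine distance stands in for whatever objective the
    model actually trains with; no token-level cross-entropy is involved.
    """
    p = l2_normalize(predicted)
    t = l2_normalize(target)
    return float(np.mean(1.0 - np.sum(p * t, axis=-1)))

def nearest_candidate(predicted, candidates):
    """Read out text by picking the closest candidate text embedding."""
    sims = l2_normalize(predicted) @ l2_normalize(candidates).T
    return np.argmax(sims, axis=-1)
```

In this sketch the supervision signal lives entirely in embedding space, and text appears only when a readout such as `nearest_candidate` is applied.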

Role In The Wiki

VL-JEPA extends JEPA-style representation prediction to general-domain vision-language tasks and selective decoding. It anchors the wiki pattern where language is a readout from a continuous semantic embedding stream, not necessarily the system’s main internal representation.

Evidence

Relation To Foundation TSFM Agenda

Use the source-level agenda mapping in vl-jepa-2025 rather than duplicating verdict rows here.

At the entity level, this page should stay as the object card for VL-JEPA; source pages carry slot-level verdicts, evidence, and missing pieces.