LeJEPA: Provable And Scalable Self-Supervised Learning Without The Heuristics
Source
- Raw Markdown: paper_lejepa-2025.md
- PDF: paper_lejepa-2025.pdf
Core Claim
LeJEPA argues that JEPA embeddings should follow an isotropic Gaussian distribution and introduces SIGReg to enforce that distribution efficiently.
Key Contributions
- Provides a theory for the optimal embedding distribution for downstream prediction risk.
- Introduces Sketched Isotropic Gaussian Regularization (SIGReg).
- Combines JEPA predictive loss with SIGReg to reduce reliance on stop-gradient, EMA, teacher-student, and scheduler heuristics.
- Validates the approach across many datasets, architectures, and domains.
Method Notes
LeJEPA is a central reference for the JEPA, Representation Collapse, and Self-Supervised Representation Learning pages.
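The combination listed under Key Contributions (a JEPA predictive loss plus SIGReg) can be illustrated with a small forward-pass sketch. This is not the paper's implementation. It assumes that "sketched" means projecting embeddings onto random unit directions and that each 1D projection is compared with a standard Gaussian through its characteristic function. The function names, the mean-squared-error prediction term, the weight `lam`, and every default value are hypothetical choices made for illustration.

```python
import numpy as np


def sigreg_sketch(z, num_directions=64, num_t=17, t_max=3.0, seed=0):
    """Illustrative penalty: small when random 1D projections of z look like N(0, 1).

    All parameter names and defaults are assumptions, not values from the source.
    """
    z = np.asarray(z, dtype=np.float64)
    rng = np.random.default_rng(seed)

    # Sketching step (assumed): random unit directions in embedding space.
    directions = rng.standard_normal((z.shape[1], num_directions))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)
    proj = z @ directions  # shape (n, num_directions)

    # Empirical characteristic function of each projection on a grid of t values.
    t = np.linspace(-t_max, t_max, num_t)
    phase = proj[:, :, None] * t  # shape (n, num_directions, num_t)
    ecf_real = np.cos(phase).mean(axis=0)
    ecf_imag = np.sin(phase).mean(axis=0)

    # Characteristic function of N(0, 1) is exp(-t^2 / 2) and is purely real.
    gauss_cf = np.exp(-0.5 * t**2)
    sq_err = (ecf_real - gauss_cf) ** 2 + ecf_imag**2

    # Gaussian-weighted Riemann sum over t, then average over directions.
    dt = t[1] - t[0]
    per_direction = (sq_err * gauss_cf).sum(axis=1) * dt
    return float(per_direction.mean())


def lejepa_loss_sketch(predicted, target, lam=0.05, **sigreg_kwargs):
    """Hypothetical combined objective: prediction error plus the penalty above."""
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    prediction_loss = float(np.mean((predicted - target) ** 2))
    penalty = sigreg_sketch(predicted, **sigreg_kwargs)
    return (1.0 - lam) * prediction_loss + lam * penalty


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    gaussian = rng.standard_normal((2048, 16))
    collapsed = np.zeros((2048, 16))
    print("gaussian penalty :", sigreg_sketch(gaussian))
    print("collapsed penalty:", sigreg_sketch(collapsed))
```

In this sketch, embeddings drawn from an isotropic Gaussian receive a much smaller penalty than embeddings collapsed to a single point. NumPy computes only the forward value; an actual training loop would need the same computation in an autodiff framework.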
Evidence And Results
The source reports broad empirical validation, stable training across architectures and domains, and ImageNet-1k linear evaluation examples for large ViT models.
Limitations
The paper’s strongest claim is generality. The wiki should test that claim against multimodal and control-specific sources such as VL-JEPA and LeWorldModel.
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Anti-collapse regularization | partially closes | Combines JEPA prediction with SIGReg toward isotropic Gaussian embeddings to avoid complete and dimensional collapse. | Evidence is outside time series and does not test rare regimes or cross-channel deviations. |
| Representation quality | adjacent | Gives a principled target distribution for representations optimized for downstream prediction. | Does not show preservation of dense numeric detail for forecasting, generation, or editing. |
| Augmentation-free self-supervision | adjacent | JEPA-style latent prediction reduces reliance on handcrafted positive/negative augmentation pairs. | Needs time-series objectives that respect irregular sampling, channel identity, and event semantics. |
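The two failure modes named in the table, complete and dimensional collapse, can be told apart with a simple check on the embedding covariance spectrum. The sketch below is a generic diagnostic rather than something taken from the source. The function name is hypothetical, and the participation ratio is an assumed choice of effective-rank measure.

```python
import numpy as np


def collapse_diagnostics(z, eps=1e-12):
    """Return (total_variance, effective_rank) for embeddings z of shape (n, d).

    Complete collapse shows up as total variance near zero. Dimensional
    collapse shows up as an effective rank far below d. The participation
    ratio used here is one assumed choice of effective-rank measure.
    """
    z = np.asarray(z, dtype=np.float64)
    centered = z - z.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(len(z) - 1, 1)

    # Eigenvalues of a symmetric PSD matrix; clip tiny negatives from round-off.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)

    total_variance = float(eigvals.sum())
    effective_rank = float(total_variance**2 / (np.sum(eigvals**2) + eps))
    return total_variance, effective_rank


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    healthy = rng.standard_normal((4096, 16))
    one_dim = np.zeros((4096, 16))
    one_dim[:, 0] = rng.standard_normal(4096)
    print("healthy :", collapse_diagnostics(healthy))
    print("one-dim :", collapse_diagnostics(one_dim))
    print("constant:", collapse_diagnostics(np.ones((4096, 16))))
```

For time-series embeddings the same check could be run per regime or per channel group, which is where the table notes that evidence is still missing.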
Links Into The Wiki
- LeJEPA
- Foundation Time-Series Model Research Agenda
- JEPA
- Representation Collapse
- Self-Supervised Representation Learning
Open Questions
- Can SIGReg remain sufficient at frontier multimodal scale?
- Is the isotropic Gaussian target universally optimal or domain-dependent?