VISReg
Summary
VISReg is the Variance-Invariance-Sketching Regularization method introduced in the preprint "VISReg: Variance-Invariance-Sketching Regularization for JEPA training". It is a visual self-supervised learning regularizer that keeps the LeJEPA/SIGReg goal of matching embeddings to an isotropic Gaussian-like distribution, but decouples the regularizer into three terms: a variance/scale loss, a Sliced-Wasserstein shape-matching loss, and a center loss.
The method matters to this wiki because it is a current pressure test of the SIGReg branch used by LeNEPA: VISReg argues that SIGReg-type regularization can scale, while also claiming that vanilla SIGReg has weak gradients when embeddings have already collapsed.
Method Contract
- Input surface: augmented image views in a visual SSL recipe.
- Backbones: released ViT-B/16 and ViT-L/14 checkpoints; the paper also analyzes smaller ImageNette stress tests.
- Objective family: LeJEPA-style multi-view invariance/prediction plus explicit embedding regularization.
- Regularizer: scale/variance loss, Sliced-Wasserstein shape loss over random 1D projections, and center loss.
- Target distribution: isotropic Gaussian-like embedding geometry, inherited from the LeJEPA/SIGReg line but implemented through quantile-matching slices.
- Evidence: collapse-gradient simulation, low-quality-data stress tests, ImageNet-1K and ImageNet-22K pretraining, OOD image classification, transfer learning, segmentation, and generation guidance.
- Boundary: no numeric time-series, event-stream, action, control-input, intervention, or counterfactual rollout evidence.
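The three-term regularizer named in the contract above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `visreg_sketch`, the unit-variance target, the squared-error form of each term, the default slice count, and the choice to standardize each projection before quantile matching are all assumptions made here. The stop-gradient on the scale normalization path is not modeled, because NumPy does not track gradients.

```python
import numpy as np
from statistics import NormalDist


def visreg_sketch(z, num_slices=64, seed=0):
    """Hypothetical three-term regularizer: center, scale, and sliced shape.

    z: array of shape (n, d) holding a batch of embeddings.
    Returns a dict of scalar losses. All loss forms are illustrative
    assumptions, not the formulation from the VISReg paper.
    """
    z = np.asarray(z, dtype=np.float64)
    n, d = z.shape
    rng = np.random.default_rng(seed)

    # Center loss: penalize a non-zero batch mean.
    mean = z.mean(axis=0)
    center = float(np.mean(mean ** 2))

    # Scale loss: penalize per-dimension standard deviation away from 1.
    std = z.std(axis=0)
    scale = float(np.mean((std - 1.0) ** 2))

    # Random unit directions for the 1D slices.
    directions = rng.standard_normal((d, num_slices))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)

    # Project, then standardize each slice so the shape term ignores
    # location and scale (those are handled by the two terms above).
    proj = z @ directions
    proj = (proj - proj.mean(axis=0)) / (proj.std(axis=0) + 1e-8)

    # Standard-normal quantiles at the midpoint probabilities (i + 0.5) / n.
    normal = NormalDist()
    probs = (np.arange(n) + 0.5) / n
    target = np.array([normal.inv_cdf(p) for p in probs])

    # Sliced-Wasserstein-style shape loss: sorted projections vs. quantiles.
    shape = float(np.mean((np.sort(proj, axis=0) - target[:, None]) ** 2))

    return {"center": center, "scale": scale, "shape": shape}
```

On a batch drawn from a standard Gaussian, all three terms should be close to zero. On a fully collapsed batch (every row identical), the scale term in this sketch stays at 1 and the center term grows with the squared offset from the origin, which is one way to see how separate terms can each carry a signal in the collapsed regime.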
Official Artifacts
- Preprint: https://arxiv.org/abs/2606.02572
- DOI: https://doi.org/10.48550/arXiv.2606.02572
- Project page: https://haiyuwu.github.io/visreg/
- Official code: https://github.com/HaiyuWu/visreg
- Official Hugging Face checkpoints: https://huggingface.co/BooBooWu/visreg
- Raw paper Markdown: paper_visreg-2026.md
The repository and Hugging Face model card list ViT-B/16 and ViT-L/14 ImageNet-1K checkpoints and state CC BY-NC 4.0 terms for code and pretrained weights.
Relation To LeNEPA
VISReg should be tracked as a possible next regularizer variant for LeNEPA experiments, not as direct time-series evidence.
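A LeNEPA follow-up can treat VISReg as a drop-in comparison: keep the next-latent pipeline fixed, swap the temporal SIGReg baseline for a temporal VISReg variant, and evaluate both with the same probes: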
flowchart LR
    X[time-series window / event stream] --> Tok[token or patch embeddings]
    Tok --> Pred[next-latent prediction]
    Pred --> Align[next-latent alignment]
    Pred --> TSIG[temporal SIGReg baseline]
    Pred --> TVIS[temporal VISReg variant: scale + SWD shape + center]
    TSIG --> Probe[dense-state / rare-regime probes]
    TVIS --> Probe
The comparison should ask whether temporal VISReg improves collapse recovery or scaling without erasing dense numeric state, rare events, channel relationships, exogenous variables, or typed action/control-input histories.
Caveats
- VISReg is a vision SSL method; it is only adjacent to foundation time-series modeling.
- The source is an arXiv preprint with official artifacts, not yet a peer-reviewed venue record.
- Aggregate OOD classification does not prove latent-state preservation.
- The method still has objective-level choices: projector dimension, slice count, batch composition, multi-view augmentation, and loss weights.
- VISReg contains a principled stop-gradient on the scale normalization path; the wiki should reserve “no stop-gradient” wording for the narrower claim that it avoids teacher/student or stop-gradient target heuristics.