VISReg

Summary

VISReg is the Variance-Invariance-Sketching Regularization method introduced in the paper "VISReg: Variance-Invariance-Sketching Regularization for JEPA training". It is a regularizer for visual self-supervised learning. It keeps the LeJEPA/SIGReg goal of matching embeddings to an isotropic Gaussian-like distribution, but decouples the regularization into three terms: a variance/scale loss, a Sliced-Wasserstein shape-matching loss, and a center loss.

The method matters to this wiki because it is a current pressure test of the SIGReg branch used by LeNEPA: VISReg argues that SIGReg-type regularization can scale, while also claiming that vanilla SIGReg has weak gradients when embeddings have already collapsed.

Method Contract

  • Input surface: augmented image views in a visual SSL recipe.
  • Backbones: released ViT-B/16 and ViT-L/14 checkpoints; the paper also analyzes smaller ImageNette stress tests.
  • Objective family: LeJEPA-style multi-view invariance/prediction plus explicit embedding regularization.
  • Regularizer: scale/variance loss, Sliced-Wasserstein shape loss over random 1D projections, and center loss.
  • Target distribution: isotropic Gaussian-like embedding geometry, inherited from the LeJEPA/SIGReg line but implemented through quantile-matching slices.
  • Evidence: collapse-gradient simulation, low-quality-data stress tests, ImageNet-1K and ImageNet-22K pretraining, OOD image classification, transfer learning, segmentation, and generation guidance.
  • Boundary: no numeric time-series, event-stream, action, control-input, intervention, or counterfactual rollout evidence.
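
The decoupling named in the Regularizer and Target distribution entries can be illustrated with a small NumPy sketch. It is not the paper's formulation: the function names, the squared-error form of each term, the midpoint quantile grid, and the slice count are all assumptions chosen for illustration. NumPy has no autodiff, so the stop-gradient on the scale normalization path is not represented.

```python
import numpy as np
from statistics import NormalDist


def gaussian_quantiles(n):
    """Standard-normal quantiles at the midpoints (i + 0.5) / n."""
    nd = NormalDist()
    return np.array([nd.inv_cdf((i + 0.5) / n) for i in range(n)])


def decoupled_penalty(z, num_slices=64, seed=0, eps=1e-6):
    """Return hypothetical (center, scale, shape) terms for z of shape (n, d).

    All three definitions are illustrative assumptions, not VISReg's losses.
    """
    rng = np.random.default_rng(seed)
    n, d = z.shape

    mean = z.mean(axis=0)
    std = z.std(axis=0)

    # Center term: per-dimension batch means should sit at zero.
    center = float(np.mean(mean ** 2))
    # Scale term: per-dimension standard deviations should sit at one.
    scale = float(np.mean((std - 1.0) ** 2))

    # Shape term: standardize first so this term is largely insensitive to
    # location and per-dimension scale, then compare sorted 1D projections
    # against Gaussian quantiles (a sliced 2-Wasserstein estimate).
    z_hat = (z - mean) / (std + eps)
    directions = rng.standard_normal((d, num_slices))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)
    projected = np.sort(z_hat @ directions, axis=0)  # (n, num_slices)
    target = gaussian_quantiles(n)[:, None]          # (n, 1)
    shape = float(np.mean((projected - target) ** 2))

    return center, scale, shape


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    healthy = rng.standard_normal((2048, 16))
    collapsed = 3.0 + 1e-3 * rng.standard_normal((2048, 16))
    print("healthy  ", decoupled_penalty(healthy))
    print("collapsed", decoupled_penalty(collapsed))
```

In this toy setup the near-collapsed batch is flagged by the center and scale terms, while the shape term stays small because the residual noise is still Gaussian after standardization. That separation is the point of splitting the penalty into three terms, though the real method's behavior should be checked against the paper and official code.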

Official Artifacts

The repository and Hugging Face model card list ViT-B/16 and ViT-L/14 ImageNet-1K checkpoints and state CC BY-NC 4.0 terms for code and pretrained weights.

Relation To LeNEPA

VISReg should be tracked as a possible next regularizer variant for LeNEPA experiments, not as direct time-series evidence.

A LeNEPA follow-up can treat VISReg as a drop-in replacement for the temporal SIGReg baseline and compare the two on the same probes:

flowchart LR
  X[time-series window / event stream] --> Tok[token or patch embeddings]
  Tok --> Pred[next-latent prediction]
  Pred --> Align[next-latent alignment]
  Pred --> TSIG[temporal SIGReg baseline]
  Pred --> TVIS[temporal VISReg variant: scale + SWD shape + center]
  TSIG --> Probe[dense-state / rare-regime probes]
  TVIS --> Probe

The comparison should ask whether temporal VISReg improves collapse recovery or scaling without erasing dense numeric state, rare events, channel relationships, exogenous variables, or typed action/control-input histories.
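
One possible way to put a number on "collapse recovery" in such a comparison is the effective rank of a batch of embeddings, computed as the exponential of the entropy of the normalized singular values. This diagnostic is an assumption added here for illustration; the source does not prescribe it, and the function name `effective_rank` is hypothetical.

```python
import numpy as np


def effective_rank(z, eps=1e-12):
    """Effective rank of embeddings z with shape (n, d).

    Defined as exp(H(p)), where p is the singular-value spectrum of the
    centered batch normalized to sum to one. Values near d suggest the
    embedding uses all dimensions; values near 1 suggest dimensional collapse.
    """
    centered = z - z.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    p = s / (s.sum() + eps)
    entropy = -np.sum(p * np.log(p + eps))
    return float(np.exp(entropy))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spread = rng.standard_normal((1024, 8))
    # Rank-one batch: every embedding is a scalar multiple of one vector.
    rank_one = np.outer(rng.standard_normal(1024), rng.standard_normal(8))
    print("spread  ", effective_rank(spread))
    print("rank-one", effective_rank(rank_one))
```

Tracking this value per checkpoint for both the temporal SIGReg and temporal VISReg branches would show whether one recovers from a collapsed start faster. It says nothing about whether dense numeric state or rare events survive, which still requires the dedicated probes.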

Caveats

  • VISReg is a vision SSL method; it is only adjacent to foundation time-series modeling.
  • The source is an arXiv preprint with official artifacts, not yet a peer-reviewed venue record.
  • Aggregate OOD classification does not prove latent-state preservation.
  • The method still has objective-level choices: projector dimension, slice count, batch composition, multi-view augmentation, and loss weights.
  • VISReg contains a principled stop-gradient on the scale normalization path; the wiki should reserve “no stop-gradient” wording for the narrower claim that it avoids teacher/student or stop-gradient target heuristics.