Energy-Based Models

Summary

Energy-based models (EBMs) appear here as the probabilistic alternative favored for high-dimensional continuous data. In that setting, explicitly normalized probability models become awkward because the normalizing constant is intractable. The newer Energy-Based Transformer (EBT) source adds a concrete modern scaling attempt: use a Transformer as the energy function, then generate by optimizing candidate predictions under the learned energy.
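The generate-by-optimization step can be sketched as gradient descent on the candidate prediction while the input stays fixed. This is a minimal sketch that uses a toy quadratic energy in place of a Transformer. `target_fn`, `energy`, `energy_grad_y`, and `predict` are hypothetical names. Fixed-step gradient descent is one plausible optimizer, not necessarily the exact EBT procedure.

```python
import numpy as np

def target_fn(x):
    # Hypothetical stand-in for what a learned energy prefers; a real EBT
    # would encode this preference inside a Transformer energy network.
    return np.tanh(x)

def energy(x, y):
    # Toy energy: low when candidate y agrees with target_fn(x).
    return 0.5 * np.sum((y - target_fn(x)) ** 2)

def energy_grad_y(x, y):
    # Gradient of the toy energy with respect to the candidate y only.
    return y - target_fn(x)

def predict(x, steps=50, step_size=0.2, seed=0):
    """Generate a prediction by descending the energy from a random candidate."""
    rng = np.random.default_rng(seed)
    y = rng.normal(size=x.shape)  # initial candidate
    for _ in range(steps):
        y = y - step_size * energy_grad_y(x, y)
    return y

x = np.array([0.5, -1.0, 2.0])
y_hat = predict(x)
print(energy(x, y_hat))  # close to 0 after optimization
```

Only the candidate is updated here. The energy's parameters are frozen at inference time, which is what separates this from training.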

What The Wiki Currently Believes

Evidence

The sources agree: EBMs are less a standalone recipe than a substrate for predictive representations, uncertainty, verification, search, and world models. The LeCun/LVEBM (latent-variable EBM) sources provide the conceptual and latent-variable framing. EBT is an empirical attempt to make explicit EBMs scale with Transformer backbones.

EBT also sharpens the practical caveat for EBMs. Explicit energies make verification and search available, but three risks remain first-class for both serving and modeling: the cost of optimizing candidates, sensitivity to the step size, and multimodal energy landscapes.
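Two of these risks already show up in one dimension. The sketch assumes a hypothetical double-well energy with modes at -1 and +1; `energy`, `grad`, and `descend` are illustrative names, not from the EBT source. It shows that the mode plain gradient descent reaches depends on the starting candidate. It also shows that a step size that is too large makes the iterates diverge.

```python
def energy(y):
    # Hypothetical 1-D energy with two equally good minima (modes) at y = -1 and y = +1.
    return (y * y - 1.0) ** 2

def grad(y):
    # dE/dy for the double-well energy above.
    return 4.0 * y * (y * y - 1.0)

def descend(y0, step_size, steps=200):
    """Plain gradient descent on the candidate; returns inf if the iterates blow up."""
    y = y0
    for _ in range(steps):
        y = y - step_size * grad(y)
        if abs(y) > 1e6:  # treat runaway iterates as divergence
            return float("inf")
    return y

# Mode selection depends on where the candidate starts.
print(descend(0.3, step_size=0.05))   # converges near +1
print(descend(-0.3, step_size=0.05))  # converges near -1
# An overly large step overshoots the basin and diverges.
print(descend(0.3, step_size=1.0))    # inf
```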

Recurrent-Loop Caveat

The local JEPA-curriculum discussion raised a useful design warning for EBT-style systems. Recurrent Transformer blocks can be added inside an energy-based predictor, but doing so does not by itself make the design more useful. Energy-based inference already has an iterative loop: score a candidate, follow the energy gradient or search procedure, and repeat.

So the agenda question should be specific. Is the bottleneck the energy landscape, candidate generation, stopping rule, memory budget, or test-time compute allocation? If the existing energy loop is already the right place to spend adaptive compute, adding another recurrent block may only make optimization and gradients harder.
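A stopping rule is one concrete way to let the existing energy loop allocate test-time compute per prediction. This sketch uses a hypothetical `optimize_with_budget` helper that stops once the gradient magnitude drops below a tolerance. The energies are toy quadratics. A real system might instead stop on energy change, on a verifier score, or after a fixed budget.

```python
def optimize_with_budget(grad_fn, y0, step_size=0.1, tol=1e-4, max_steps=1000):
    """Descend until the gradient is small or the step budget runs out.

    Returns the final candidate and the number of update steps spent.
    """
    y = y0
    for step in range(max_steps):
        g = grad_fn(y)
        if abs(g) < tol:  # stopping rule: a nearly flat energy counts as converged
            return y, step
        y = y - step_size * g
    return y, max_steps  # budget exhausted before the stopping rule fired


# Hypothetical toy energies E(y) = 0.5 * (y - target)**2; each function returns dE/dy.
def easy_grad(y):
    return y - 0.1  # minimum close to the starting candidate


def hard_grad(y):
    return y - 5.0  # minimum far from the starting candidate


y_easy, easy_steps = optimize_with_budget(easy_grad, y0=0.0)
y_hard, hard_steps = optimize_with_budget(hard_grad, y0=0.0)
print(easy_steps, hard_steps)  # the harder prediction spends more steps
```

Here the step count is an output, not a hyperparameter. Compute is whatever the energy landscape demands, capped by `max_steps`. A recurrent block inside the predictor would add a second, separately tuned loop on top of this one.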

Relation To Foundation TSFM Agenda

Energy-based models are adjacent to the Foundation Time-Series Model Research Agenda through multi-modal future distributions, latent-state scoring, and dynamic compute or search. EBT strengthens the dynamic-compute branch by showing a per-prediction optimization interface outside time series. The current page still does not provide time-series-specific evidence for calibrated futures, editing, or action-conditioned rollout.

Open Questions

  • Which regularized EBM training methods scale best for multimodal world models?
  • How directly should current JEPA objectives be interpreted as energy-based objectives?
  • Can EBT-style explicit energies represent multiple plausible time-series futures without averaging nearby modes into one low-energy basin?
  • Should an EBT serve as the main dynamics model, or as a slow verifier above a cheaper feed-forward or recurrent state model?
  • When should adaptive compute live in the energy-search loop rather than in recurrent blocks inside the predictor?
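The slow-verifier option from the open questions can be prototyped as propose-then-verify: a cheap model emits candidates, and the energy only ranks them. Everything in this sketch is hypothetical: `propose`, `verifier_energy`, the 0.8 bias, and the noise scale. The toy energy simply prefers candidates near the input.

```python
import numpy as np

def propose(x, k, rng):
    """Hypothetical cheap proposer: one point guess plus k-1 perturbed variants."""
    guess = 0.8 * x  # stand-in for a fast feed-forward or recurrent prediction
    noise = rng.normal(scale=0.3, size=(k - 1,) + x.shape)
    return np.concatenate([guess[None, :], guess + noise], axis=0)

def verifier_energy(x, ys):
    """Hypothetical slow verifier; this toy energy prefers candidates near x."""
    return np.sum((ys - x) ** 2, axis=-1)

def propose_then_verify(x, k=32, seed=0):
    rng = np.random.default_rng(seed)
    candidates = propose(x, k, rng)
    scores = verifier_energy(x, candidates)  # one energy per candidate
    return candidates[np.argmin(scores)], candidates[0]

x = np.array([1.0, -2.0, 0.5])
best, guess = propose_then_verify(x)
print(verifier_energy(x, best), verifier_energy(x, guess))  # first is never larger
```

Keeping the proposer's own guess as candidate 0 guarantees that, under the verifier's energy, the selected output never scores worse than the proposer. That makes the verifier's contribution easy to measure.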