LeNEPA

Status: draft research idea extracted from internal discussion notes.

Collaboration

If this direction resonates with you, I would be happy to talk with like-minded people, collaborate on research, and work on use-cases together.

Ideas are not the bottleneck. Hands are. Time-series modeling should be moving at least as fast as vision, audio, and robotics.

Summary

LeNEPA is the local shorthand for a LeJEPA-regularized next-embedding-prediction direction: keep NEPA’s simple representation-space prediction interface, but add LeJEPA-style distribution regularization so the latent target is predictable without collapsing or drifting into nuisance factors.

The idea should live close to Next-Embedding Prediction, LeWorldModel, EIDOS, and NextLat. NextLat is especially relevant because it shows that supervising the model’s own next hidden state can pressure a Transformer toward a compact belief state. LeNEPA should treat that as a nearby design point, not as a replacement for NEPA. The open design choice is whether the target should be an external/point-wise embedding, a contextual target embedding, the model’s own hidden state, or a hybrid.

Placement

| Neighbor | Shared idea | Difference from LeNEPA |
| --- | --- | --- |
| NEPA | Predict future embeddings instead of raw pixels or tokens. | Original NEPA is vision-only, relies on stop-gradient, and does not use LeJEPA/SIGReg-style target distribution control. |
| LeJEPA | Use an isotropic Gaussian target distribution / SIGReg to avoid collapse and dimensional degeneration. | LeJEPA is the broader JEPA regularization claim; LeNEPA would specialize that prior to next-embedding or next-state autoregression. |
| LeWorldModel | Combine next-embedding prediction, Gaussian regularization, and latent world-model use. | LeWorldModel is pixel-control evidence with explicit actions; LeNEPA should be tested for time-series/event streams and typed control inputs. |
| EIDOS | Time-series next-embedding prediction with observation-space grounding. | EIDOS uses stop-gradient plus grounding; LeNEPA asks whether distribution regularization can reduce heuristics while preserving dense numeric state. |
| NextLat | Predict a future latent/hidden state and evaluate whether next-observation accuracy hides weak internal maps. | NextLat keeps next-token training and predicts the Transformer’s own next hidden state from the current hidden state plus next token; LeNEPA should compare own-hidden targets against NEPA-style embedding targets. |

Interface Sketch

```mermaid
flowchart LR
  X[time-series window / event stream] --> Tok[tokenizer or embedder]
  Tok --> H[current latent state]
  U[event, exogenous variable, action, control input, or intervention] --> Pred[predictor]
  H --> Pred
  Pred --> Zhat[predicted next embedding/state]
  Target[next embedding or hidden state target] -. stop-gradient or online target .-> Align[latent alignment]
  Zhat --> Align
  Zhat --> Reg[SIGReg / Gaussian distribution regularizer]
  Zhat --> Ground[optional observation grounding]
```
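
The three sinks of the predicted embedding in the flowchart (alignment, distribution regularizer, optional grounding) can be read as terms of one objective. The sketch below is a minimal pure-Python illustration under stated assumptions: `gaussian_moment_penalty` is a simplified stand-in that matches only the mean and variance of random 1-D projections and is not the SIGReg implementation from LeJEPA, and the names `lenepa_objective`, `reg_weight`, and `ground_weight` are hypothetical.

```python
import math
import random


def alignment_loss(z_hat, target):
    """Mean squared error between predicted and target embeddings.

    The target is treated as a constant here, which mirrors the
    stop-gradient path in the flowchart.
    """
    n, d = len(z_hat), len(z_hat[0])
    total = sum((p - t) ** 2 for zp, zt in zip(z_hat, target) for p, t in zip(zp, zt))
    return total / (n * d)


def gaussian_moment_penalty(z, num_directions=16, seed=0):
    """Simplified stand-in for a SIGReg-style regularizer (assumption).

    Projects the batch onto random unit directions and penalizes each
    1-D projection for deviating from zero mean and unit variance.
    """
    rng = random.Random(seed)
    d = len(z[0])
    penalty = 0.0
    for _ in range(num_directions):
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        w = [x / norm for x in w]
        proj = [sum(wi * zi for wi, zi in zip(w, row)) for row in z]
        mean = sum(proj) / len(proj)
        var = sum((p - mean) ** 2 for p in proj) / len(proj)
        penalty += mean ** 2 + (var - 1.0) ** 2
    return penalty / num_directions


def lenepa_objective(z_hat, target, reg_weight=0.1, grounding_loss=0.0, ground_weight=0.0):
    """Hypothetical combined objective: alignment + regularizer + optional grounding."""
    align = alignment_loss(z_hat, target)
    reg = gaussian_moment_penalty(z_hat)
    return align + reg_weight * reg + ground_weight * grounding_loss
```

A fully collapsed batch (every embedding equal) gets zero alignment loss against an equally collapsed target but a nonzero penalty, which is the behavior the regularizer is meant to supply when stop-gradient is removed.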

The core decision is target construction. A safe first implementation SHOULD compare at least three targets under matched compute:

  1. Patch- or point-wise embedding target: closest to NEPA and EIDOS; likely preserves local numeric state better.
  2. Contextual embedding target: may encode useful state, but can mix away patch-level detail.
  3. Own-hidden target: closest to NextLat; may create a compact belief state, but needs probes to ensure dense numeric detail and rare events survive.
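
The three target families above can be sketched as one switch over where the target at step t+1 comes from. This is an illustration only: `TargetKind`, `build_targets`, and `causal_mean` are hypothetical names, and the running mean is a toy stand-in for a contextual encoder, not the encoder of any of the cited systems.

```python
from enum import Enum


class TargetKind(Enum):
    POINTWISE = "pointwise"    # closest to NEPA and EIDOS
    CONTEXTUAL = "contextual"  # target has already seen earlier steps
    OWN_HIDDEN = "own_hidden"  # closest to NextLat


def causal_mean(embeddings, t):
    """Toy contextual encoder (assumption): mean of embeddings[0..t]."""
    window = embeddings[: t + 1]
    d = len(window[0])
    return [sum(row[j] for row in window) / len(window) for j in range(d)]


def build_targets(kind, embeddings, hidden):
    """Return, for each step t = 0..T-2, the target the predictor should hit at t+1."""
    steps = range(len(embeddings) - 1)
    if kind is TargetKind.POINTWISE:
        return [embeddings[t + 1] for t in steps]
    if kind is TargetKind.CONTEXTUAL:
        return [causal_mean(embeddings, t + 1) for t in steps]
    if kind is TargetKind.OWN_HIDDEN:
        return [hidden[t + 1] for t in steps]
    raise ValueError(f"unknown target kind: {kind}")


# A spike at the last step stays intact in the point-wise target
# but is averaged down by the toy contextual target.
embeddings = [[0.0], [2.0], [10.0]]
hidden = [[1.0], [1.5], [2.0]]
pointwise = build_targets(TargetKind.POINTWISE, embeddings, hidden)
contextual = build_targets(TargetKind.CONTEXTUAL, embeddings, hidden)
```

Even in this toy setting the contextual target shrinks the spike from 10.0 to 4.0, which is the kind of detail loss the matched comparison is meant to measure.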

Hypotheses

  • LeNEPA can make NEPA-style next-embedding prediction less dependent on stop-gradient and teacher/student heuristics by using a target distribution prior such as SIGReg.
  • NextLat suggests that own-hidden-state prediction is a strong baseline for LeNEPA. A LeNEPA experiment SHOULD include a NextLat-style own-hidden target, not only external embeddings.
  • Target-layer choice will dominate outcomes. The existing NEPA topic already warns that contextual or internal-layer targets can degrade quality; LeNEPA should treat target-layer ablation as a first-class result, not an appendix.
  • Time-series LeNEPA needs observation grounding or another dense-value preservation check. A Gaussian latent that predicts well can still erase rare spikes, cross-channel deviations, event timing, or action history.
  • For action-conditioned settings, the transition input SHOULD be a typed action, control input, intervention, event, treatment, or exogenous variable, not an ambiguous “action” token.
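
One way to make the last hypothesis concrete is to give the transition input an explicit type tag instead of a free-form token. The sketch below is a hypothetical illustration: `InputKind`, `TransitionInput`, and `encode` are invented names, and the one-hot-plus-value encoding is just one possible choice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class InputKind(Enum):
    ACTION = "action"
    CONTROL_INPUT = "control_input"
    INTERVENTION = "intervention"
    EVENT = "event"
    TREATMENT = "treatment"
    EXOGENOUS = "exogenous"


@dataclass(frozen=True)
class TransitionInput:
    kind: InputKind
    name: str
    value: Tuple[float, ...]
    timestamp: float

    def __post_init__(self):
        # Reject free-form strings so the kind is never ambiguous.
        if not isinstance(self.kind, InputKind):
            raise TypeError("kind must be an InputKind, not a free-form value")


def encode(inp):
    """One-hot over input kinds, followed by the raw value vector."""
    one_hot = [1.0 if k is inp.kind else 0.0 for k in InputKind]
    return one_hot + list(inp.value)


# Hypothetical example: an intervention on a valve at time 12.0.
valve = TransitionInput(InputKind.INTERVENTION, "valve_open", (0.5,), 12.0)
vector = encode(valve)
```

Keeping the kind in the predictor's input lets later ablations ask whether the model treats an intervention differently from a passive exogenous variable with the same value.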

First Experiment Shape

A practical first public experiment could be:

  1. Start from an EIDOS-like point-wise tokenizer for univariate and then multivariate time series.
  2. Train matched models with NEPA/EIDOS-style stop-gradient, LeNEPA-style SIGReg, NextLat-style own-hidden supervision, and hybrid targets.
  3. Evaluate not only forecast loss, but also latent probes for regime, event timing, channel relationships, rare-state retention, and dense numeric recoverability.
  4. Add a small action- or intervention-conditioned environment only after the passive target-layer comparison is stable.
  5. Report wall-clock and memory cost, because distribution regularization and extra target heads must beat simple forecasting or deeper-backbone baselines under matched serving constraints.
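
Step 3 asks for more than forecast loss. As a minimal sketch of one such check, the function below compares reconstruction error on rare points against the overall error. It assumes some probe has already decoded values `y_recon` from the latent; the name `rare_event_report` and the z-score threshold are hypothetical choices, not part of any cited method.

```python
def mse(y_true, y_pred):
    """Mean squared error over two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def rare_event_report(y_true, y_recon, z_threshold=3.0):
    """Compare probe reconstruction error overall and on rare points.

    A point counts as rare when its z-score under the empirical mean and
    standard deviation of y_true exceeds z_threshold (an assumed rule).
    """
    n = len(y_true)
    mean = sum(y_true) / n
    std = (sum((y - mean) ** 2 for y in y_true) / n) ** 0.5
    rare_idx = [
        i for i, y in enumerate(y_true)
        if std > 0 and abs(y - mean) / std > z_threshold
    ]
    rare = None
    if rare_idx:
        rare = mse([y_true[i] for i in rare_idx], [y_recon[i] for i in rare_idx])
    return {"overall_mse": mse(y_true, y_recon), "rare_mse": rare, "num_rare": len(rare_idx)}


# A latent that erases a single spike looks acceptable on average
# while failing completely on the rare point.
y_true = [0.0] * 99 + [10.0]
y_recon = [0.0] * 100
report = rare_event_report(y_true, y_recon)
```

In this toy case the overall error is 1.0 while the error on the single rare point is 100.0, which is the gap an average forecast metric hides.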

Relation To Foundation TSFM Agenda

This is an idea page, so the verdicts below describe the intended contribution if the proposed system works. Evidence status is recorded separately in the Evidence and Missing pieces columns.

| Agenda slot | Verdict | Evidence | Missing pieces |
| --- | --- | --- | --- |
| Latent-state prediction | partially closes | Proposes next-embedding or next-hidden-state supervision with explicit target-distribution control. Evidence is an internal design note grounded in NEPA, LeJEPA, EIDOS, LeWorldModel, and NextLat. | Run matched target-layer and own-hidden-state ablations on time-series data. |
| Anti-collapse regularization | partially closes | LeJEPA/SIGReg motivates distribution regularization as an alternative to brittle stop-gradient heuristics. | Show that regularization prevents collapse without erasing rare regimes or dense numeric values. |
| Representation quality | partially closes | Keeps prediction in representation space while requiring dense-value preservation checks. | Need probes for regime, cross-channel state, exogenous variables, events, interventions, and recoverability. |
| Control and counterfactuals | adjacent | LeWorldModel and NextLat motivate latent transition interfaces, but LeNEPA still needs explicit typed action/control/intervention inputs. | Add candidate-action rollout or intervention benchmarks after passive state learning works. |
| Benchmark level | warning | NextLat shows next-token legality can hide poor internal maps; NEPA warns target-layer choice can dominate results. | Define benchmark diagnostics before claiming a learned world model. |

Open Questions

  • Is LeNEPA best defined as SIGReg-regularized NEPA, NextLat-style own-hidden prediction with a Gaussian prior, or a target-family comparison that includes both?
  • Which target path is safest for time series: point-wise embeddings, independent patch embeddings, contextual embeddings, internal Transformer layers, or own hidden states?
  • Can SIGReg replace stop-gradient in a time-series next-embedding objective without losing dense numeric detail?
  • Does own-hidden-state prediction improve belief-state quality on multivariate time series, or does it mostly optimize a self-consistency shortcut?
  • How should LeNEPA incorporate typed actions, control inputs, interventions, events, treatments, and exogenous variables?
  • Which metrics distinguish a compact useful belief state from a compressed latent that merely improves average forecast loss?