Next-Embedding Prediction

Summary

Next-embedding prediction trains a sequence model to predict future embeddings rather than reconstructing raw observations. It sits between reconstruction-style modeling and JEPA: the target is already in representation space, but the basic recipe can be simpler than a full joint-embedding setup with separate context and target encoders.
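
The basic recipe can be sketched in a few lines. This is a minimal forward-pass illustration, not the NEPA or LeNEPA implementation: the `tanh` linear embedder, the running-mean predictor, the cosine-distance loss, and every name and shape below are assumptions made for the example.

```python
import numpy as np

def patchify(series, patch_len):
    """Split a 1-D series into non-overlapping patches, dropping any remainder."""
    series = np.asarray(series, dtype=float)
    n_patches = len(series) // patch_len
    return series[: n_patches * patch_len].reshape(n_patches, patch_len)

def embed_patches(patches, W):
    """Hypothetical patch-independent embedder: each row depends on its own patch only."""
    return np.tanh(patches @ W)

def causal_predict(emb, P):
    """Toy causal predictor: the output at position t uses embeddings 0..t only."""
    counts = np.arange(1, len(emb) + 1)[:, None]
    context = np.cumsum(emb, axis=0) / counts  # running mean over the past
    return context @ P

def next_embedding_loss(emb, pred, eps=1e-8):
    """Mean cosine distance between the prediction at t and the embedding at t+1."""
    target = emb[1:]   # in training this branch would be held fixed or regularized
    guess = pred[:-1]  # the last prediction has no target inside the window
    cos = np.sum(guess * target, axis=1) / (
        np.linalg.norm(guess, axis=1) * np.linalg.norm(target, axis=1) + eps
    )
    return float(np.mean(1.0 - cos))

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0.0, 20.0, 256))
patches = patchify(series, patch_len=16)   # shape (16, 16)
W = rng.normal(size=(16, 8))               # assumed embedding width of 8
P = rng.normal(size=(8, 8))                # untrained predictor weights
emb = embed_patches(patches, W)
loss = next_embedding_loss(emb, causal_predict(emb, P))
print(f"untrained next-embedding loss: {loss:.3f}")
```

Nothing is reconstructed in observation space: the loss compares embeddings with embeddings, which is why the choice of target embedding matters so much.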

For time series, the important design question is where the target embedding comes from. If the target embedding is too local, it may miss useful context. If it is already too contextual, the prediction task may become easier while the target itself has mixed away patch-level state. LeNEPA, the local MILETS 2026 source, now provides evidence that a no-augmentation next-latent recipe can work for time-series representation learning when temporal SIGReg replaces stop-gradient/EMA stabilization.

The local LeNEPA direction should sit on this page: it asks whether NEPA-style next-embedding prediction can be combined with LeJEPA/SIGReg-style distribution regularization and compared against NextLat-style own-hidden-state targets.

What The Wiki Currently Believes

NEPA introduces next-embedding predictive autoregression for visual SSL: embed patches, then predict future patch embeddings without reconstructing pixels, discrete tokens, contrastive pairs, or task-specific heads.
LeNEPA adapts the next-embedding idea to time-series representation learning with a causal backbone, no augmentations, no stop-gradient target, temporal SIGReg, and frozen-probe evaluation on PTB-XL, Aionoscope Diag, and UCR-128.
VISReg is not a next-embedding method, but it is a close regularizer-side neighbor for LeNEPA because it tests whether SIGReg-family distribution control should be split into scale and shape losses.
EIDOS adapts the next-embedding idea to time-series forecasting with point-wise scalar embeddings, stop-gradient on the target branch, and observation-space grounding.
LeWorldModel uses next-embedding prediction inside a JEPA-style world model, adding Gaussian regularization to stabilize end-to-end latent prediction from pixels.
NextLat is the closest new neighbor: it predicts the model’s own next hidden state rather than an external patch embedding, so it should become a LeNEPA baseline or target-family ablation.
LeNEPA now tracks follow-up design questions after the MILETS 2026 source: compare external embeddings, contextual targets, own-hidden targets, and hybrid targets while preserving dense state and rare events.
The local dynamic-curriculum notes treat next-embedding prediction as a useful diagnostic for target-layer choice before applying surprise-based sampling to JEPA-style training.

Relation To JEPA

NEPA should stay close to JEPA, but not be collapsed into it.

The overlap is the latent target: both avoid raw reconstruction as the main prediction target. The difference is the recipe. NEPA starts from an embedding layer and predicts the next embedding. JEPA is the broader joint-embedding family, where context and target views are encoded and the predictor learns to match target representations, often with additional anti-collapse or distribution-shaping constraints.

That means a NEPA failure mode can warn JEPA design, but it is not automatically JEPA evidence. When a JEPA curriculum uses latent prediction surprise as a sampling signal, it should still ablate how the target representation is built.

LeNEPA is the bridge hypothesis between the two pages: keep NEPA close to the embedding-stream target, but import JEPA/LeJEPA regularization so the target distribution is controlled rather than only stabilized by stop-gradient. NextLat adds a second bridge: own-hidden-state targets may be more belief-state-like than external embeddings, but must be checked for dense-value and rare-event preservation.
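
To make "controlled rather than only stabilized" concrete, here is a sketch of a distribution penalty on embeddings. It is a simplified moment-matching stand-in, not SIGReg's actual statistic and not the temporal variant used in LeNEPA: it projects embeddings onto random unit directions and penalizes each projection for deviating from zero mean and unit variance. The function name, slice count, and seed are hypothetical.

```python
import numpy as np

def sliced_gaussian_penalty(emb, num_slices=64, seed=0):
    """Moment-matching stand-in: 1-D projections should have mean 0 and variance 1."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(emb.shape[1], num_slices))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)  # unit-length directions
    proj = emb @ dirs                                     # (n_samples, num_slices)
    mean = proj.mean(axis=0)
    var = proj.var(axis=0)
    return float(np.mean(mean**2 + (var - 1.0) ** 2))

rng = np.random.default_rng(1)
spread = rng.normal(size=(4096, 8))   # roughly isotropic Gaussian embeddings
collapsed = np.zeros((4096, 8))       # fully collapsed embeddings
print(sliced_gaussian_penalty(spread), sliced_gaussian_penalty(collapsed))
```

A collapsed embedding stream is penalized directly, so the predictor cannot drive the loss down by making every target identical. Whether such a penalty also preserves dense values and rare events is a separate question that this sketch does not test.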

Time-Series Target-Layer Note

In a NEPA-style setup, a CNN-style embedder encodes each patch independently. That baseline trains well.

Two changes make the setup fragile:

replacing the independent CNN patch embedder with a more sequence-dependent path, such as a Mamba/H-Net-like dynamic patching path;
predicting an internal Transformer layer instead of the initial embedding layer.

The dynamic-patching change gives mixed results: one case works, another does not. The internal-layer target is worse in the reported setup: moving from the embedding layer to an internal Transformer layer cuts quality roughly in half.

The practical conclusion is simple: this NEPA setup works best when the target embedding is still patch-independent. Once the target already mixes information across patches, the prediction problem changes and quality can collapse.
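
One way to check whether a candidate target is still patch-independent is a perturbation test: change one patch and measure how much the targets of the other patches move. The sketch below is a hypothetical diagnostic, not part of the reported setup; the linear embedder and the running-mean "contextual" target are toy stand-ins for a CNN patch embedder and an internal Transformer layer.

```python
import numpy as np

def cross_patch_leakage(target_fn, patches, perturbed_index, seed=0):
    """Largest change in any other patch's target after perturbing one patch."""
    rng = np.random.default_rng(seed)
    base = target_fn(patches)
    perturbed = patches.copy()
    perturbed[perturbed_index] += rng.normal(size=patches.shape[1])
    delta = np.linalg.norm(target_fn(perturbed) - base, axis=1)
    return float(np.delete(delta, perturbed_index).max())

rng = np.random.default_rng(0)
patches = rng.normal(size=(8, 16))   # 8 patches of length 16 (assumed shapes)
W = rng.normal(size=(16, 4))

def independent_target(x):
    """Each target row depends only on its own patch."""
    return x @ W

def contextual_target(x):
    """Each target row is a causal running mean, so it mixes earlier patches."""
    emb = x @ W
    counts = np.arange(1, len(emb) + 1)[:, None]
    return np.cumsum(emb, axis=0) / counts

leak_independent = cross_patch_leakage(independent_target, patches, perturbed_index=3)
leak_contextual = cross_patch_leakage(contextual_target, patches, perturbed_index=3)
print(leak_independent, leak_contextual)
```

The independent target shows no leakage, while the contextual target moves at every position after the perturbed patch. A nonzero score does not by itself mean the target is bad, only that the prediction problem is no longer the patch-independent one that trained well.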

For JEPA, this is a design warning, not a direct result. Surprise-controlled curricula should check whether the target encoder introduces cross-patch dependence before using latent prediction loss as the value signal for selecting windows.
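
For reference, a surprise-controlled sampler can be as small as a softmax over per-window latent prediction losses. The rule below is an assumed illustration, not the curriculum described in the local notes; `temperature` and the example losses are hypothetical.

```python
import numpy as np

def surprise_sampling_probs(window_losses, temperature=1.0):
    """Softmax over per-window losses: higher surprise gives higher sampling weight."""
    losses = np.asarray(window_losses, dtype=float)
    logits = (losses - losses.max()) / temperature  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

losses = [0.10, 0.12, 0.90, 0.11]   # hypothetical per-window prediction losses
probs = surprise_sampling_probs(losses, temperature=0.2)
rng = np.random.default_rng(0)
batch = rng.choice(len(losses), size=8, p=probs)  # window indices for the next batch
print(probs, batch)
```

The sampler trusts the loss completely. If the target mixes information across patches, a window can look unsurprising because the target is easy rather than because the model has learned its state, which is why the target-layer ablation should come first.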

Relation To Foundation TSFM Agenda

Agenda slot	Verdict	Evidence	Missing pieces
Latent-state prediction	partially closes	NEPA and EIDOS predict embeddings rather than raw observations, and LeNEPA adds direct time-series fixed-recipe evidence for next-latent prediction.	Need high-dimensional streaming time-series tests where the latent tracks regime, state, rare events, and action history.
Representation quality	warning	The local time-series note shows target-layer choice can dominate whether next-embedding prediction preserves patch-level state.	Run public ablations over independent patch targets, contextual targets, and internal-layer targets.
Anti-collapse regularization	partially closes	NEPA relies on stop-gradient, while LeNEPA shows temporal SIGReg can stabilize a time-series next-latent recipe without stop-gradient in the tested setting; VISReg suggests a stronger scale/shape regularizer variant for future LeNEPA ablations.	Test whether SIGReg/VISReg/Gaussian regularization transfers without losing rare or local state across broader domains.
Data diversity, curriculum, and long tail	adjacent	Target quality affects whether latent prediction surprise is a useful sampler signal for long-tailed temporal corpora.	Validate surprise-based curricula with target-layer ablations and rare-state metrics.

Open Questions

Which target layer is best for next-embedding prediction on numeric time series: independent patch embeddings, point-wise embeddings, contextual embeddings, or internal Transformer layers?
Can a contextual target be made useful without erasing patch-level state?
Does observation-space grounding, as in EIDOS, prevent the target from drifting away from dense numeric detail?
LeNEPA partially answers SIGReg-versus-stop-gradient for regular PTB-XL/Diag fixed-recipe SSL; does temporal SIGReg remain stable and state-preserving across contextual targets, own-hidden targets, irregular/event-stream data, and action-conditioned settings?
Can LeNEPA combine SIGReg-style distribution control with NEPA-style target construction without losing dense numeric detail beyond the published fixed-recipe setting?
Should a LeNEPA experiment use external next embeddings, NextLat-style own hidden states, or both as matched target families?
When is next-embedding prediction enough, and when does the model need a fuller JEPA-style context and target interface?

Alex Open Research Wiki
