NextLat
Summary
NextLat is the next-latent-prediction method introduced by Next-Latent Prediction Transformers Learn Compact World Models. It trains an autoregressive Transformer with ordinary next-token prediction plus an auxiliary objective that predicts the model’s own next hidden state.
The method matters because it turns a Transformer’s hidden state into an explicitly supervised latent transition object. The base Transformer and its ordinary autoregressive inference path remain unchanged, while a lightweight latent dynamics model is used during training and for optional self-speculative decoding.
Method Contract
- Base model: decoder-only autoregressive Transformer.
- Main objective: next-token cross-entropy.
- Auxiliary objective: predict the next hidden state h_{t+1} from the current hidden state h_t and the next token x_{t+1} with a latent dynamics model.
- Target handling: detached hidden-state targets and stop-gradient choices are used to avoid collapse and reduce extra backward cost.
- Optional semantic alignment: KL matching between token distributions from true and predicted hidden states.
- Claimed latent semantics: optimized hidden states become belief states, i.e. compact sufficient statistics for predicting future observations.
- Serving hook: recursively rolling the latent dynamics model can draft variable-length continuations for self-speculative decoding.
flowchart LR
    Prefix[token history] --> Tr[Transformer]
    Tr --> Ht[h_t]
    Ht --> LM[next-token head]
    LM --> Xt[next token]
    Ht --> Psi[latent dynamics]
    Xt --> Psi
    Psi --> Hhat[h_hat_t+1]
    Hnext[h_t+1 target] -. detached .-> Loss[NextLat loss]
    Hhat --> Loss
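The contract above can be made concrete with a small forward-only sketch. It is an illustration under stated assumptions, not the official implementation: the function name `nextlat_loss`, the weights `lam_latent` and `lam_kl`, the mean-squared-error regression term, and the KL direction are all placeholders chosen here, since this page does not fix them. NumPy has no autograd, so the detached target is marked only by a comment.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def nextlat_loss(h, next_tokens, embed, w_out, psi, lam_latent=1.0, lam_kl=1.0):
    """Forward-only sketch of next-token loss plus next-latent terms.

    h           : (T, d) hidden states h_0..h_{T-1} from the Transformer
    next_tokens : (T,) int ids, next_tokens[t] = x_{t+1}
    embed       : (V, d) token embedding table (hypothetical)
    w_out       : (d, V) next-token head
    psi         : callable (h_t, emb(x_{t+1})) -> predicted h_{t+1}
    """
    T = h.shape[0]

    # Main objective: next-token cross-entropy from the true hidden states.
    logp = log_softmax(h @ w_out)
    ce = -logp[np.arange(T), next_tokens].mean()

    # Auxiliary objective: predict h_{t+1} from (h_t, x_{t+1}) for t = 0..T-2.
    h_hat = psi(h[:-1], embed[next_tokens[:-1]])
    # In an autograd framework this target would be detached (stop-gradient).
    target = h[1:]
    latent = np.mean((h_hat - target) ** 2)  # placeholder regression loss

    # Optional semantic alignment: KL between token distributions from the
    # true and the predicted hidden states (direction is an assumption).
    logp_true = log_softmax(target @ w_out)
    logp_pred = log_softmax(h_hat @ w_out)
    kl = np.mean(np.sum(np.exp(logp_true) * (logp_true - logp_pred), axis=-1))

    total = ce + lam_latent * latent + lam_kl * kl
    return {"total": total, "ce": ce, "latent": latent, "kl": kl}

# Toy usage with random states and a linear stand-in for the dynamics model.
rng = np.random.default_rng(0)
T, d, V = 6, 4, 5
h = rng.normal(size=(T, d))
tokens = rng.integers(0, V, size=T)
embed = rng.normal(size=(V, d))
w_out = rng.normal(size=(d, V))
A, B = rng.normal(size=(d, d)), rng.normal(size=(d, d))
losses = nextlat_loss(h, tokens, embed, w_out, lambda hp, e: hp @ A + e @ B)
```

When the dynamics model reproduces the true next hidden state exactly, both auxiliary terms vanish and the total reduces to the ordinary next-token loss, which matches the claim that the base inference path is unchanged.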
Official Artifacts
- Preprint: arXiv 2511.05963
- OpenReview: PLAN-FM Bridge @ AAAI 2026
- Official blog: Next-Latent Prediction Transformers
- Official code: JaydenTeoh/NextLat
- Official X thread: Jayden Teoh announcement
- Local code README snapshot: papers/nextlat-2026/github-readme-nextlat.md
The repository includes NextLat plus GPT, MTP, JTP, and BST baselines, training/evaluation scripts, configs, and data instructions. The availability of official code does not by itself mean the paper’s claims have been independently replicated.
Relevance To This Wiki
NextLat belongs on the latent-space predictive learning, JEPA-adjacent, and world-model branches. It is not a pure JEPA system because it keeps next-token prediction and uses the Transformer’s own hidden states as targets rather than a separate target encoder. It is also not a complete action-conditioned world model because the transition is over hidden state plus next token, not over typed external actions or interventions.
It should also be read as a close neighbor of Alex’s LeNEPA idea: LeNEPA asks whether NEPA-style next-embedding prediction plus LeJEPA-style distribution regularization should use external embeddings, own hidden states, or both. NextLat supplies the own-hidden-state side of that comparison.
For time-series and operational world-model work, the useful transfer is the pressure toward compact belief states and the evaluation lesson: next-observation accuracy is not enough. A TSFM analogue should check whether latent states preserve regimes, rare events, channel dependencies, exogenous variables, and action history, not only whether forecast loss improves.
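One common way to run such a check is a linear probe: freeze the model, collect latent states, and test whether a simple linear readout recovers a property such as the regime label. The sketch below is a hypothetical illustration, not something taken from the NextLat paper or repository; `linear_probe_accuracy`, the ridge penalty, and the train/test split are assumptions made here.

```python
import numpy as np

def linear_probe_accuracy(latents, labels, n_classes,
                          train_frac=0.7, ridge=1e-3, seed=0):
    """Fit a ridge-regularized linear readout from latents to one-hot labels
    and return held-out classification accuracy.

    latents : (n, d) frozen latent states
    labels  : (n,) int class ids (e.g. regime index)
    """
    rng = np.random.default_rng(seed)
    n = latents.shape[0]
    idx = rng.permutation(n)
    n_train = int(train_frac * n)
    train, test = idx[:n_train], idx[n_train:]

    X = np.hstack([latents, np.ones((n, 1))])   # append a bias column
    Y = np.eye(n_classes)[labels]               # one-hot targets

    # Closed-form ridge regression on the training split.
    gram = X[train].T @ X[train] + ridge * np.eye(X.shape[1])
    W = np.linalg.solve(gram, X[train].T @ Y[train])

    pred = (X[test] @ W).argmax(axis=1)
    return float((pred == labels[test]).mean())

# Synthetic usage: latents that encode the regime vs. latents that do not.
rng = np.random.default_rng(1)
regimes = rng.integers(0, 3, size=300)
informative = np.eye(3)[regimes] + 0.01 * rng.normal(size=(300, 3))
uninformative = rng.normal(size=(300, 3))
acc_informative = linear_probe_accuracy(informative, regimes, 3)
acc_uninformative = linear_probe_accuracy(uninformative, regimes, 3)
```

A probe like this only shows that information is linearly recoverable; the same pattern would need separate targets for rare events, channel dependencies, exogenous variables, and action history.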
Caveats
- Evidence is language and synthetic/sequence-world-model evidence, not numeric time-series evidence.
- The idealized theorem depends on successful optimization and does not remove empirical target/loss-design questions.
- The latent dynamics model is simple and underexplored.
- Self-speculative decoding is promising but currently evaluated with fixed draft-length sweeps rather than learned adaptive budgets.
- The official README records reproducibility caveats around torch.compile(), Triton/Liger kernels, and hardware-specific throughput measurement.
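The fixed-draft-length setting mentioned in the caveats can be pictured as a draft-and-verify loop. The following is a toy sketch under assumptions made here: greedy decoding, exact-match verification, and sequential verification calls for readability (a real system would verify a draft in one parallel pass). The callables `transformer`, `head`, and `psi` are hypothetical stand-ins, not the repository's API.

```python
def self_speculative_decode(prefix, transformer, head, psi, draft_len, max_new):
    """Greedy self-speculative decoding with a fixed draft length.

    transformer : callable(tokens) -> true hidden state at the last position
    head        : callable(hidden) -> greedy next token
    psi         : callable(hidden, token) -> predicted next hidden state
    """
    tokens = list(prefix)
    n_target = len(prefix) + max_new
    while len(tokens) < n_target:
        # Draft: roll the latent dynamics model forward from the true state.
        h_roll = transformer(tokens)
        draft = []
        for _ in range(draft_len):
            tok = head(h_roll)
            draft.append(tok)
            h_roll = psi(h_roll, tok)

        # Verify: the first drafted token comes from the true hidden state,
        # so it is always the base model's greedy token.
        accepted = [draft[0]]
        for tok in draft[1:]:
            true_tok = head(transformer(tokens + accepted))
            if true_tok == tok:
                accepted.append(tok)
            else:
                accepted.append(true_tok)  # fall back to the base model
                break
        tokens.extend(accepted)
    return tokens[:n_target]

# Toy components: integer "hidden states" so the example runs without a model.
def toy_transformer(tokens):
    return sum(tokens) % 7

def toy_head(hidden):
    return (hidden * 3 + 1) % 5

def exact_psi(hidden, token):
    return (hidden + token) % 7   # matches toy_transformer exactly

def poor_psi(hidden, token):
    return hidden                 # ignores the token, so drafts often fail

out_exact = self_speculative_decode([1, 2], toy_transformer, toy_head, exact_psi, 4, 10)
out_poor = self_speculative_decode([1, 2], toy_transformer, toy_head, poor_psi, 4, 10)
```

Under this verification rule the output is identical to plain greedy decoding regardless of how good `psi` is; draft quality only changes how many drafted tokens are accepted per round. A learned adaptive budget would replace the constant `draft_len`.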
Related Pages
- Next-Latent Prediction Transformers Learn Compact World Models
- Latent-Space Predictive Learning
- Joint Embedding Predictive Architecture
- Next-Embedding Prediction
- LeNEPA
- World Models
- Looped Transformers And Test-Time Memory
- Latent-State Time-Series Modeling
- Foundation Time-Series Model Research Agenda