Supervised Memory Training

Summary

Supervised Memory Training (SMT) is a pretraining method for nonlinear RNNs that replaces recurrent credit propagation with supervised one-step memory-transition labels. A Transformer encoder-decoder first learns a predictive memory state for each context; the RNN updater is then trained on the one-step transition $(m_{t}, x_{t+1}) \mapsto m_{t+1}$, using the teacher's memory states as targets. DAgger Memory Training (DMT) is the follow-up on-policy imitation phase that reduces the rollout drift left by SMT-only one-step training.
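The one-step objective can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the paper's implementation: the teacher labels are produced by a hypothetical placeholder (`make_placeholder_labels`, a fixed random tanh recurrence standing in for the Transformer teacher), and the updater is a single tanh layer fitted by plain gradient descent on squared error. The point it shows is that every $(m_{t}, x_{t+1}) \mapsto m_{t+1}$ pair is an independent regression example, so the training step contains no loop over time and no backpropagation through time.

```python
import numpy as np

def make_placeholder_labels(n_seq=64, T=20, d_x=4, d_m=8, seed=0):
    """Hypothetical stand-in for teacher memory labels.

    In SMT the labels would come from the trained Transformer teacher;
    a fixed random tanh recurrence is used here only so the sketch runs.
    Convention: x[:, t] stores x_{t+1} and m[:, t] stores m_t.
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=0.5 / np.sqrt(d_m), size=(d_m, d_m))
    B = rng.normal(scale=1.0 / np.sqrt(d_x), size=(d_x, d_m))
    x = rng.normal(size=(n_seq, T, d_x))
    m = np.zeros((n_seq, T + 1, d_m))
    for t in range(T):
        m[:, t + 1] = np.tanh(m[:, t] @ A + x[:, t] @ B)
    return x, m

def train_updater(x, m, steps=300, lr=0.2, seed=1):
    """Fit m_{t+1} ~ tanh(m_t W_m + x_{t+1} W_x + b) on one-step pairs."""
    rng = np.random.default_rng(seed)
    d_x, d_m = x.shape[-1], m.shape[-1]
    # Flatten (sequence, time) into one batch: each row is an independent example.
    m_in = m[:, :-1].reshape(-1, d_m)    # m_t
    x_in = x.reshape(-1, d_x)            # x_{t+1}
    target = m[:, 1:].reshape(-1, d_m)   # teacher label m_{t+1}
    n = target.shape[0]
    W_m = rng.normal(scale=0.1, size=(d_m, d_m))
    W_x = rng.normal(scale=0.1, size=(d_x, d_m))
    b = np.zeros(d_m)
    losses = []
    for _ in range(steps):
        pred = np.tanh(m_in @ W_m + x_in @ W_x + b)
        err = pred - target
        # Loss: half the squared error, summed over state dims, averaged over examples.
        losses.append(0.5 * float(np.mean(np.sum(err ** 2, axis=1))))
        g = err * (1.0 - pred ** 2) / n   # gradient w.r.t. the pre-activation
        W_m -= lr * (m_in.T @ g)
        W_x -= lr * (x_in.T @ g)
        b -= lr * g.sum(axis=0)
    return (W_m, W_x, b), losses

x, m = make_placeholder_labels()
params, losses = train_updater(x, m)
print(f"one-step loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because the updater only ever sees teacher states as inputs here, a low one-step loss says nothing yet about what happens when it is fed its own predictions; that gap is what the DMT phase targets.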

Role In The Wiki

SMT belongs to the nonlinear recurrent-state branch of Efficient Recurrent Sequence Models. It complements ParaRNN: ParaRNN parallelizes the actual nonlinear recurrent trajectory with Newton iterations and parallel reduction, while SMT changes the training target into predictive-state imitation.

For time-series and world-model work, the interesting transfer is the predictive memory interface: learn a compact latent state that is sufficient for future prediction, then train a recurrent updater to maintain that state. The caveat is that the limits of Transformer-teacher expressivity, rollout drift, and reward/control objectives remain open problems.
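Rollout drift can be made concrete by comparing two errors against the same target states: the one-step error when the updater is fed the teacher's state, and the free-running error when it is fed its own previous output. The sketch below is illustrative only. The "teacher" is a hypothetical random tanh recurrence rather than a Transformer, the "learner" is the same map with small weight noise, and the names `step` and `drift_profile` are introduced here for the example.

```python
import numpy as np

def step(params, m, x):
    """One memory update: m_next = tanh(m W_m + x W_x)."""
    W_m, W_x = params
    return np.tanh(m @ W_m + x @ W_x)

def drift_profile(teacher, learner, x, m0):
    """Return per-step one-step error and free-running rollout error."""
    m_teacher = m0   # teacher state m_t
    m_free = m0      # learner's own state when run without correction
    one_step_err, rollout_err = [], []
    for t in range(x.shape[0]):
        m_next = step(teacher, m_teacher, x[t])      # target m_{t+1}
        m_forced = step(learner, m_teacher, x[t])    # learner fed the teacher state
        m_free = step(learner, m_free, x[t])         # learner fed its own state
        one_step_err.append(np.linalg.norm(m_forced - m_next))
        rollout_err.append(np.linalg.norm(m_free - m_next))
        m_teacher = m_next
    return np.array(one_step_err), np.array(rollout_err)

rng = np.random.default_rng(0)
d_x, d_m, T = 4, 8, 50
teacher = (rng.normal(scale=0.5 / np.sqrt(d_m), size=(d_m, d_m)),
           rng.normal(scale=1.0 / np.sqrt(d_x), size=(d_x, d_m)))
# Imperfect learner: teacher weights plus small noise (an assumption for the demo).
learner = tuple(W + 0.05 * rng.normal(size=W.shape) for W in teacher)
x = rng.normal(size=(T, d_x))
one_step, rollout = drift_profile(teacher, learner, x, np.zeros(d_m))
print("mean one-step error:", one_step.mean())
print("mean rollout error: ", rollout.mean())
```

The two errors coincide at the first step, since both runs start from the same state; any later difference between them is error the updater inherits from its own earlier predictions, which one-step training never exposes it to.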

Evidence

Pretraining Recurrent Networks without Recurrence
