ParaRNN

Summary

ParaRNN is Apple’s framework for training nonlinear recurrent neural networks in parallel. It treats the full hidden-state trajectory as a single nonlinear system and solves that system with Newton iterations and parallel reduction.
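
The idea of a Newton solve over the whole trajectory can be illustrated with a minimal NumPy sketch. Everything named here is an assumption made for illustration: the toy elementwise cell `h_t = tanh(w * h_{t-1} + x_t)` is not ParaRNN's GRU or LSTM variant, the function names are hypothetical, and the doubling scan stands in for whatever reduction kernel the framework actually uses. Each Newton step linearizes the recurrence around the current guess, which leaves a linear recurrence in the correction that a prefix scan can solve in a logarithmic number of vectorized passes.

```python
import numpy as np

def cell(h_prev, x, w):
    # Toy elementwise nonlinear cell (an assumption, not ParaRNN's cell).
    return np.tanh(w * h_prev + x)

def cell_jac_diag(h_prev, x, w):
    # Diagonal of d cell / d h_prev for the toy cell above.
    return w * (1.0 - np.tanh(w * h_prev + x) ** 2)

def sequential_rollout(x, w, h0):
    # Reference: the usual step-by-step recurrence.
    h, out = h0, []
    for t in range(x.shape[0]):
        h = cell(h, x[t], w)
        out.append(h)
    return np.stack(out)

def linear_scan(a, b):
    # Solves d_t = a_t * d_{t-1} + b_t (with the state before t=0 set to 0)
    # by a doubling prefix scan: about log2(T) passes, each fully vectorized.
    a, b = a.copy(), b.copy()
    shift = 1
    while shift < a.shape[0]:
        # Compose element t with element t - shift; b must use the old a.
        b[shift:] = a[shift:] * b[:-shift] + b[shift:]
        a[shift:] = a[shift:] * a[:-shift]
        shift *= 2
    return b

def newton_rollout(x, w, h0, num_iters):
    # Solve F(H) = 0 with F_t = h_t - cell(h_{t-1}, x_t) for all t at once.
    T, D = x.shape
    H = np.zeros((T, D))  # initial guess for the whole trajectory
    for _ in range(num_iters):
        H_prev = np.concatenate([h0[None], H[:-1]])  # h_{t-1} for every t
        residual = H - cell(H_prev, x, w)
        A = cell_jac_diag(H_prev, x, w)
        # Newton correction obeys dH_t = A_t * dH_{t-1} - residual_t;
        # h0 is fixed, so the correction before t=0 is zero.
        H = H + linear_scan(A, -residual)
    return H
```

In this sketch the number of Newton iterations is a free parameter. Running as many iterations as there are time steps always reproduces the sequential result for this formulation, because each iteration makes at least one more step exact; in well-conditioned cases far fewer iterations are typically enough.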

Role In The Wiki

ParaRNN anchors the nonlinear branch of efficient recurrent sequence models. Where Mamba-style SSMs preserve parallel training by keeping hidden-state updates linear, ParaRNN shows that adapted GRU and LSTM cells can be trained at billion-parameter language-model scale with parallelized nonlinear state updates.
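
The contrast between the two branches can be written schematically. The symbols below ($A_t$, $b_t$, $f$, $J_t^{(k)}$) are notation assumed for this card, not taken from the source pages.

```latex
% Linear branch (schematic): steps compose associatively,
% so a prefix scan yields every h_t in O(log T) parallel depth.
h_t = A_t\, h_{t-1} + b_t

% Nonlinear branch: no closed-form composition of steps.
h_t = f(h_{t-1}, x_t)

% Newton iteration k linearizes around the current trajectory guess h^{(k)}:
h_t^{(k+1)} = f\bigl(h_{t-1}^{(k)}, x_t\bigr)
            + J_t^{(k)} \bigl(h_{t-1}^{(k+1)} - h_{t-1}^{(k)}\bigr),
\qquad
J_t^{(k)} = \frac{\partial f}{\partial h}\bigl(h_{t-1}^{(k)}, x_t\bigr)
```

Each Newton iterate is itself a linear recurrence in the unknowns $h^{(k+1)}$, so the same scan machinery applies once per iteration; the additional cost relative to the linear branch is the number of iterations plus the Jacobian evaluations.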

Evidence

Relation To Foundation TSFM Agenda

Use the source-level agenda mapping in pararnn-2025 rather than duplicating verdict rows here.

This page should stay as the object card; source pages carry slot-level verdicts, evidence, and missing pieces.

Overlap Notes

ParaRNN overlaps with Mamba on the serving-time goal of a compact recurrent state, but differs by allowing nonlinear recurrent cells and paying the cost of a Newton-style solve over the hidden trajectory. It overlaps with RMT only at the level of “state carried across sequence”: RMT exposes state as memory tokens, while ParaRNN keeps it as recurrent hidden dynamics.