Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Source
- Raw Markdown: paper_huginn-2025.md
- PDF: paper_huginn-2025.pdf
- Preprint: arXiv 2502.05171
- Official model: tomg-group-umd/huginn-0125
- Official code and data: seal-rg/recurrent-pretraining
Core Claim
Huginn is a recurrent-depth language model trained at scale: it spends additional test-time compute by iterating a recurrent block in latent space instead of emitting more reasoning tokens.
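To make the mechanism concrete, here is a minimal sketch of a recurrent-depth forward pass in plain Python. It is a toy under stated assumptions, not the official implementation: the single-matrix "blocks", the tanh nonlinearity, the random initial latent state, and all names (`init_params`, `recurrent_depth_forward`, `prelude`, `core`, `coda`) are illustrative choices. The only thing it aims to show is that the same core weights are applied `num_loops` times, so compute grows with the loop count while the parameter count stays fixed.

```python
import math
import random


def make_matrix(rows, cols, rng, scale):
    """Random dense matrix as a list of rows (hypothetical toy initializer)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def init_params(in_dim, hidden_dim, out_dim, rng):
    """One matrix per stage; a real model would use Transformer layers instead."""
    return {
        "prelude": make_matrix(hidden_dim, in_dim, rng, 0.5),
        # The core sees the latent state concatenated with the embedded input.
        "core": make_matrix(hidden_dim, 2 * hidden_dim, rng, 0.3),
        "coda": make_matrix(out_dim, hidden_dim, rng, 0.5),
    }


def recurrent_depth_forward(x, params, num_loops, rng):
    """Toy forward pass: prelude once, core block num_loops times, coda once."""
    embedded = [math.tanh(a) for a in matvec(params["prelude"], x)]
    # Assumption: the latent state starts from random noise.
    state = [rng.gauss(0.0, 1.0) for _ in range(len(embedded))]
    for _ in range(num_loops):
        # Same weights every iteration; the embedded input is re-injected each time.
        state = [math.tanh(a) for a in matvec(params["core"], state + embedded)]
    return matvec(params["coda"], state)


params = init_params(in_dim=4, hidden_dim=8, out_dim=3, rng=random.Random(0))
x = [0.5, -1.0, 0.25, 2.0]
shallow = recurrent_depth_forward(x, params, num_loops=1, rng=random.Random(1))
deep = recurrent_depth_forward(x, params, num_loops=8, rng=random.Random(1))
```

Here `shallow` and `deep` come from identical parameters and the same initial state; only the loop count differs, which is the knob the note refers to as test-time compute.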
Relevance To This Wiki
This is a proof of scale for looped Transformers, following the original Universal Transformer (UT) idea: the model can be pretrained at billions of parameters and can use more loops at inference.
Limitations
The reported reasoning gains come from language benchmarks; they do not directly establish better state tracking for numeric time series.
Foundation TSFM Relevance
Adjacent to dynamic compute: the loop count becomes a serving-time budget knob, which could be useful for uncertain windows or planning rollouts if the approach transfers carefully.
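The "budget knob" idea can be sketched as a loop that stops either when a hard cap is reached or when the latent state stops changing. This is a hypothetical illustration: the function name `iterate_with_budget`, the Euclidean-distance stopping rule, and the toy `step` function are assumptions made for the example, not a method this note attributes to the paper.

```python
import math


def iterate_with_budget(step, state, max_loops, tol):
    """Apply `step` until the state change drops below `tol` or the budget is spent.

    Returns the final state and the number of loops actually used.
    """
    loops_used = 0
    for _ in range(max_loops):
        new_state = step(state)
        loops_used += 1
        # Euclidean distance between successive latent states (assumed criterion).
        delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_state, state)))
        state = new_state
        if delta < tol:
            break
    return state, loops_used


def toy_step(state):
    """Stand-in for one recurrent-block application; contracts toward 2.0."""
    return [0.5 * s + 1.0 for s in state]


# Generous budget: the loop exits early once the state has settled.
settled, used_easy = iterate_with_budget(toy_step, [0.0], max_loops=50, tol=1e-3)
# Tight budget: the cap binds before convergence.
capped, used_capped = iterate_with_budget(toy_step, [0.0], max_loops=4, tol=1e-3)
```

In a serving setting, `max_loops` would be the per-request budget, and `tol` would trade accuracy for latency; whether such a rule is well calibrated on time-series windows is an open question rather than something shown here.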
Links Into The Wiki
- Huginn
- Looped Transformers And Test-Time Memory
- Efficient Recurrent Sequence Models
- Time-Series Scaling And Efficiency
- Efficient Parallel Samplers for Recurrent-Depth Models
- Foundation Time-Series Model Research Agenda
Open Questions
- What matched-budget baseline should this source be compared against: unique-depth Transformer layers, recurrent state, explicit memory, or extra inference steps?
- Which claims transfer from token-sequence reasoning to multivariate time-series state tracking, event streams, or action-conditioned world models?
- How much of the gain comes from recurrent depth versus data mixture, tokenization, prelude/coda design, or other training choices?