Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Source

Raw Markdown: paper_huginn-2025.md
PDF: paper_huginn-2025.pdf
Preprint: arXiv 2502.05171
Official model: tomg-group-umd/huginn-0125
Official code and data: seal-rg/recurrent-pretraining

Core Claim

Huginn is a recurrent-depth language model trained at scale: at inference it can spend additional compute by iterating a shared recurrent block in latent space rather than by emitting more reasoning tokens.
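To make the latent-iteration idea concrete, here is a minimal toy sketch in pure Python. The prelude/recurrent-block/coda split follows the terminology used elsewhere in this note; the function names, the tanh mixing rule, and every constant are illustrative assumptions, not the Huginn implementation.

```python
import math
import random


def prelude(tokens, dim):
    """Toy embedding (assumption): a fixed pseudo-random vector per token id."""
    embedded = []
    for token in tokens:
        rng = random.Random(token)
        embedded.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return embedded


def recurrent_block(state, embedded, weight=0.5):
    """One latent iteration: mix the current state with the input embedding.

    The same function (same "weights") is reused on every loop.
    """
    return [
        [math.tanh(weight * s + (1.0 - weight) * e) for s, e in zip(s_row, e_row)]
        for s_row, e_row in zip(state, embedded)
    ]


def coda(state):
    """Toy readout (assumption): one scalar per position."""
    return [sum(row) / len(row) for row in state]


def forward(tokens, num_loops, dim=8, seed=0):
    """Run prelude once, the shared block num_loops times, then coda once."""
    rng = random.Random(seed)
    embedded = prelude(tokens, dim)
    # Random initial latent state, one vector per position.
    state = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in tokens]
    for _ in range(num_loops):
        state = recurrent_block(state, embedded)
    return coda(state)


if __name__ == "__main__":
    # Same input, same parameters; only the loop count changes.
    for loops in (1, 4, 16):
        print(loops, forward([3, 1, 4], num_loops=loops))
```

The output length depends only on the number of input positions, not on `num_loops`: extra compute is spent in the latent state rather than in extra emitted tokens.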

Relevance To This Wiki

This is a proof of scale for looped Transformers, following the original Universal Transformer (UT) idea: a looped model can be pretrained at billions of parameters and then run with more loops at inference.

Limitations

The reported reasoning gains are measured on language benchmarks; they do not directly establish better state tracking for numeric time series.

Foundation TSFM Relevance

Adjacent to dynamic compute: the loop count becomes a serving-time budget knob, which could help with uncertain windows or planning rollouts if the approach transfers carefully to time series.
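As a rough illustration of a loop-count budget, the sketch below iterates a toy latent update until either the state stops changing or a per-request cap is reached. The update rule, the `tol` threshold, and the function names are hypothetical stand-ins chosen for this note; they are not the exit criterion or API of the Huginn model.

```python
import math


def step(state, inputs):
    """Toy latent update (assumption): contractive mix of state and input."""
    return [math.tanh(0.5 * s + 0.5 * x) for s, x in zip(state, inputs)]


def run_with_budget(inputs, max_loops, tol=1e-4):
    """Iterate up to max_loops times; stop early once the state settles.

    Returns the final state and the number of loops actually used.
    """
    state = [0.0] * len(inputs)
    used = 0
    for used in range(1, max_loops + 1):
        new_state = step(state, inputs)
        # Largest per-coordinate change between consecutive iterations.
        delta = max(abs(a - b) for a, b in zip(new_state, state))
        state = new_state
        if delta < tol:
            break
    return state, used


if __name__ == "__main__":
    window = [0.3, -0.7, 0.1]
    # A tight budget is always exhausted; a generous one exits early.
    print(run_with_budget(window, max_loops=2)[1])
    print(run_with_budget(window, max_loops=64)[1])
```

A serving layer could give a larger `max_loops` to windows it considers uncertain and a smaller one to routine windows, so the cap acts as the budget knob while the early exit avoids spending loops that no longer change the state.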

Links Into The Wiki

Open Questions

What matched-budget baseline should this source be compared against: unique-depth Transformer layers, recurrent state, explicit memory, or extra inference steps?
Which claims transfer from token-sequence reasoning to multivariate time-series state tracking, event streams, or action-conditioned world models?
How much of the gain comes from recurrent depth versus data mixture, tokenization, prelude/coda design, or other training choices?
