Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers

Source

Raw Markdown: paper_fprm-2026.md
PDF: paper_fprm-2026.pdf
Preprint: arXiv 2606.18206
Workshop version: ICML 2026 AdaptFM OpenReview poster
Official code: nilskiKonjIzDunava/fprm
Official checkpoints: fixed-point-reasoners/fprm
Official X thread: Sajad Movahedi on FPRM
Gonzo ML discussion: Telegram post 5602
Review: ArXivIQ summary

Local social/context snapshots are stored as raw provenance under papers/fprm-2026/telegram-post-gonzo_ML-5602.md, papers/fprm-2026/x_thread_sajad_movahedi_2069070293696405680.md, and papers/fprm-2026/x_thread_sajad_movahedi_2069070293696405680.json.

Credibility And Status

This is a recent 2026 source: arXiv v1 was submitted on 2026-06-16, and OpenReview lists it as a published ICML 2026 AdaptFM workshop poster. The authors are affiliated with ELLIS Institute Tübingen, Max Planck Institute for Intelligent Systems, ETH Zurich, Swiss Institute of Bioinformatics, Université Paris Cité, and Liquid AI. The GitHub repository describes itself as the official implementation, and the official Hugging Face organization publishes experiment checkpoints under an MIT-licensed model card.

Core Claim

FPRM is a single-loop Transformer reasoning model that uses convergence toward a fixed point as the halting signal. It replaces the post-norm convention in looped reasoning models with pre-norm plus learned layer-wise and iteration-wise residual scaling, then uses a damped fixed-point optimizer at inference to reduce oscillation around the fixed point.
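To make the halting idea concrete, here is a minimal sketch of a damped fixed-point iteration that stops when the relative residual is small. It is an illustration under stated assumptions, not the paper's optimizer: the simple averaging update, the names `damped_fixed_point` and `toy_update`, and the values of `damping` and `tol` are all hypothetical, and the toy map stands in for one pass through the looped block.

```python
import math


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def damped_fixed_point(f, h0, damping=0.5, tol=1e-6, max_iters=1000):
    """Iterate h <- (1 - damping) * h + damping * f(h).

    Halts when ||f(h) - h|| <= tol * ||h||, i.e. when the state has
    (approximately) stopped moving. Returns (h, evaluations_of_f, converged).
    damping=1.0 recovers the plain iteration h <- f(h).
    """
    h = list(h0)
    for n_evals in range(1, max_iters + 1):
        fh = f(h)
        residual = l2_norm([a - b for a, b in zip(fh, h)])
        if residual <= tol * (l2_norm(h) + 1e-12):
            return h, n_evals, True
        h = [(1.0 - damping) * b + damping * a for a, b in zip(fh, h)]
    return h, max_iters, False


def toy_update(h):
    # Hypothetical stand-in for the looped block: a contraction with a
    # negative slope, so the plain iteration oscillates around the fixed point.
    return [-0.95 * x + 1.0 for x in h]


# Plain iteration (no damping) versus a damped update on the same map.
plain_h, plain_evals, plain_ok = damped_fixed_point(toy_update, [0.0, 2.0], damping=1.0)
damped_h, damped_evals, damped_ok = damped_fixed_point(toy_update, [0.0, 2.0], damping=0.5)
print(plain_evals, damped_evals)
```

On this toy map both runs converge to the same point, but the damped run needs far fewer evaluations because averaging cancels most of the back-and-forth overshoot. The number of evaluations used is input-dependent, which is the sense in which the stopping rule comes from the dynamics rather than from a separate halting head.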

What It Adds

Fixed-point halting: the model loops until the hidden-state residual is small, so the stopping rule is part of the latent dynamics instead of a separately trained ACT head.
Signal propagation fix: pre-norm improves gradient and representation flow through large effective depth; residual scaling recovers boundedness that post-norm previously supplied.
Non-hierarchical recursion: the reported 7M-parameter FPRM outperforms or matches recursive puzzle baselines on Sudoku-Extreme, Maze-Hard, ARC-AGI, and synthetic state tracking without HRM/TRM-style fast/slow hierarchy.
Released artifacts: the GitHub README links official Hugging Face checkpoints for Maze-Hard, Sudoku-Extreme, ARC-1, and ARC-2. This supersedes the Gonzo ML post’s initial Model: N/A field.
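The signal-propagation point above can be illustrated with a small numerical sketch. It compares three looped updates on a toy vector: post-norm, plain pre-norm, and pre-norm with scaling. Everything here is an assumption made for illustration: the `sublayer` stand-in, the fixed constants `alpha` and `beta`, and in particular the choice to scale the carried state as well as the branch. The paper describes learned layer-wise and iteration-wise scales, and its exact parameterization is not reproduced here.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (roughly) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def sublayer(x):
    # Hypothetical stand-in for an attention/MLP branch with bounded output.
    return [math.tanh(2.0 * v) for v in x]


def post_norm_step(x):
    # Post-norm: normalize after the residual add, so the state norm is pinned.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_norm_step(x):
    # Pre-norm: normalize only the branch input; the residual stream is free.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def scaled_pre_norm_step(x, alpha=0.5, beta=0.9):
    # Assumed scaling form: beta * state + alpha * branch, with beta < 1.
    return [beta * a + alpha * b for a, b in zip(x, sublayer(layer_norm(x)))]


def norm_after(step, x0, n_loops):
    """Apply `step` n_loops times and return the final Euclidean norm."""
    x = list(x0)
    for _ in range(n_loops):
        x = step(x)
    return math.sqrt(sum(v * v for v in x))


x0 = [1.0, -2.0, 0.5, 3.0]
post = norm_after(post_norm_step, x0, 200)
plain = norm_after(pre_norm_step, x0, 200)
scaled = norm_after(scaled_pre_norm_step, x0, 200)
print(post, plain, scaled)
```

In this toy setting the plain pre-norm state keeps growing with loop count, while the post-norm and scaled variants stay bounded. With a branch output of norm at most sqrt(d), the assumed scaled update is bounded by alpha * sqrt(d) / (1 - beta) once the initial state has decayed, which is one simple way residual scaling can supply the boundedness that post-norm enforces directly.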

Relevance To This Wiki

FPRM is an important update to the looped-depth and recursive-reasoning branch. It narrows the design question from “do recursive models need a fast/slow hierarchy?” to “what state-stability, signal-propagation, and stopping-rule contracts make looped depth usable under a declared compute budget?”

For time-series and world-model work, the transferable mechanism is the convergence-based control interface: a latent-state update could spend more recurrent depth on hard windows, rare regimes, or action-conditioned transitions, then stop when the update stabilizes. The current evidence is still puzzle and algorithmic-reasoning evidence, so it should be treated as adjacent dynamic-compute machinery rather than direct TSFM evidence.

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Dynamic compute / fixed-FLOPs hierarchy	adjacent	Fixed-point residuals provide an input-dependent halting signal for looped latent computation.	Needs matched wall-clock, memory-bandwidth, and expected-FLOPs comparisons against unique-depth, wider, recurrent-state, and explicit-memory baselines.
Latent-state modeling	adjacent	The model treats hidden-state convergence as useful computation rather than only a diagnostic.	No multivariate time-series, event-stream, or action-conditioned state-transition benchmarks.
Benchmark hygiene	warning	Strong puzzle and ARC-style results are useful but concentrated in fully observed symbolic domains.	Needs no-loop/no-fixed-point ablations, calibrated stopping tests, and transfer to numeric or operational trajectories before TSFM claims.

Limitations

The evidence is centered on Sudoku, Maze, ARC-AGI, and synthetic state-tracking tasks, not continuous numeric time series or action-conditioned world models.
The fixed-point residual is a halting statistic, not automatically calibrated uncertainty; downstream systems would need calibration and failure-detection probes.
The paper’s hierarchy critique should not be overread: FPRM shows that HRM/TRM-style hierarchy is not necessary for the reported benchmarks, but it does not prove that hierarchy is unnecessary for long-horizon, partially observed, multimodal, or control-heavy settings.
Reported test-time compute can reach very high effective depth, so practical serving claims need realized latency, batching, kernel, memory, and early-exit accounting rather than only nominal loop count.
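The gap between nominal loop count and realized cost in the last limitation can be shown with a small accounting sketch. The per-example iteration counts, `layers_per_loop`, and `batch_size` below are made-up numbers, and the helper names are hypothetical. The sketch assumes a naive batched loop with no early-exit compaction, so every example in a batch is computed until the slowest one halts.

```python
def nominal_layer_calls(iters_per_example, layers_per_loop):
    """Layer applications if each example paid only for its own loops."""
    return sum(iters_per_example) * layers_per_loop


def padded_batch_layer_calls(iters_per_example, layers_per_loop, batch_size):
    """Layer applications when each batch runs until its slowest example halts.

    Assumes converged examples are still computed (no early-exit compaction).
    """
    total = 0
    for start in range(0, len(iters_per_example), batch_size):
        batch = iters_per_example[start:start + batch_size]
        total += max(batch) * len(batch) * layers_per_loop
    return total


# Hypothetical halting iterations: mostly easy inputs plus two hard ones.
iters = [4, 6, 5, 120, 3, 7, 4, 90]
nominal = nominal_layer_calls(iters, layers_per_loop=2)
padded = padded_batch_layer_calls(iters, layers_per_loop=2, batch_size=4)
print(nominal, padded)  # 478 1680
```

Here a single hard example per batch makes the realized cost about 3.5 times the nominal one, which is why serving comparisons need measured latency and batching behavior in addition to average loop counts.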

Links Into The Wiki

Open Questions

Can fixed-point residuals become calibrated uncertainty or anomaly signals for time-series windows, or are they only optimization diagnostics?
Does pre-norm plus residual scaling still stabilize looped depth when the latent state is driven by noisy observations, missing data, exogenous variables, and control inputs?
Under a fixed serving budget, when is fixed-point looping better than a unique-depth Transformer, compact recurrent state, explicit memory tokens, depth-KV retrieval, or parallel latent trajectories?
Can the fixed-point halting criterion avoid shortcut collapse on tasks where multiple candidate futures or hidden regimes must remain alive until late evidence arrives?
