Convergent Evolution: How Different Language Models Learn Similar Number Representations
Source
- Raw Markdown: paper_convergent-evolution-number-representations-2026.md
- PDF: paper_convergent-evolution-number-representations-2026.pdf
- Preprint: arXiv 2604.20817
- Model collection: Hugging Face collection
- Gonzo ML discussion: post 5315
- Review: ArxivIQ note
Status And Credibility
A recent (April 2026) arXiv preprint from a credible academic author team. Treat it as important current evidence for number-representation diagnostics, with the usual preprint caveat until its peer-review status is known.
Core Claim
Many language models and even raw number-token frequencies show Fourier spikes at periods such as , but those spikes do not guarantee useful modular number representations. The paper separates spectral convergence, where embeddings have periodic Fourier power, from geometric convergence, where residue classes such as are linearly separable.
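Here is a toy sketch of the spectral/geometric distinction. It assumes number embeddings are stored as an (N, d) array whose row n embeds the integer n. The two embeddings, the symbol T for the period, the choice T = 10, and the least-squares probe are illustrative assumptions, not the paper's construction or code. A one-dimensional cos(2πn/T) embedding and a two-dimensional (cos, sin) circle show the same Fourier spike at period T. However, cos maps n and T − n to the same value (residues 1 and 9 collide, for instance), so only the circle lets a linear probe recover n mod T.

```python
import numpy as np

def fourier_power(emb: np.ndarray) -> np.ndarray:
    """Fourier power along the number axis, summed over embedding dimensions.

    emb: (N, d) array whose row n embeds the number n (n = 0..N-1).
    Entry k of the result is the power at frequency k, i.e. period N / k.
    """
    centered = emb - emb.mean(axis=0, keepdims=True)
    spectrum = np.fft.rfft(centered, axis=0)
    return (np.abs(spectrum) ** 2).sum(axis=1)

def dominant_period(emb: np.ndarray) -> float:
    power = fourier_power(emb)
    k = int(np.argmax(power[1:])) + 1  # skip the DC (k = 0) component
    return emb.shape[0] / k

def linear_probe_accuracy(emb: np.ndarray, labels: np.ndarray, num_classes: int) -> float:
    """Least-squares linear probe (one-hot targets plus a bias column).

    Returns training accuracy of argmax(X @ W). A score of 1.0 certifies that
    the classes are linearly separable; a lower score from this simple probe
    does not by itself prove they are not.
    """
    X = np.hstack([emb, np.ones((emb.shape[0], 1))])
    Y = np.eye(num_classes)[labels]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return float(np.mean(np.argmax(X @ W, axis=1) == labels))

N, T = 100, 10  # illustrative number range and period
n = np.arange(N)
theta = 2 * np.pi * n / T
cos_only = np.cos(theta)[:, None]                          # spike at period T
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # same spike
residues = n % T

for name, emb in [("cos only", cos_only), ("cos + sin", circle)]:
    print(name, dominant_period(emb), linear_probe_accuracy(emb, residues, T))
```

Both embeddings report the same dominant period. Only the circle reaches a probe accuracy of 1.0. The cos-only failure does not depend on the probe being weak, because colliding residues have identical features.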
Key Contributions
- Shows Fourier spikes across Transformers, non-Transformer LMs, classical word embeddings, and raw number-token frequency distributions.
- Proves that Fourier-domain sparsity is necessary but not sufficient for mod- geometric separability.
- Uses controlled 300M-parameter pretraining experiments to test the roles of data, architecture, optimizer, tokenizer, and context.
- Shows two routes to geometric convergence: language co-occurrence structure and multi-token addition tasks that force modular subproblems.
- Shows single-token addition can leave representations seed- and optimizer-dependent because it does not impose the same modular pressure.
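One way to see why multi-token addition exerts modular pressure: if a tokenizer splits numbers into fixed-size digit chunks, each output token of a sum is a residue modulo the chunk base, plus a carry. The sketch below assumes a hypothetical 3-digit chunk tokenizer (base B = 1000) with little-endian chunk order. It illustrates the arithmetic only, not the paper's tokenizer or training setup.

```python
B = 1000  # hypothetical chunk base: each token holds up to 3 decimal digits

def to_chunks(x: int, base: int = B) -> list[int]:
    """Split a non-negative integer into little-endian base-`base` chunk tokens."""
    chunks = []
    while True:
        x, r = divmod(x, base)
        chunks.append(r)
        if x == 0:
            return chunks

def from_chunks(chunks: list[int], base: int = B) -> int:
    """Inverse of to_chunks."""
    return sum(c * base**i for i, c in enumerate(chunks))

def add_chunked(a: list[int], b: list[int], base: int = B) -> list[int]:
    """Add two chunk sequences token by token, as a multi-token model must.

    Each output token is a mod-`base` residue of the chunk sum plus carry,
    so predicting it is a modular subproblem.
    """
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s % base)  # modular part
        carry = s // base     # at most 1 when adding two numbers
    if carry:
        out.append(carry)
    return out

a, b = 123456, 789654
out = add_chunked(to_chunks(a), to_chunks(b))
print(to_chunks(a), to_chunks(b), out, from_chunks(out))
```

Here 123456 + 789654 becomes [456, 123] + [654, 789] → [110, 913], i.e. 913110. Because 10 divides 1000, the last decimal digit of the lowest output chunk depends only on the inputs' last digits, which is a mod-10 subproblem. When the whole sum is a single output token, no such decomposition is forced.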
Why It Matters For Number Tokenization
This source is a guardrail against reading too much into Fourier structure in number embeddings. FoNE intentionally builds Fourier number embeddings; this paper shows why a visible Fourier spectrum alone is not evidence that a model has learned functional numeracy.
For time-series and numeric-feature work, the lesson is broader: representation diagnostics should test usable geometry, not only visible basis structure. A periodic basis can be present because of token frequencies or co-occurrence artifacts while still failing the downstream operation that motivated the basis.
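The next toy sketch shows the frequency-artifact failure mode: a feature that encodes nothing but (log) token frequency can still show a clean Fourier spike. The count profile is hypothetical. It over-represents round numbers with assumed boost factors and uses a flat base rate instead of the decaying trend real corpora show, which a practical diagnostic would have to remove first. The choice T = 10 is also an assumption.

```python
import numpy as np

N, T = 100, 10  # number range and the period being tested (illustrative)
n = np.arange(N)

# Hypothetical token counts: flat base rate, boosted for "round" numbers
# (x2 for multiples of 5, x3 for multiples of 10; both factors assumed).
counts = 1.0 + (n % 5 == 0) + (n % 10 == 0)

# A stand-in "embedding" that carries only log frequency, nothing learned.
emb = np.log(counts)[:, None]

# Spectral check: Fourier power along the number axis (mean removed).
power = (np.abs(np.fft.rfft(emb - emb.mean(axis=0), axis=0)) ** 2).sum(axis=1)
k_T = N // T  # frequency index corresponding to period T
spike_at_T = bool(power[k_T] > 1e-6 * power[1:].max())

# The profile repeats every T numbers, so all non-DC power sits on
# multiples of k_T (periods that divide T); everything else is ~0.
harmonic = np.arange(len(power)) % k_T == 0
off_harmonic = power[~harmonic].max()

# Geometric check: T residue classes, but how many distinct feature values?
distinct = np.unique(emb).size

print(f"spike at period {T}: {spike_at_T}; off-harmonic power: {off_harmonic:.1e}")
print(f"distinct feature values for {T} residue classes: {distinct}")
```

The spike is genuine, but the feature takes only three distinct values for ten residue classes. Residues 1 to 4 and 6 to 9 all collide, so no probe of any kind can recover n mod T from it.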
Limitations
The evidence concerns text-number token embeddings and controlled arithmetic training. It does not directly evaluate scalar sensor values, units, missingness, uncertainty, exogenous numeric variables, or action/control intensities in time-series foundation models.
The paper is strongest as a diagnostic and attribution source, not as a direct proposal for a new numeric encoding.
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Number tokenization | warning | Fourier spikes can be universal but non-functional; mod- probes test geometry more directly. | Need TSFM-specific probes over scalar values, units, regimes, and control inputs. |
| Representation quality | adjacent | Distinguishes spectral structure from linearly usable modular structure. | Need probes tied to forecasting, generation, editing, and action utility. |
| Benchmark hygiene | warning | Representation-level diagnostics can mistake training-distribution artifacts for learned structure. | Need attribution and ablation protocols for numeric TSFM representations. |
Links Into The Wiki
- Number Tokenization
- Contradictions And Open Tensions
- FoNE
- BitTokens
- Pre-trained Large Language Models Use Fourier Features To Compute Addition
- Foundation Time-Series Model Research Agenda
Open Questions
- Which TSFM numeric embeddings show only spectral structure, and which expose task-usable geometry?
- Do periodic point-wise scalar embeddings help with noisy continuous observations, or mainly with discrete modular arithmetic?
- What probes should test whether numeric features preserve units, scale, uncertainty, and intervention intensity?