FoNE: Precise Single-Token Number Embeddings Via Fourier Features

Source

Core Claim

FoNE maps each number to a single token embedding built from Fourier features: sine/cosine pairs at periods 10, 100, 1000, and so on, one pair per digit position, so a numeric value is represented by one token rather than being fragmented into subword or digit tokens.
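Written out, the encoding and the way each pair recovers a residue look roughly as follows. This is a sketch, not the paper's exact formulation. It assumes integer inputs, m digit positions, and the period ordering 10, 10^2, ..., 10^m. The symbols \phi, m, and d_i are introduced here for illustration, and the paper's handling of decimals and of padding to the model width is not reproduced.

```latex
\begin{align*}
\phi(x) &= \Big[\cos\tfrac{2\pi x}{10},\ \sin\tfrac{2\pi x}{10},\ \cos\tfrac{2\pi x}{10^2},\ \sin\tfrac{2\pi x}{10^2},\ \ldots,\ \cos\tfrac{2\pi x}{10^m},\ \sin\tfrac{2\pi x}{10^m}\Big] \\
x \bmod 10^i &= \frac{10^i}{2\pi}\Big(\operatorname{atan2}\!\big(\sin\tfrac{2\pi x}{10^i},\ \cos\tfrac{2\pi x}{10^i}\big) \bmod 2\pi\Big),
\qquad d_i = \Big\lfloor \frac{x \bmod 10^i}{10^{i-1}} \Big\rfloor \\
\text{e.g. } x = 347,\ m = 3:&\quad 347 \bmod 10 = 7,\ \ 347 \bmod 100 = 47,\ \ 347 \bmod 1000 = 347
\ \Rightarrow\ (d_1, d_2, d_3) = (7, 4, 3)
\end{align*}
```

Each pair therefore stores one residue, and successive residues differ by exactly one additional digit. This is the sense in which the representation is digit-aligned.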

Key Contributions

  • Defines the Fourier Number Embedding as a concatenation of circular (cosine, sine) embeddings at powers-of-10 periods.
  • Each sine/cosine pair with period 10^i encodes the residue x mod 10^i, which can be recovered exactly from the pair, so the representation is digit-aligned.
  • Adds the Fourier number embedding to the embedding of a learned [NUM] token, then decodes numbers by matching pairs of hidden-state dimensions to digit embeddings.
  • Reports stronger arithmetic performance and data efficiency than subword and digit-wise baselines in its controlled experiments.
  • Builds directly on the observation that pretrained LLMs already contain Fourier-like number features.
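The encode, add, and decode steps in the bullets above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the paper's implementation:

- Inputs are integers.
- The periods are 10 through 10**num_periods, interleaved as (cos, sin) pairs.
- The Fourier features are zero-padded to the model width before being added to the [NUM] vector.
- An analytic atan2 inverse stands in for the paper's learned decoder, which instead matches hidden-state pairs against digit embeddings.

The function names fone_features, embed_number, and decode_digits are hypothetical.

```python
import numpy as np


def fone_features(x: int, num_periods: int) -> np.ndarray:
    """[cos, sin] pairs at periods 10, 100, ..., 10**num_periods (assumed ordering)."""
    periods = 10.0 ** np.arange(1, num_periods + 1)
    angles = 2.0 * np.pi * x / periods
    # Interleave as [cos_1, sin_1, cos_2, sin_2, ...].
    return np.stack([np.cos(angles), np.sin(angles)], axis=1).reshape(-1)


def embed_number(x: int, num_token: np.ndarray, num_periods: int) -> np.ndarray:
    """Hypothetical: zero-pad the Fourier features to the model width, add to a [NUM] vector."""
    feats = fone_features(x, num_periods)
    if feats.size > num_token.size:
        raise ValueError("model width too small for the requested number of periods")
    padded = np.zeros(num_token.shape, dtype=float)
    padded[: feats.size] = feats
    return num_token + padded


def decode_digits(pairs: np.ndarray) -> list[int]:
    """Analytic stand-in decoder: recover digits (least significant first) from (cos, sin) pairs."""
    digits = []
    for i, (c, s) in enumerate(pairs, start=1):
        period = 10 ** i
        angle = np.arctan2(s, c) % (2.0 * np.pi)  # angle in [0, 2*pi)
        # Round because x is assumed integer; % period maps a near-full-turn back to 0.
        residue = int(round(angle * period / (2.0 * np.pi))) % period  # x mod 10**i
        digits.append(residue // 10 ** (i - 1))  # the i-th digit from the right
    return digits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_token = rng.normal(size=16)  # stand-in for a learned [NUM] embedding
    e = embed_number(347, num_token, num_periods=3)
    pairs = (e - num_token)[:6].reshape(3, 2)
    print(decode_digits(pairs))  # [7, 4, 3]
```

Rounding the recovered residue to an integer before extracting digits matters. Without it, a residue such as 20 recovered as 19.9999999 would yield the wrong tens digit.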

Method Notes

FoNE is the cleanest source in this batch for viewing number embeddings through a smooth periodic basis. Its closest time-series analogy is EIDOS-style point-wise scalar encoding: both use bounded periodic basis functions to map scalar numeric values into higher-dimensional representations.

The difference is semantic and operational. FoNE is designed for literal numbers in language-model text and arithmetic outputs. EIDOS maps observed time-series samples into latent tokens for passive forecasting. The two should not be collapsed into one method, but both bear on the same broader design question: whether scalar numeric values deserve specialized embeddings rather than ordinary tokenization.

Slug note: this page uses the arXiv submission year 2025 in the slug, while the OpenReview venue page lists the paper as an ICLR 2026 poster.

Evidence And Results

The abstract and results section report two gains in controlled language-model experiments. FoNE represents each number with a single token, so it uses fewer tokens per number than subword or digit-wise tokenizers, and it improves arithmetic accuracy. The project page and paper first compare how a decimal number is tokenized under each scheme, then show how the modular Fourier components represent its digits.

The source is also important historically: it cites "Pre-trained Large Language Models Use Fourier Features to Compute Addition" as the mechanistic motivation for explicitly building Fourier number embeddings.

Limitations

FoNE’s strongest claims are for controlled arithmetic tasks. BitTokens challenges its generality, arguing that sinusoidal/Fourier encodings are well suited to addition but force non-local decoding and re-encoding for multiplication and division. Convergent Evolution adds a diagnostic caveat: Fourier spectra can be present even when modular residue classes are not linearly usable. FoNE-style claims should therefore be checked with geometric probes or downstream task tests, not by the presence of a Fourier spectrum alone. Treat FoNE as an important representation proposal, not as a settled universal numeric encoding.
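The addition-versus-multiplication contrast can be illustrated with a small sketch. It assumes integer operands and treats each cos/sin pair as a unit phasor exp(2πi·x/10^k).

Under that view, addition is an elementwise product of phasors, which amounts to one rotation per period. Multiplication by b instead requires two steps. First, b's residue is decoded with a non-linear angle computation. Then it is re-encoded through an integer power.

The function names are hypothetical. This illustrates the argument as summarized above; it is not code from BitTokens or FoNE.

```python
import numpy as np


def phasors(x: int, num_periods: int) -> np.ndarray:
    """Complex view of FoNE-style features: exp(2*pi*i*x / 10**k), k = 1..num_periods."""
    periods = 10.0 ** np.arange(1, num_periods + 1)
    return np.exp(2j * np.pi * x / periods)


def residues(z: np.ndarray) -> np.ndarray:
    """Non-linear decoding step: recover x mod 10**k from each phasor's angle."""
    periods = 10 ** np.arange(1, z.size + 1)
    frac = (np.angle(z) / (2.0 * np.pi)) % 1.0  # fraction of a full turn, in [0, 1)
    return np.rint(frac * periods).astype(np.int64) % periods


def add(za: np.ndarray, zb: np.ndarray) -> np.ndarray:
    # Addition stays local: multiplying unit phasors adds their angles,
    # so each period's phasor for a + b is the product of those for a and b.
    return za * zb


def multiply(za: np.ndarray, zb: np.ndarray) -> np.ndarray:
    # Multiplication is not a rotation: exp(2*pi*i*a*b/T) == za ** (b mod T),
    # so b must first be decoded to an integer residue, then used as an exponent.
    rb = residues(zb)
    return za ** rb


if __name__ == "__main__":
    za, zb = phasors(347, 3), phasors(58, 3)
    print(residues(add(za, zb)).tolist())       # [5, 5, 405]
    print(residues(multiply(za, zb)).tolist())  # [6, 26, 126]
```

The multiply path only works because residues() converts a phasor back into an integer. That decode-then-re-encode step is the kind of operation the BitTokens critique points to.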

Foundation TSFM Relevance

  • Point-wise numeric embeddings. Verdict: partially closes. Evidence: encodes each number as one Fourier-feature token with digit-aligned periods and exact modular recovery properties. Missing pieces: not tested on sensor values, units, missingness, uncertainty, or time-series forecasting.
  • Representation quality. Verdict: adjacent. Evidence: preserves dense numeric detail better than fragmented subword or digit tokens in controlled arithmetic tasks. Missing pieces: no evidence that the representation preserves regimes, causal variables, or generative fidelity for time series.
  • Benchmarks: what level of modeling is tested? Verdict: warning. Evidence: the strong evidence is arithmetic-focused, covering addition, subtraction, and multiplication. Missing pieces: arithmetic accuracy should not be treated as proof of TSFM numeric-token quality.

Open Questions

  • Can FoNE-style periodic scalar embeddings improve point-wise time-series embeddings beyond arithmetic tasks?
  • Should Fourier number embeddings be combined with bit-level or logarithmic encodings to cover both addition and multiplication-like operations?
  • How should sign, uncertainty, missingness, and measurement units be represented when FoNE-style encodings are applied to auxiliary numeric values?