Tokenizer Transfer

Summary

Tokenizer transfer is the problem of moving model capability across tokenization regimes without retraining from scratch.

What The Wiki Currently Believes

Bolmo treats byteification as a special case of tokenizer transfer: a subword LM serves as a teacher for a byte-level latent-tokenizer architecture.
Compute Optimal Tokenization adds a scaling constraint for tokenizer transfer: moving across tokenizers should preserve an appropriate byte-per-parameter training ratio, not a token-per-parameter ratio.
FoNE and BitTokens raise the numeric version of the transfer problem: retrofitting a pretrained model with typed number tokens or numeric heads may require more than swapping the tokenizer.
ReinPatch is a weaker but useful time-series analogue: a learned patching policy is pretrained on broad univariate time series and transferred zero-shot to downstream forecasting datasets while the backbone is trained separately.
Pretrained Transformers as Universal Computation Engines is adjacent to tokenizer transfer rather than a method for it: it tests whether the frozen computation inside a language-pretrained Transformer transfers after only input/output adaptation.

Evidence

Bolmo’s main empirical claim is that byte-level LMs can become competitive quickly by exact distillation from existing subword LMs, rather than paying the full cost of from-scratch byte-level pretraining. Compute Optimal Tokenization gives that migration a budget check: if the target tokenizer compresses text at a different rate, the source model’s token count is no longer a stable scaling unit, so training budgets should be compared in bytes rather than tokens.
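The budget check can be made concrete with a little arithmetic. The sketch below is illustrative only: the parameter count, token count, and bytes-per-token figures are hypothetical, not values from Compute Optimal Tokenization, and the helper names are invented for this page.

```python
def bytes_per_parameter(num_tokens: float, bytes_per_token: float, num_params: float) -> float:
    """Training bytes seen per model parameter."""
    return num_tokens * bytes_per_token / num_params


def tokens_for_same_bytes(source_tokens: float, source_bytes_per_token: float,
                          target_bytes_per_token: float) -> float:
    """Target-tokenizer token count that covers the same number of training bytes."""
    return source_tokens * source_bytes_per_token / target_bytes_per_token


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
NUM_PARAMS = 1e9
SOURCE_TOKENS = 20e9
SOURCE_BYTES_PER_TOKEN = 4.0   # assumed subword compression rate
TARGET_BYTES_PER_TOKEN = 1.0   # byte-level: one byte per token

source_ratio = bytes_per_parameter(SOURCE_TOKENS, SOURCE_BYTES_PER_TOKEN, NUM_PARAMS)
target_tokens = tokens_for_same_bytes(
    SOURCE_TOKENS, SOURCE_BYTES_PER_TOKEN, TARGET_BYTES_PER_TOKEN
)
target_ratio = bytes_per_parameter(target_tokens, TARGET_BYTES_PER_TOKEN, NUM_PARAMS)

print(source_ratio)                # bytes per parameter under the source tokenizer
print(target_ratio)                # bytes per parameter under the target tokenizer
print(target_tokens / NUM_PARAMS)  # tokens per parameter needed under the target tokenizer
```

With these made-up numbers the byte-per-parameter ratio stays at 80 on both sides, while the token-per-parameter ratio rises from 20 to 80. Holding the token ratio fixed instead would train the byte-level model on a quarter of the data.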

The number-tokenization sources do not yet solve transfer, but they make the migration problem concrete. A pretrained model may already contain spectral number-token structure, but Convergent Evolution shows that spectrum alone is not enough; tokenizer migration should preserve or re-learn task-usable geometry, parser behavior, and numeric heads.

ReinPatch does not transfer a full forecasting model or a text tokenizer. Its transfer claim is narrower: the segmentation policy itself can be reused as a frozen front-end for continuous time-series observations. That makes it a useful case study for separating learned tokenization from downstream model weights.
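The separation between a reusable segmentation front-end and separately trained downstream weights can be sketched as two independent callables. This is not ReinPatch's policy or API: `threshold_patcher` is a hand-written rule standing in for a learned, frozen policy, and `mean_of_last_patch` is a toy placeholder for a trained backbone. All names are hypothetical.

```python
from typing import Callable, List, Sequence

Patch = List[float]
Patcher = Callable[[Sequence[float]], List[Patch]]
Backbone = Callable[[List[Patch]], float]


def threshold_patcher(series: Sequence[float], max_len: int = 4, jump: float = 1.0) -> List[Patch]:
    """Stand-in for a frozen patching policy.

    Closes the current patch when it is full or when the next value
    jumps by more than `jump` from the previous one.
    """
    patches: List[Patch] = []
    current: Patch = []
    for x in series:
        if current and (len(current) >= max_len or abs(x - current[-1]) > jump):
            patches.append(current)
            current = []
        current.append(x)
    if current:
        patches.append(current)
    return patches


def mean_of_last_patch(patches: List[Patch]) -> float:
    """Toy backbone: forecast the mean of the most recent patch."""
    last = patches[-1]
    return sum(last) / len(last)


def forecast(series: Sequence[float], patcher: Patcher, backbone: Backbone) -> float:
    """The patcher is reused unchanged; only the backbone is dataset-specific."""
    return backbone(patcher(series))


series = [0.0, 0.1, 0.2, 5.0, 5.1]
patches = threshold_patcher(series)
prediction = forecast(series, threshold_patcher, mean_of_last_patch)
print(patches)
print(prediction)
```

Because `forecast` only depends on the patcher's output type, the same front-end can be paired with a different backbone per dataset, which is the kind of reuse the transfer claim is about.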

The FPT (Frozen Pretrained Transformer) source, Pretrained Transformers as Universal Computation Engines, adds a more general caution: sometimes the useful transferable object is not the tokenizer at all, but the pretrained sequence computation behind it. For time series, this keeps two questions separate: how numeric observations should be tokenized, and whether pretrained Transformer blocks should be reused.

Relation To Foundation TSFM Agenda

Tokenizer transfer is adjacent to the Foundation Time-Series Model Research Agenda through adaptive tokenization and numeric interface migration. ReinPatch is the direct time-series case for transferring a learned patching policy; FoNE and BitTokens are warnings that numeric-token migration can require parser, embedding, and head changes rather than a tokenizer swap.

Open Questions

Which capabilities are lost when moving from subword to byte-level latent patches?
When changing tokenizers, should distillation preserve source logits, byte-level likelihood, byte-per-parameter training ratios, or serving FLOPs per byte?
Can tokenizer transfer be combined with learned dynamic chunking rather than fixed byte patches?
Can a learned time-series patcher be transferred across datasets without baking in univariate or benchmark-specific assumptions?
Can existing LLMs be migrated to typed numeric tokens without losing text capabilities or memorized numeric facts?
Can graph-token vocabularies transfer across service graphs, telemetry schemas, and organizations, or do they overfit to one topology and monitoring stack?
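On the distillation-target question above, one reason byte-level likelihood is a candidate is that it is comparable across tokenizers, whereas per-token loss is not. A minimal sketch of the standard bits-per-byte normalization follows; the function name and the toy probabilities are assumptions for illustration, not taken from any of the sources.

```python
import math
from typing import Sequence


def bits_per_byte(token_logprobs: Sequence[float], text: str) -> float:
    """Negative log-likelihood of `text` in bits, divided by its UTF-8 byte length.

    `token_logprobs` are natural-log probabilities, one per token, under
    whatever tokenizer the model uses.
    """
    num_bytes = len(text.encode("utf-8"))
    total_nats = -sum(token_logprobs)
    return total_nats / (math.log(2) * num_bytes)


text = "abcd"

# Hypothetical subword model: two tokens, each assigned probability 0.25.
subword_bpb = bits_per_byte([math.log(0.25), math.log(0.25)], text)

# Hypothetical byte-level model: four tokens, each assigned probability 0.5.
byte_bpb = bits_per_byte([math.log(0.5)] * 4, text)

print(subword_bpb, byte_bpb)
```

Both toy models assign the string the same total probability, so they get the same bits-per-byte score even though their per-token losses differ by a factor of two.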

Alex Open Research Wiki
