ConceptMoE: Adaptive Token-To-Concept Compression For Implicit Compute Allocation

Source

Raw Markdown: paper_conceptmoe-2026.md
PDF: paper_conceptmoe-2026.pdf

Core Claim

ConceptMoE improves efficiency and effectiveness by merging semantically similar token sequences into concept representations before expensive MoE computation.

Key Contributions

Introduces learnable token-to-concept chunking based on semantic similarity.
Uses the MoE setting to compare architectures with total parameters and activated FLOPs held equal.
Reports improvements on language pretraining, long-context understanding, multimodal benchmarks, and continual conversion.
Reduces attention computation and KV cache requirements at higher compression ratios.
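
The last contribution can be made concrete with some back-of-the-envelope arithmetic. The sketch below is not taken from the paper: it assumes that layers operating on concepts see N / R positions for N input tokens at compression ratio R, that attention-score computation scales quadratically with sequence length, and that KV cache size scales linearly. The function name `concept_layer_costs` and the example numbers are hypothetical.

```python
def concept_layer_costs(num_tokens: int, compression_ratio: float) -> dict:
    """Relative cost of a layer that runs on concepts instead of tokens.

    Assumptions (not from the source): attention-score cost grows with the
    square of sequence length, and KV cache size grows linearly with it.
    """
    if compression_ratio < 1:
        raise ValueError("compression_ratio must be >= 1")
    num_concepts = num_tokens / compression_ratio
    length_ratio = num_concepts / num_tokens
    return {
        "num_concepts": num_concepts,
        # Quadratic term: pairwise attention scores between positions.
        "attention_score_cost_ratio": length_ratio ** 2,
        # Linear term: one key/value entry per position.
        "kv_cache_ratio": length_ratio,
    }


# Hypothetical example: 8192 tokens compressed at ratio 2.
costs = concept_layer_costs(num_tokens=8192, compression_ratio=2.0)
print(costs)
```

Under these assumptions, a compression ratio of 2 halves the KV cache for the concept-level layers and cuts their attention-score computation to a quarter, which is why the savings grow at higher compression ratios.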

Method Notes

ConceptMoE connects Latent Tokenization with Mixture Of Experts: compression is not only a preprocessing step, but an implicit compute-allocation mechanism.
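
To illustrate the idea of chunking by semantic similarity, here is a minimal sketch, not the paper's method. ConceptMoE learns its chunk boundaries; this example swaps in a fixed cosine-similarity threshold between adjacent token vectors, mean-pools each span into one concept vector, and then copies each concept back to its token positions as a crude stand-in for dechunking. All function names, the threshold value, and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_starts(tokens, threshold=0.5):
    """Start a new chunk wherever adjacent tokens fall below the threshold.

    Assumption: a fixed threshold replaces the learned boundary decision.
    """
    starts = [0] if tokens else []
    for i in range(1, len(tokens)):
        if cosine(tokens[i - 1], tokens[i]) < threshold:
            starts.append(i)
    return starts


def spans(starts, num_tokens):
    """Turn chunk start indices into (start, end) half-open spans."""
    ends = starts[1:] + [num_tokens]
    return list(zip(starts, ends))


def pool_concepts(tokens, starts):
    """Mean-pool the token vectors of each chunk into one concept vector."""
    concepts = []
    for start, end in spans(starts, len(tokens)):
        span = tokens[start:end]
        dim = len(span[0])
        concepts.append([sum(v[d] for v in span) / len(span) for d in range(dim)])
    return concepts


def dechunk(concepts, starts, num_tokens):
    """Copy each concept vector back to every token position in its chunk."""
    restored = []
    for concept, (start, end) in zip(concepts, spans(starts, num_tokens)):
        restored.extend([concept] * (end - start))
    return restored


# Toy 2-d "embeddings": two similar tokens, then three similar tokens.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.0, 1.0]]
starts = chunk_starts(tokens, threshold=0.5)
concepts = pool_concepts(tokens, starts)
restored = dechunk(concepts, starts, len(tokens))
print(starts, len(tokens) / len(concepts))
```

In this toy run the five tokens collapse into two concepts, so any expensive layer placed after pooling processes two positions instead of five. Spans with more redundancy compress further, which is the sense in which compression acts as an implicit compute-allocation mechanism.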

Evidence And Results

The abstract reports +0.9 language pretraining points, +2.3 long-context points, +0.6 multimodal points, and +5.5 points during continual training conversion under controlled settings.

Limitations

ConceptMoE does not remove tokenization entirely; it compresses already-tokenized streams into concepts. It should therefore be compared with byte-native methods such as H-Net and Synergy.

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Dynamic compute allocation	adjacent	Dynamically merges semantically similar token spans before expensive MoE layers, reallocating saved computation under matched activated FLOPs.	Evidence is language and multimodal pretraining, not time-series spans, channels, regimes, or candidate futures.
Dynamic tokenization	adjacent	Learns token-to-concept chunk boundaries and tests compression ratio, router design, and dechunking.	Needs numeric-stream boundaries that preserve spikes, missingness, change points, and dense reconstruction.
Streaming state and long context	adjacent	Reduces token count and KV-cache pressure at higher compression ratios.	Does not maintain an always-on latent state or demonstrate online update behavior.

Links Into The Wiki

Open Questions

How stable are learned concept boundaries across domains?
Can concept compression be combined with byte-level or pixel-level inputs?
