LLMs as Noisy Channels
Summary
LLMs as Noisy Channels is an ICML 2026 / arXiv scaling-law study that models LLM training through Shannon-Hartley channel capacity. It maps model size to bandwidth, training tokens to signal power, and perturbations such as data noise, model interaction, supervised fine-tuning, and quantization to noise terms.
Interface
- Main object: Shannon Scaling Law for LLM capacity.
- Variables: model size, training tokens, fitted signal exponent, bandwidth exponent, model-interaction noise exponent, data-noise exponent, and fitted constants.
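- Core formula: a Shannon-Hartley-style capacity law. The textbook form is $C = B \log_2(1 + S/N)$, where $N$ is noise power; the paper's fitted version maps model size to the bandwidth $B$, training tokens to the signal power $S$, and the perturbation terms to the noise.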
- Empirical substrate: Pythia and OLMo2 checkpoints under Gaussian noise, SFT learning-rate sweeps, and GPTQ quantization.
- Released artifacts: arXiv paper and ICML poster page. No official code or model checkpoints were found at ingest time.
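As a rough illustration of the mapping listed above, the following is a minimal Python sketch of a Shannon-Hartley-style capacity function. It is not the paper's fitted law: the function name `capacity`, the way the noise terms are combined, and every default exponent and constant are hypothetical placeholders chosen only to make the qualitative behavior visible.

```python
import math


def capacity(model_size, tokens, *, signal_exp=0.5, bandwidth_exp=0.3,
             interaction_exp=0.6, noise_scale=1e-3, data_noise=0.0):
    """Toy Shannon-Hartley-style capacity.

    All defaults are made-up placeholders, not values from the paper.
    """
    # Model size plays the role of bandwidth.
    bandwidth = model_size ** bandwidth_exp
    # Training tokens play the role of signal power.
    signal = tokens ** signal_exp
    # Assumed noise model: a term that grows with model size
    # (model-interaction noise) plus an additive data-noise term.
    noise = noise_scale * model_size ** interaction_exp + data_noise
    return bandwidth * math.log2(1.0 + signal / noise)


if __name__ == "__main__":
    tokens = 1e9
    for size in (1e6, 1e9, 1e12, 1e15):
        print(f"model_size={size:.0e}  capacity={capacity(size, tokens):.1f}")
```

With these placeholder values, capacity rises with model size while the signal-to-noise ratio is high and falls once the assumed model-interaction noise dominates, which is the kind of non-monotonic behavior an SNR-aware law can express and a plain power law cannot.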
Role In The Wiki
This entity is the local object card for SNR-aware LLM scaling. Use it when a page needs the caveat that monotonic power-law scaling can be a high-SNR special case rather than a universal rule.
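The high-SNR caveat can be read off the textbook Shannon-Hartley expression. The sketch below uses the generic symbols $C$ (capacity), $B$ (bandwidth), and $\mathrm{SNR}$ (signal-to-noise ratio); these are standard channel-coding names, assumed here for illustration, and not necessarily the paper's notation or its fitted form.

```latex
C = B \log_2\left(1 + \mathrm{SNR}\right)
\quad\Longrightarrow\quad
\begin{cases}
C \approx B \log_2 \mathrm{SNR}, & \mathrm{SNR} \gg 1 \\[6pt]
C \approx \dfrac{B}{\ln 2}\,\mathrm{SNR}, & \mathrm{SNR} \ll 1
\end{cases}
```

In the high-SNR regime, capacity grows monotonically with both bandwidth and signal, and the logarithmic factor varies slowly, so a smooth monotonic fit looks adequate. In the low-SNR regime, capacity is proportional to the SNR itself, so any noise source that grows faster than bandwidth can flatten or reverse the trend.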
For time-series and world-model work, this is upstream language-model evidence. Its transfer value is a design principle: a scaling law for time-series foundation models (TSFMs) should include the relevant information-density and noise variables, not only parameters, samples, or FLOPs. Candidate TSFM noise variables include corrupted observations, missingness, channel interference, long-horizon rollout error, quantized latent state, post-training drift, context noise, and action/intervention ambiguity.