LLMs as Noisy Channels

Summary

LLMs as Noisy Channels is an ICML 2026 / arXiv scaling-law study that models LLM training through Shannon-Hartley channel capacity. It maps model size to bandwidth, training tokens to signal power, and perturbations such as data noise, model interaction, supervised fine-tuning, and quantization to noise terms.

Interface

  • Main object: Shannon Scaling Law for LLM capacity.
  • Variables: model size, training tokens, fitted signal exponent, bandwidth exponent, model-interaction noise exponent, data-noise exponent, and fitted constants.
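  • Core formula: a Shannon-Hartley-style capacity, with model size as bandwidth, training tokens as signal power, and the perturbations as noise terms.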
  • Empirical substrate: Pythia and OLMo2 checkpoints under Gaussian noise, SFT learning-rate sweeps, and GPTQ quantization.
  • Released artifacts: arXiv paper and ICML poster page. No official code or model checkpoints were found at ingest time.
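
For orientation, the classical Shannon-Hartley capacity that the study uses as its template can be written as below, followed by one schematic way the substitutions named in the summary might look. The symbols N (model size), D (training tokens), \alpha (signal exponent), \beta (bandwidth exponent), and \sigma_eff^2 (aggregate noise power) are notation assumed here for illustration. The second line is a sketch, not the paper's fitted formula; the model-interaction and data-noise exponents would enter through the noise term in a form not reproduced here.

```latex
% Classical Shannon-Hartley capacity:
% bandwidth B, signal power S, noise power N_0.
C = B \log_2\!\left(1 + \frac{S}{N_0}\right)

% Schematic substitution (assumed notation, NOT the paper's fitted form):
% bandwidth -> N^{\beta}, signal power -> D^{\alpha},
% noise power -> \sigma_{\mathrm{eff}}^{2}
C(N, D) \approx N^{\beta} \log_2\!\left(1 + \frac{D^{\alpha}}{\sigma_{\mathrm{eff}}^{2}}\right)
```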

Role In The Wiki

This entity is the local object card for SNR-aware LLM scaling. Use it when a page needs the caveat that monotonic power-law scaling can be a high-SNR special case rather than a universal rule.

For time-series and world-model work, this is upstream language-model evidence. What transfers is a design lesson: a scaling law for time-series foundation models (TSFMs) should include the relevant information-density and noise variables, not only parameters, samples, or FLOPs. Candidate TSFM noise variables include corrupted observations, missingness, channel interference, long-horizon rollout error, quantized latent state, post-training drift, context noise, and action/intervention ambiguity.
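
A minimal sketch of why the noise variables matter, using the classical Shannon-Hartley formula rather than the paper's fitted law. The function name `capacity`, the noise-source labels, and every number are hypothetical and chosen only for illustration. Summing the noise powers assumes the sources are independent and additive, which is a simplification.

```python
import math


def capacity(bandwidth: float, signal_power: float,
             noise_powers: dict[str, float]) -> float:
    """Classical Shannon-Hartley capacity with several noise sources.

    Assumption: noise sources are independent and additive, so their
    powers can be summed into one total noise power.
    """
    total_noise = sum(noise_powers.values())
    if total_noise <= 0:
        raise ValueError("total noise power must be positive")
    return bandwidth * math.log2(1.0 + signal_power / total_noise)


# Hypothetical noise budgets (illustrative values, not measurements).
clean = {"sensor": 1.0}
noisy = {"sensor": 1.0, "missingness": 2.0, "rollout_error": 4.0}

# Same bandwidth and signal power, different noise budgets.
c_clean = capacity(bandwidth=10.0, signal_power=7.0, noise_powers=clean)
c_noisy = capacity(bandwidth=10.0, signal_power=7.0, noise_powers=noisy)

print(f"clean capacity: {c_clean:.1f}")  # 10 * log2(1 + 7/1) = 30.0
print(f"noisy capacity: {c_noisy:.1f}")  # 10 * log2(1 + 7/7) = 10.0
```

With bandwidth and signal power held fixed, the extra noise sources alone cut the capacity to a third. A law fitted only on parameters and samples has no term that could register that change.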

Evidence