iLLaDA

Summary

iLLaDA is the 8B-parameter improved LLaDA model family introduced in the paper Improved Large Language Diffusion Models. It is a fully bidirectional masked diffusion language model trained from scratch on 12T pre-training tokens and then supervised fine-tuned, using the same masked diffusion objective, on a 25B-token instruction corpus.
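This page does not write out the objective. Below is a minimal sketch that assumes iLLaDA keeps the original LLaDA formulation. A masking ratio t is drawn, and each token is masked independently with probability t. The bidirectional model then predicts every masked token in one pass, and the cross-entropy is weighted by 1/t. For SFT, the prompt stays visible and only response tokens are masked and scored. The mask id, the toy `Model` interface, the clipping of t, and the per-token normalization are illustrative assumptions. `masked_diffusion_loss` is a hypothetical name, not part of any released code.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

MASK = -1  # hypothetical mask-token id used only in this sketch

# Toy model interface (hypothetical): takes a partially masked token sequence and
# returns one probability distribution over the vocabulary per position.
Model = Callable[[Sequence[int]], List[List[float]]]


def masked_diffusion_loss(
    model: Model,
    tokens: Sequence[int],
    prompt_len: int = 0,
    rng: Optional[random.Random] = None,
) -> float:
    """One-sample estimate of a LLaDA-style masked diffusion loss (assumed form).

    prompt_len=0 mimics pre-training (every token can be masked); prompt_len>0
    mimics SFT, where the prompt stays visible and only response tokens are
    masked and scored.
    """
    rng = rng or random.Random()
    t = rng.uniform(1e-3, 1.0)  # masking ratio, kept away from 0 so 1/t stays finite
    noisy = list(tokens)
    masked: List[int] = []
    for i in range(prompt_len, len(tokens)):
        if rng.random() < t:  # each eligible token is masked independently with prob. t
            noisy[i] = MASK
            masked.append(i)
    probs = model(noisy)  # bidirectional: every position sees the whole noisy sequence
    nll = -sum(math.log(probs[i][tokens[i]]) for i in masked)
    # Weight by 1/t, then average over the positions that were eligible for masking.
    return nll / (t * max(1, len(tokens) - prompt_len))
```

Take a context-free uniform toy model over a vocabulary of size V. The average of many samples approaches log V. This is a quick check that the 1/t weight cancels the expected fraction of masked tokens.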

Role In The Wiki

Use this page as the object card for the iLLaDA model family. The source page carries the detailed paper evidence, limitations, and time-series/world-model agenda mapping.

iLLaDA should be tracked as a current diffusion-language-model milestone. The base model is reported to be competitive with Qwen2.5 7B on several benchmarks, while the instruct model still lags behind Qwen2.5 7B Instruct. That split makes it a useful signal for an open question: do masked diffusion language models need post-training, alignment, and scoring protocols that differ from those used for autoregressive LLMs?
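Scoring is one place where the protocols already differ. An autoregressive LLM gives an exact log-likelihood through the chain rule. A masked diffusion model is instead usually scored with a Monte Carlo estimate of a likelihood bound. The minimal sketch below assumes the lower-variance estimator described for the original LLaDA: draw l uniformly from 1..L, mask l response tokens without replacement, and weight the summed log-probabilities by L/l. This page does not say whether iLLaDA's evaluations use exactly this estimator. `elbo_score`, the mask id, and the toy `Model` interface are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

MASK = -1  # hypothetical mask-token id used only in this sketch

# Toy model interface (hypothetical): one vocabulary distribution per position.
Model = Callable[[Sequence[int]], List[List[float]]]


def elbo_score(
    model: Model,
    prompt: Sequence[int],
    response: Sequence[int],
    num_samples: int = 64,
    rng: Optional[random.Random] = None,
) -> float:
    """Monte Carlo estimate of a lower bound on log p(response | prompt)."""
    if not response:
        raise ValueError("response must be non-empty")
    rng = rng or random.Random()
    n_prompt, n_resp = len(prompt), len(response)
    total = 0.0
    for _ in range(num_samples):
        l = rng.randint(1, n_resp)             # how many response tokens to mask
        masked = rng.sample(range(n_resp), l)  # which ones, without replacement
        noisy = list(prompt) + list(response)
        for j in masked:
            noisy[n_prompt + j] = MASK         # the prompt is never masked
        probs = model(noisy)                   # one bidirectional forward pass
        log_lik = sum(math.log(probs[n_prompt + j][response[j]]) for j in masked)
        total += (n_resp / l) * log_lik        # L / l reweighting
    return total / num_samples
```

Each extra sample lowers the variance but costs one more forward pass. This is one reason likelihood-based scores for diffusion models depend on the chosen estimator and sample count.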

Official Artifacts

Evidence

Relation To Foundation TSFM Agenda

Use the source-level agenda mapping in illada-2026 rather than duplicating verdict rows here. At the entity level, iLLaDA is an upstream language-generation and inference-protocol signal. It strengthens the case for diffusion/flow-style sequence generation as an active branch to monitor, but it is not evidence for numeric time-series modeling, native multivariate state, event streams, control inputs, interventions, or action-conditioned rollouts.