iLLaDA
Summary
iLLaDA is the improved 8B LLaDA model family introduced in the paper Improved Large Language Diffusion Models. It is a fully bidirectional masked diffusion language model trained from scratch on 12T pre-training tokens, then fine-tuned (SFT) with the same masked diffusion objective on a 25B-token instruction corpus.
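For readers new to the objective, the following is a minimal sketch of a masked diffusion training loss in the form described for the original LLaDA: sample a masking ratio t, mask each token independently with probability t, and score only the masked positions with a 1/t weight. It is an illustration under stated assumptions, not iLLaDA's training code. `MASK_ID`, `forward_mask`, `masked_diffusion_loss`, and `uniform_log_probs` are hypothetical names, the uniform predictor stands in for a real bidirectional network, and any changes iLLaDA makes to this recipe are left to the source page.

```python
import math
import random

MASK_ID = -1  # hypothetical mask token id, used only in this toy example


def forward_mask(tokens, t, rng):
    """Forward process: replace each token with MASK_ID with probability t."""
    return [MASK_ID if rng.random() < t else tok for tok in tokens]


def masked_diffusion_loss(tokens, noisy, log_probs, t):
    """Monte Carlo estimate of the masked diffusion loss for one sequence.

    log_probs[i][v] is the predictor's log-probability of vocabulary item v
    at position i, computed from the whole noisy sequence (no causal mask).
    Only masked positions contribute; the sum is weighted by 1/t and
    normalised by the sequence length.
    """
    total = 0.0
    for i, (clean, seen) in enumerate(zip(tokens, noisy)):
        if seen == MASK_ID:
            total -= log_probs[i][clean]
    return total / (t * len(tokens))


def uniform_log_probs(length, vocab_size):
    """Stand-in for a trained mask predictor: uniform over the vocabulary."""
    return [[-math.log(vocab_size)] * vocab_size for _ in range(length)]


# One training-step estimate on a toy sequence.
rng = random.Random(0)
tokens = [3, 1, 4, 1, 5, 9, 2, 6]
t = rng.uniform(1e-3, 1.0)  # masking ratio; lower bound avoids division by ~0
noisy = forward_mask(tokens, t, rng)
loss = masked_diffusion_loss(tokens, noisy, uniform_log_probs(len(tokens), 10), t)
```

Because the predictor sees the full partially masked sequence, there is no left-to-right factorisation, which is one reason likelihood scoring and post-training recipes built for autoregressive models may not transfer directly.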
Role In The Wiki
Use this page as the object card for the iLLaDA model family. The source page carries the detailed paper evidence, limitations, and time-series/world-model agenda mapping.
iLLaDA should be tracked as a current diffusion-language-model milestone: the base model is reported competitive with Qwen2.5 7B on several benchmarks, while the instruct model still lags Qwen2.5 7B Instruct. That split makes it a useful signal for the live question of whether masked diffusion language models need different post-training, alignment, and scoring protocols from autoregressive LLMs.
Official Artifacts
- Preprint: arXiv 2606.25331
- Official code and inference/evaluation repository: ML-GSAI/LLaDA
- Official base model: GSAI-ML/iLLaDA-8B-Base — Apache-2.0, safetensors, BF16, 8B parameters.
- Official instruct model: GSAI-ML/iLLaDA-8B-Instruct — Apache-2.0, safetensors, BF16, 8B parameters.
- Local artifact metadata: papers/illada-2026/official_artifacts_metadata.json.
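A hedged sketch of how the listed checkpoints might be loaded follows. It assumes that the model names above are Hugging Face Hub repository IDs and that the repositories ship custom modeling code loadable with `trust_remote_code=True`, as earlier LLaDA releases do. Neither assumption is confirmed on this page, so check the model cards before relying on it. `load_illada` is a hypothetical helper name.

```python
BASE_REPO = "GSAI-ML/iLLaDA-8B-Base"
INSTRUCT_REPO = "GSAI-ML/iLLaDA-8B-Instruct"


def load_illada(repo_id=INSTRUCT_REPO):
    """Load tokenizer and model in BF16 (assumed Hugging Face Hub layout).

    Assumption: the repository provides its own modeling code, hence
    trust_remote_code=True. Review that code before executing it.
    """
    # Imported lazily so this module can be imported without torch installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        repo_id,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
    )
    return tokenizer, model.eval()
```

Calling `load_illada(BASE_REPO)` downloads the full 8B checkpoint, so it is best run on a machine with enough disk space and accelerator memory. Generation itself goes through the sampling scripts in the ML-GSAI/LLaDA repository rather than a standard autoregressive `generate` call.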
Evidence
Relation To Foundation TSFM Agenda
Use the source-level agenda mapping in illada-2026 rather than duplicating verdict rows here. At the entity level, iLLaDA is an upstream language-generation and inference-protocol signal. It strengthens the case for diffusion/flow-style sequence generation as an active branch to monitor, but it is not evidence for numeric time-series modeling, native multivariate state, event streams, control inputs, interventions, or action-conditioned rollouts.