Genie
Summary
Genie is Google DeepMind’s 2024 generative interactive environment model: an image/video world model that learns a discrete latent-action space from unlabeled videos and uses those latent actions to generate controllable future frames.
Role In The Wiki
Genie is the local anchor for the “learn actions from video-only data” branch of world modeling. It matters because it separates the control-interface problem from ordinary passive video generation: the model does not only predict future frames; it conditions each prediction on an inferred, action-like code.
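A minimal sketch of what an “inferred action-like code” can look like, assuming a vector-quantization step over frame-pair features. This is not Genie's implementation: the linear `projection` is a stand-in for a learned encoder, and the codebook size, latent dimension, and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def quantize_latent_action(z, codebook):
    """Map a continuous latent vector to its nearest codebook entry.

    Returns the discrete index (the "latent action") and the codebook vector.
    """
    dists = np.sum((codebook - z) ** 2, axis=1)  # squared L2 distance to each code
    idx = int(np.argmin(dists))
    return idx, codebook[idx]


def infer_latent_action(frame_t, frame_next, projection, codebook):
    """Infer a discrete action-like code from two consecutive frames.

    Assumption: a linear projection of the frame difference stands in for a
    learned encoder. No action labels are used anywhere.
    """
    diff = (frame_next - frame_t).ravel()
    z = projection @ diff
    return quantize_latent_action(z, codebook)


# Toy setup; sizes are illustrative only.
rng = np.random.default_rng(0)
num_actions, latent_dim, height, width = 8, 4, 6, 6
codebook = rng.normal(size=(num_actions, latent_dim))
projection = rng.normal(size=(latent_dim, height * width))

frame_t = rng.random((height, width))
frame_next = np.roll(frame_t, shift=1, axis=1)  # content shifted one pixel right

action_id, action_vec = infer_latent_action(frame_t, frame_next, projection, codebook)
```

In this sketch a frame predictor would be conditioned on `action_id` (or `action_vec`) in addition to past frames, which is the sense in which generation becomes controllable rather than passive.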
For Alex’s foundation time-series agenda, Genie is an analogy rather than a direct solution. It shows that missing actions can sometimes be inferred from observations, but operational world models should still prefer explicit typed actions, control inputs, interventions, and outcomes when those can be logged.
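As a contrast with inferred latent codes, the sketch below shows one way an explicitly logged transition might be typed. The schema is an assumption made for illustration; `ActionKind`, `LoggedTransition`, and every field name are hypothetical and do not come from Genie or from any existing library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class ActionKind(Enum):
    """Hypothetical categories of explicitly logged actions."""
    CONTROL_INPUT = "control_input"
    INTERVENTION = "intervention"


@dataclass(frozen=True)
class LoggedTransition:
    """One time step with an explicit, typed action record (illustrative schema)."""
    timestamp: float
    observation: Dict[str, float]
    action_kind: Optional[ActionKind]  # None when no action was logged
    action_payload: Dict[str, float]
    outcome: Dict[str, float]

    @property
    def has_explicit_action(self) -> bool:
        # Only steps without a logged action would need Genie-style inference.
        return self.action_kind is not None


step = LoggedTransition(
    timestamp=0.0,
    observation={"temperature_c": 71.5},
    action_kind=ActionKind.CONTROL_INPUT,
    action_payload={"valve_open_fraction": 0.4},
    outcome={"temperature_c": 69.8},
)
```

Steps where `has_explicit_action` is false are the ones where inferring an action from observations, in the spirit of Genie, would be the fallback.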
Official Artifacts
- Project page: https://sites.google.com/view/genie-2024/
- DeepMind publication page: https://deepmind.google/research/publications/genie-generative-interactive-environments/
- ICML / PMLR page: https://proceedings.mlr.press/v235/bruce24a.html
- OpenReview page: https://openreview.net/forum?id=bJbSbJskOS
- arXiv: https://arxiv.org/abs/2402.15391
The paper states that the trained model checkpoints, main training dataset, and examples from that dataset were not released with the paper or website.
Evidence
Relation To Foundation TSFM Agenda
Use the source-level agenda mapping in genie-2024 rather than duplicating verdict rows here.
At the entity level, Genie should stay as the object card for the model family and official artifacts. The source page carries the evidence ledger, limitations, and source-specific relevance to world models and foundation time-series modeling.