RT-2

Summary

RT-2 is a vision-language-action (VLA) robot policy that represents robot actions (low-level control commands) as text-like tokens, which a vision-language model (VLM) emits in the same way it emits ordinary text.
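To make the action-as-language idea concrete, here is a minimal sketch of per-dimension action discretization in Python with NumPy. The RT-2 paper describes discretizing each continuous action dimension into 256 uniform bins and emitting the bin indices as tokens. The function names (`tokenize_action`, `detokenize_action`), the 7-dimensional layout (3 position, 3 rotation, 1 gripper), and the bounds below are hypothetical choices for illustration. The discrete episode-termination command is omitted, and the mapping from bin indices to a specific model vocabulary is not modeled.

```python
import numpy as np

NUM_BINS = 256  # uniform bins per continuous action dimension

def tokenize_action(action, low, high, num_bins=NUM_BINS):
    """Map a continuous action vector to a space-separated string of bin indices."""
    action = np.clip(np.asarray(action, dtype=float), low, high)
    # Scale each dimension to [0, 1], then floor into an integer bin in [0, num_bins - 1].
    scaled = (action - low) / (high - low)
    bins = np.minimum((scaled * num_bins).astype(int), num_bins - 1)
    return " ".join(str(b) for b in bins)

def detokenize_action(text, low, high, num_bins=NUM_BINS):
    """Invert tokenize_action approximately by mapping each bin index to its bin center."""
    bins = np.array([int(t) for t in text.split()], dtype=float)
    return low + (bins + 0.5) / num_bins * (high - low)

# Hypothetical bounds: 3 position deltas, 3 rotation deltas, 1 gripper extension.
low = np.array([-0.1] * 3 + [-0.5] * 3 + [0.0])
high = np.array([0.1] * 3 + [0.5] * 3 + [1.0])

action = np.array([0.0, 0.05, -0.1, 0.25, -0.5, 0.5, 1.0])
tokens = tokenize_action(action, low, high)
recovered = detokenize_action(tokens, low, high)
```

The decoded action differs from the original by at most half a bin width per dimension, which is the quantization cost this representation accepts in exchange for letting a standard VLM decoder produce actions as ordinary token sequences.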

Role In The Wiki

RT-2 anchors the action-as-language branch of modern robotics models. It is a counterexample to the claim that every modern fast robot action head is diffusion- or flow-based.

Evidence

Relation To Foundation TSFM Agenda

See the source-level agenda mapping on the rt-2-2023 page rather than duplicating verdict rows here.

At the entity level, this page should stay as the object card; source pages carry slot-level verdicts, evidence, and missing pieces.