Gemini Robotics 1.5
Source
- Raw Markdown: paper_gemini-robotics-1-5-2025.md
- PDF: paper_gemini-robotics-1-5-2025.pdf
- Preprint: arXiv 2510.03342
Core Claim
Gemini Robotics 1.5 is a robot foundation-model family that pairs a vision-language-action (VLA) model with an embodied-reasoning VLM that acts as orchestrator. The orchestrator plans and checks progress, then hands subtasks to the VLA as language context; the VLA generates the control inputs and can use thinking text as additional context.
Method Notes
- Gemini Robotics-ER 1.5 is the higher-level embodied reasoning model; Gemini Robotics 1.5 is the VLA/action model.
- The VLA model outputs continuous numeric robot control inputs and can emit thinking text when that mode is enabled.
- The source is a strong anchor for hierarchical text-conditioned control, but it should not be classified as diffusion-based or flow-based unless a future source explicitly states the training objective of the action generator.
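The division of labor described in these notes can be sketched as a toy control loop. This is a minimal illustration under stated assumptions, not the paper's implementation or any published API: `ToyOrchestrator`, `ToyVLA`, `Observation`, `apply_action`, and `run_episode` are hypothetical names, the "environment" is a single progress counter, and the action vector is a dummy value.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Observation:
    """Hypothetical stand-in for images and robot state."""
    completed: List[str] = field(default_factory=list)
    progress: float = 0.0  # progress on the current subtask, in [0, 1]


class ToyOrchestrator:
    """Stands in for the embodied-reasoning model: plans and checks progress."""

    def __init__(self, plan: List[str]) -> None:
        self.plan = plan  # assumes subtask names are unique

    def next_subtask(self, obs: Observation) -> Optional[str]:
        # Hand off the first subtask that has not been completed yet.
        remaining = [s for s in self.plan if s not in obs.completed]
        return remaining[0] if remaining else None

    def is_done(self, obs: Observation) -> bool:
        return obs.progress >= 1.0


class ToyVLA:
    """Stands in for the action model: text-conditioned, continuous outputs."""

    def act(
        self, subtask: str, obs: Observation, think: bool = False
    ) -> Tuple[Optional[str], List[float]]:
        # Thinking text is only produced when the mode is enabled.
        thought = f"moving toward: {subtask}" if think else None
        action = [0.5, 0.0, 0.0]  # dummy continuous control vector
        return thought, action


def apply_action(obs: Observation, action: List[float]) -> None:
    """Toy environment: the first action component advances progress."""
    obs.progress = min(1.0, obs.progress + action[0])


def run_episode(
    orchestrator: ToyOrchestrator,
    vla: ToyVLA,
    max_steps: int = 20,
    think: bool = False,
) -> Tuple[Observation, list]:
    obs = Observation()
    log = []
    for _ in range(max_steps):
        subtask = orchestrator.next_subtask(obs)  # subtask handoff
        if subtask is None:
            break  # plan finished
        thought, action = vla.act(subtask, obs, think=think)
        apply_action(obs, action)
        log.append((subtask, thought, action))
        if orchestrator.is_done(obs):  # progress check
            obs.completed.append(subtask)
            obs.progress = 0.0
    return obs, log


obs, log = run_episode(
    ToyOrchestrator(["pick up cup", "place cup on shelf"]), ToyVLA(), think=True
)
```

The only point of the sketch is the interface shape: the subtask string is the sole channel from orchestrator to action model, which is also why gains from the two levels are hard to attribute separately.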
Evidence And Limitations
The paper reports multi-embodiment control, Motion Transfer across robot platforms, thinking-mode gains, and an agentic system that combines the orchestrator with the action model. Limitations include private model availability, bounded safety claims, and the difficulty of separating VLA execution gains from higher-level orchestration gains.
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Causal structure, counterfactuals, and control | adjacent | Combines an embodied reasoning model, a VLA action model, progress understanding, and continuous numeric robot control inputs; the robot action interface serves as an analogy for an action interface in digital systems. | No passive-to-counterfactual dynamics model, candidate-action future evaluation, or analogous digital telemetry/topology/action API. |
| Context interface | adjacent | Uses language, images/video, subtask handoffs, and thinking as context for action. | Does not define channel context or general context schemas for multivariate time-series systems. |
| Benchmarks | warning | Real-robot A/B/n evaluations reduce some variance across tasks and embodiments. | Private model and benchmark details limit reproducibility and TSFM comparability. |
Links Into The Wiki
- Gemini Robotics 1.5
- Foundation Time-Series Model Research Agenda
- Robotics Text Conditioning
- Robotics Time-Series Modeling
- Slow Thinking For Robotics And Time Series
Open Questions
- Which task gains require natural-language thinking, and which could be handled by latent or action-space subgoals?
- How should this wiki evaluate private robotics models whose architecture details and weights are not fully public?