Maya: Optimizing Deep Learning Training Workloads using GPU Runtime Emulation
Source
- Raw Markdown: paper_maya-2025.md
- PDF: paper_maya-2025.pdf
- Preprint: arXiv:2503.20191
- ACM DOI: 10.1145/3767295.3769366
Status And Credibility
Maya was first posted to arXiv on 2025-03-26 and revised on 2025-11-15. The arXiv metadata lists it as a EuroSys 2026 paper with an ACM DOI and a CC BY 4.0 license.
Treat it as credible systems evidence for transparent GPU runtime emulation in training workloads. It is not itself an LLM inference serving system, but it is directly relevant because Revati reuses the same emulation philosophy for LLM serving.
Core Claim
Maya argues that performance-modeling systems for training often force users to translate workloads into custom specification languages. That translation creates a semantic gap: the model no longer observes the exact behavior of the framework that runs the real workload.
Maya instead operates at the narrow interface between training frameworks and accelerator devices. It intercepts accelerator API calls from unmodified training code, emulates device behavior, records low-level operations, and predicts runtime without requiring users to rewrite workloads.
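
The intercept, emulate, record, and predict loop described above can be illustrated with a toy stand-in device. This is a minimal sketch, not Maya's implementation: `EmulatedDevice`, `launch`, `training_step`, and the throughput constants in `cost_model` are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EmulatedDevice:
    """Stands in for a real accelerator: records calls and advances a virtual clock."""
    cost_model: dict                      # op name -> function returning seconds (assumed)
    trace: list = field(default_factory=list)
    virtual_time_s: float = 0.0

    def launch(self, op_name, **params):
        # Record the low-level operation instead of executing it on hardware.
        self.trace.append((op_name, params))
        # Advance the virtual clock by the modeled duration of this operation.
        self.virtual_time_s += self.cost_model[op_name](**params)


def training_step(device, batch_size, hidden):
    """'Framework' code that only talks to the device interface and is left unchanged."""
    device.launch("matmul", m=batch_size, n=hidden, k=hidden)
    device.launch("allreduce", num_bytes=4 * hidden * hidden)


# Hypothetical cost functions with assumed throughput numbers (illustration only).
cost_model = {
    "matmul": lambda m, n, k: 2 * m * n * k / 100e12,   # FLOPs / assumed FLOP/s
    "allreduce": lambda num_bytes: num_bytes / 50e9,     # bytes / assumed bytes/s
}

device = EmulatedDevice(cost_model)
training_step(device, batch_size=32, hidden=4096)
print(len(device.trace), "ops recorded; predicted step time:", device.virtual_time_s)
```

The point of the design is that `training_step` never learns whether the device is real. Only the object behind the interface changes, which is why no workload translation is needed.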
Evidence And Results
- The paper reports less than 5% prediction error across diverse models and optimization strategies.
- It reports identifying configurations that reduce training costs by up to 56% compared with existing approaches.
- The key mechanism is transparent device emulation: intercept CUDA/runtime/library calls, preserve framework semantics, and replace real accelerator execution with modeled behavior.
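
The two headline numbers above are easier to interpret with the arithmetic spelled out. The sketch below assumes the common definitions (relative error against measured runtime, and relative savings against a baseline cost); the paper's exact metric definitions may differ, and every input value is a made-up placeholder rather than a result from the paper.

```python
def relative_error(predicted, measured):
    """Absolute prediction error as a fraction of the measured value."""
    return abs(predicted - measured) / measured


def cost_reduction(baseline_cost, new_cost):
    """Fraction of the baseline cost saved by the new configuration."""
    return (baseline_cost - new_cost) / baseline_cost


# Placeholder values for illustration only (not taken from the paper).
err = relative_error(predicted=97.0, measured=100.0)
saving = cost_reduction(baseline_cost=200.0, new_cost=150.0)
print(f"{err:.1%} error, {saving:.1%} saving")
```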
Why It Matters For GPU Inference Optimization
Maya belongs on GPU Inference Optimization as the training-side emulation predecessor to Revati. Its main relevance is methodological: if the actual framework code can run against a virtualized accelerator interface, the evaluator avoids reimplementing framework control logic.
For inference optimization, this matters whenever the serving engine changes faster than a hand-maintained simulator can be updated. Revati carries the approach over from training to vLLM/SGLang-style serving.
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Dynamic compute and serving | adjacent | Demonstrates a GPU-free runtime-modeling path for expensive deep-learning workloads. | Evaluates training workloads rather than online inference serving or TSFM deployment. |
| Control and counterfactuals | adjacent | Searches configurations under predicted runtime/cost outcomes. | Requires accurate emulation and prediction models for the target runtime stack. |
| Benchmark validity | warning | The semantic-gap critique is broadly relevant to simulators and synthetic benchmark harnesses. | Emulation itself can have gaps if API coverage or timing models are incomplete. |
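
The "Control and counterfactuals" row refers to searching configurations under predicted runtime and cost. A minimal sketch of such a search is shown here. The predictor is a toy analytic function standing in for an emulation-based one, and the parallelism knobs (`tp`, `pp`), the per-GPU price, and the step-time budget are all hypothetical.

```python
from itertools import product


def predicted_step_time_s(tp, pp):
    """Hypothetical stand-in for an emulation-based runtime predictor."""
    compute = 8.0 / (tp * pp)                  # idealized compute scaling
    comm = 0.3 * (tp - 1) + 0.1 * (pp - 1)     # toy communication overhead
    return compute + comm


def cost_per_step(tp, pp, price_per_gpu_s=1.0):
    """Predicted cost: step time multiplied by GPU count and an assumed price."""
    return predicted_step_time_s(tp, pp) * tp * pp * price_per_gpu_s


def cheapest_config(tp_options, pp_options, max_step_time_s):
    """Return the lowest-cost (tp, pp) whose predicted step time fits the budget."""
    feasible = [
        (tp, pp)
        for tp, pp in product(tp_options, pp_options)
        if predicted_step_time_s(tp, pp) <= max_step_time_s
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda cfg: cost_per_step(*cfg))


best = cheapest_config(tp_options=(1, 2, 4, 8), pp_options=(1, 2, 4), max_step_time_s=2.5)
print("cheapest feasible config (tp, pp):", best)
```

Because every candidate is scored by prediction alone, the quality of the chosen configuration is bounded by the predictor's accuracy, which is the dependency the "Missing pieces" column points at.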
Limitations And Gotchas
- Maya is about training workloads; do not treat its results as direct evidence for LLM serving latency or throughput.
- Transparent emulation depends on intercepting the right device/runtime APIs and staying compatible with framework and library changes.
- Runtime prediction still needs calibration; emulation removes workload translation, not all modeling assumptions.
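
To make the calibration point concrete, the sketch below fits a single least-squares scale factor between predicted and measured step times. This is one simple calibration scheme chosen for illustration, not the procedure the paper uses; `fit_scale` is a hypothetical helper, and the timings are placeholders.

```python
def fit_scale(predicted, measured):
    """Least-squares scale factor a that minimizes sum((a * p - m) ** 2)."""
    num = sum(p * m for p, m in zip(predicted, measured))
    den = sum(p * p for p in predicted)
    return num / den


# Placeholder timings in seconds (hypothetical, not from the paper).
predicted = [1.0, 2.0, 4.0]
measured = [1.1, 2.2, 4.4]

scale = fit_scale(predicted, measured)
calibrated = [scale * p for p in predicted]
print("scale:", scale, "calibrated:", calibrated)
```

A single scale factor only corrects uniform bias. Errors that vary by operation type or hardware would need a richer model fitted on measurements from the target stack.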