Maya: Optimizing Deep Learning Training Workloads using GPU Runtime Emulation

Source

Status And Credibility

Maya was first posted to arXiv on 2025-03-26 and revised on 2025-11-15. The arXiv metadata lists it as a EuroSys 2026 paper with an ACM DOI and a CC BY 4.0 license.

Treat it as credible systems evidence for transparent GPU runtime emulation in training workloads. It is not itself an LLM inference serving system, but it is directly relevant because Revati reuses the same emulation philosophy for LLM serving.

Core Claim

Maya argues that performance modeling systems for training often force users to translate workloads into custom specification languages. That translation creates a semantic gap: the model no longer observes the exact framework behavior of the real workload.

Maya instead operates at the narrow interface between training frameworks and accelerator devices. It intercepts accelerator API calls from unmodified training code, emulates device behavior, records low-level operations, and predicts runtime without requiring users to rewrite workloads.
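The interception idea above can be illustrated with a toy sketch. This is not Maya's implementation: the `EmulatedDevice` class, the operation names, and the cost numbers are all hypothetical, and a plain Python object stands in for the accelerator API boundary. It only shows the shape of the approach: workload code runs unchanged, every device call is recorded instead of executed, and a per-operation model accumulates a predicted runtime.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class EmulatedDevice:
    """Hypothetical stand-in for an accelerator: records calls, runs nothing."""
    # Assumed per-operation cost models: op name -> f(size) -> seconds.
    cost_models: Dict[str, Callable[[int], float]]
    trace: List[Tuple[str, int]] = field(default_factory=list)
    predicted_seconds: float = 0.0

    def call(self, op: str, size: int) -> None:
        # Record the low-level operation instead of executing it on hardware,
        # then add the modeled duration to the running prediction.
        self.trace.append((op, size))
        self.predicted_seconds += self.cost_models[op](size)


def training_step(device: EmulatedDevice) -> None:
    """Plays the role of unmodified framework code; it only sees the device interface."""
    device.call("memcpy_h2d", 1_000_000)   # bytes copied host to device
    device.call("gemm", 4_000_000)         # floating-point operations
    device.call("allreduce", 1_000_000)    # bytes exchanged between devices


# Made-up throughput figures, chosen only to keep the arithmetic readable.
COST_MODELS = {
    "memcpy_h2d": lambda nbytes: nbytes / 10e9,
    "gemm": lambda flops: flops / 1e12,
    "allreduce": lambda nbytes: nbytes / 5e9,
}

device = EmulatedDevice(cost_models=COST_MODELS)
training_step(device)
print(device.trace)
print(f"predicted step time: {device.predicted_seconds:.6f} s")
```

The point of the design is that `training_step` never has to be rewritten into a specification language; only the object behind the device interface changes.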

Evidence And Results

The paper reports less than 5% prediction error across diverse models and optimization strategies.
It reports identifying configurations that reduce training costs by up to 56% compared with existing approaches.
The key mechanism is transparent device emulation: intercept CUDA/runtime/library calls, preserve framework semantics, and replace real accelerator execution with modeled behavior.
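To make the two reported quantities concrete, here is a minimal sketch of how a predicted runtime could feed a cost comparison across configurations, and how a prediction error could be computed. The paper's exact error metric and search procedure are not described on this page, so the sketch assumes relative error and an exhaustive search; the `Candidate` type, the configuration names, and every number are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical training configuration with an emulator-predicted step time."""
    name: str
    num_gpus: int
    predicted_step_seconds: float


def training_cost(c: Candidate, steps: int, usd_per_gpu_hour: float) -> float:
    # Cost = wall-clock hours * number of GPUs * price per GPU-hour.
    hours = c.predicted_step_seconds * steps / 3600.0
    return hours * c.num_gpus * usd_per_gpu_hour


def cheapest(candidates: List[Candidate], steps: int, usd_per_gpu_hour: float) -> Candidate:
    # Exhaustive search: assumed here for simplicity.
    return min(candidates, key=lambda c: training_cost(c, steps, usd_per_gpu_hour))


def relative_error(predicted: float, measured: float) -> float:
    # Assumed error metric: absolute difference relative to the measurement.
    return abs(predicted - measured) / measured


candidates = [
    Candidate("config_a", num_gpus=8, predicted_step_seconds=1.0),
    Candidate("config_b", num_gpus=16, predicted_step_seconds=0.6),
    Candidate("config_c", num_gpus=8, predicted_step_seconds=0.9),
]
best = cheapest(candidates, steps=3600, usd_per_gpu_hour=2.0)
print(best.name, training_cost(best, steps=3600, usd_per_gpu_hour=2.0))
print(relative_error(predicted=0.97, measured=1.0))
```

In this toy setup the faster 16-GPU configuration is not the cheapest one, which is the kind of trade-off a cost-aware search is meant to surface.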

Why It Matters For GPU Inference Optimization

Maya belongs on GPU Inference Optimization as the training-side emulation predecessor to Revati. Its main relevance is methodological: if the actual framework code can run against a virtualized accelerator interface, the evaluator avoids reimplementing framework control logic.

For inference optimization, that idea becomes important whenever the serving engine changes faster than a simulator can be updated. Revati ports this idea from training to vLLM/SGLang-style serving.

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Dynamic compute and serving	adjacent	Demonstrates a GPU-free runtime-modeling path for expensive deep-learning workloads.	Evaluates training workloads rather than online inference serving or TSFM deployment.
Control and counterfactuals	adjacent	Searches configurations under predicted runtime/cost outcomes.	Requires accurate emulation and prediction models for the target runtime stack.
Benchmark validity	warning	The semantic-gap critique is broadly relevant to simulators and synthetic benchmark harnesses.	Emulation itself can have gaps if API coverage or timing models are incomplete.

Limitations And Gotchas

Maya is about training workloads; do not treat its results as direct evidence for LLM serving latency or throughput.
Transparent emulation depends on intercepting the right device/runtime APIs and staying compatible with framework and library changes.
Runtime prediction still needs calibration; emulation removes workload translation, not all modeling assumptions.
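As an illustration of the calibration point, the sketch below fits a single scale factor that maps modeled operation times onto measured ones by least squares. This is a generic technique chosen for illustration, not Maya's documented calibration procedure; the `fit_scale` helper and the sample values are hypothetical.

```python
from typing import List, Tuple


def fit_scale(samples: List[Tuple[float, float]]) -> float:
    """Return k minimizing sum((measured - k * modeled) ** 2) over the samples.

    Each sample is a (modeled_seconds, measured_seconds) pair.
    """
    num = sum(modeled * measured for modeled, measured in samples)
    den = sum(modeled * modeled for modeled, _ in samples)
    if den == 0.0:
        raise ValueError("need at least one nonzero modeled time")
    return num / den


# Hypothetical calibration runs: the model is consistently 20% optimistic.
samples = [(1.0, 1.2), (2.0, 2.4), (4.0, 4.8)]
k = fit_scale(samples)
print(f"calibration scale: {k:.3f}")
```

A single scale factor is the simplest possible correction; a real timing model would likely need per-operation or per-device parameters, and those assumptions remain even when workload translation is removed.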
