Turbo-GNN
Summary
Turbo-GNN is the official implementation released with the paper "On Efficient Scaling of GNNs via IO-Aware Layers Implementations". The repository describes custom CUDA and Triton kernels, cuSPARSE-backed sparse aggregation paths, and Python/PyTorch interfaces for faster GNN layer execution. The covered operations include SpMM aggregation, reduction aggregation, GATv2 aggregation, and Graph Transformer aggregation.
Role In The Wiki
Turbo-GNN is an implementation artifact rather than a model family. Its local role is to raise the baseline floor for graph time-series and graph-control experiments: if a GNN baseline is slow because it materializes edge-wise tensors or relies on unoptimized framework defaults, that slowness reflects the implementation and is not decisive evidence against graph-aware modeling.
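To make the "materializes edge-wise tensors" failure mode concrete, the NumPy sketch below computes the same sum aggregation in two ways. It is an illustration only and does not use Turbo-GNN's API; the function names, the CSR arrays, and the toy graph are assumptions made for the example.

```python
import numpy as np

def aggregate_materialized(x, src, dst, num_nodes):
    """Sum aggregation that first builds a [num_edges, feat] message tensor."""
    messages = x[src]                       # edge-wise tensor: one row per edge
    out = np.zeros((num_nodes, x.shape[1]), dtype=x.dtype)
    np.add.at(out, dst, messages)           # scatter-add into destination rows
    return out, messages.nbytes

def aggregate_csr(x, indptr, indices):
    """Same sum, computed row by row from CSR; the largest temporary is one
    neighborhood's features rather than one row per edge."""
    num_nodes = len(indptr) - 1
    out = np.zeros((num_nodes, x.shape[1]), dtype=x.dtype)
    for i in range(num_nodes):
        nbrs = indices[indptr[i]:indptr[i + 1]]
        if nbrs.size:
            out[i] = x[nbrs].sum(axis=0)
    return out

# Hypothetical toy graph: edges 1->0, 0->1, 2->1, 3->1, 3->2, grouped by destination.
src = np.array([1, 0, 2, 3, 3])
dst = np.array([0, 1, 1, 1, 2])
indptr = np.array([0, 1, 4, 5, 5])    # CSR row pointers over destination nodes
indices = src                          # CSR column indices are the source nodes
x = np.arange(8, dtype=np.float32).reshape(4, 2)

dense_out, edge_bytes = aggregate_materialized(x, src, dst, num_nodes=4)
csr_out = aggregate_csr(x, indptr, indices)
assert np.allclose(dense_out, csr_out)
```

On a real graph the edge-wise tensor grows with the edge count times the feature width, which is the cost that IO-aware or fused kernels aim to avoid. The Python loop here is for clarity, not speed.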
For time-series and world-model work, Turbo-GNN is useful when the experiment needs direct message passing over topology: service graphs, power-grid topology, graph observability benchmarks, graph neural surrogates, or graph-attention baselines. It is not itself evidence for action-conditioned world modeling, counterfactual prediction, or latent-state maintenance.
Official Artifacts
- Paper: On Efficient Scaling of GNNs via IO-Aware Layers Implementations
- Code: yandex-research/On-Efficient-Scaling-Of-GNNs
- Package name from the official repository: turbo-gnn
- Official blog: Yandex Research blog post
What It Exposes
- spmm_aggr for SpMM-style aggregation backed by cuSPARSE-oriented execution.
- reduction_aggr for min/max-style aggregation with degree-aware heavy-node handling.
- gatv2_aggr for GATv2-style fused attention aggregation.
- graph_transformer_aggr for Graph Transformer-style neighborhood attention.
- Autotuning hooks for custom kernels where graph shape and feature shape change performance.
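When swapping in a fused attention path such as gatv2_aggr, it can help to keep a slow, unfused reference around for correctness checks. The sketch below implements the standard single-head GATv2 scoring rule in NumPy. It is not Turbo-GNN's code: the function name gatv2_reference, its argument layout, and the assumption that features are already projected are all choices made for illustration, and the inputs the library's kernel actually expects should be checked against the official repository.

```python
import numpy as np

def gatv2_reference(h_src, h_dst, att, src, dst, num_nodes, slope=0.2):
    """Unfused single-head GATv2-style aggregation (hypothetical helper).

    h_src, h_dst: [num_nodes, d] features, assumed already linearly projected.
    att: [d] attention vector. src, dst: [num_edges] edge endpoints.
    """
    z = h_src[src] + h_dst[dst]                 # edge-wise tensor, shape [E, d]
    z = np.where(z > 0, z, slope * z)           # LeakyReLU
    score = z @ att                             # one logit per edge
    # Numerically stable softmax over the incoming edges of each destination.
    max_per_dst = np.full(num_nodes, -np.inf)
    np.maximum.at(max_per_dst, dst, score)
    w = np.exp(score - max_per_dst[dst])
    denom = np.zeros(num_nodes)
    np.add.at(denom, dst, w)
    alpha = w / denom[dst]
    out = np.zeros((num_nodes, h_src.shape[1]))
    np.add.at(out, dst, alpha[:, None] * h_src[src])
    return out, alpha

# Hypothetical toy graph; node 3 has no incoming edges.
rng = np.random.default_rng(0)
src = np.array([1, 0, 2, 3, 3])
dst = np.array([0, 1, 1, 1, 2])
h = np.arange(8, dtype=np.float64).reshape(4, 2)

out, alpha = gatv2_reference(h, h, rng.normal(size=2), src, dst, num_nodes=4)
# With a zero attention vector every neighbor gets equal weight (a mean).
uniform_out, uniform_alpha = gatv2_reference(h, h, np.zeros(2), src, dst, num_nodes=4)
```

A fused kernel's output can then be compared against such a reference with a floating-point tolerance on small graphs before trusting it in larger benchmark runs.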
Practical Use In Our Experiments
Use Turbo-GNN or equivalent modern kernels when a graph baseline is meant to answer a modeling question rather than only demonstrate framework overhead. For example:
- ChronoGraph-style graph multivariate time-series forecasting should compare graph encoders under matched latency and memory budgets.
- Kubernetes/OpenTelemetry control experiments should report whether direct message-passing baselines use fused graph attention or materialized edge tensors.
- Grid2Op/power-grid graph surrogates should distinguish model error from sparse-kernel overhead.
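The reporting discipline in the list above can be sketched as a small measurement record plus a budget check. This is a minimal CPU-side sketch using only the Python standard library; the names Measurement, measure, and within_budget are hypothetical. On a GPU, wall-clock timing would need device synchronization (for example torch.cuda.synchronize()) and the framework's own peak-memory counters (for example torch.cuda.max_memory_allocated()) in place of tracemalloc.

```python
import statistics
import time
import tracemalloc
from dataclasses import dataclass

@dataclass
class Measurement:
    name: str
    median_ms: float
    peak_bytes: int
    uses_fused_kernels: bool   # report this alongside accuracy numbers

def measure(name, fn, uses_fused_kernels, warmup=2, repeats=5):
    """Time fn over several runs, then trace peak Python-heap memory once."""
    for _ in range(warmup):
        fn()
    times_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1e3)
    # Memory is traced in a separate pass so tracing overhead does not skew timing.
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return Measurement(name, statistics.median(times_ms), peak, uses_fused_kernels)

def within_budget(m, max_ms, max_bytes):
    """True when a baseline fits both the latency and the memory budget."""
    return m.median_ms <= max_ms and m.peak_bytes <= max_bytes

# Stand-in workload; a real run would call the graph encoder's forward pass.
baseline = measure("toy-baseline", lambda: [i * i for i in range(10_000)],
                   uses_fused_kernels=False)
fixed = Measurement("example", median_ms=5.0, peak_bytes=1_000, uses_fused_kernels=True)
```

Comparing encoders only when within_budget holds for all of them under the same limits keeps kernel overhead from being mistaken for model error.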