GraphGPT: Generative Pre-trained Graph Eulerian Transformer
Source
- Raw Markdown: paper_graphgpt-2025.md
- PDF: paper_graphgpt-2025.pdf
- Preprint: arXiv 2401.00529
- OpenReview: ICML 2025 poster
- Official code: github.com/alibaba/graph-gpt
The arXiv preprint was first posted on 2023-12-31 and last revised in 2025; the OpenReview record lists the paper as an ICML 2025 poster. The wiki uses the requested graphgpt-2025 slug because the curated venue version is ICML 2025.
Core Claim
GraphGPT argues that graph structure can be moved into the sequence-modeling regime by converting a graph or sampled subgraph into a reversible sequence of node, edge, and attribute tokens, then pretraining a standard Transformer-style encoder or decoder on those sequences.
The important interface claim is not “graphs need a new graph-specific backbone”; it is that a graph can be serialized into a token sequence that preserves topology and attributes well enough for generative pretraining and downstream graph-, edge-, and node-level tasks.
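The reversibility idea can be illustrated with a minimal sketch. It assumes an undirected, connected graph and expands every edge into two directed arcs so that an Eulerian circuit always exists; that shortcut, and the names `eulerian_walk`, `serialize`, and `deserialize`, are illustrative assumptions and are not taken from the paper or the official code.

```python
from collections import defaultdict

def eulerian_walk(edges, start):
    """Closed walk that uses each undirected edge once in each direction.

    Assumption: doubling every edge into two arcs makes in-degree equal
    out-degree at every node, so Hierholzer's algorithm finds a circuit.
    """
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
        out[v].append(u)
    stack, walk = [start], []
    while stack:
        node = stack[-1]
        if out[node]:
            stack.append(out[node].pop())  # follow an unused arc
        else:
            walk.append(stack.pop())       # dead end: emit the node
    return walk[::-1]

def serialize(edges, start):
    """Re-index nodes by first appearance along the walk."""
    index, tokens = {}, []
    for node in eulerian_walk(edges, start):
        index.setdefault(node, len(index))
        tokens.append(index[node])
    return tokens, index

def deserialize(tokens):
    """Recover the undirected edge set from consecutive token pairs."""
    return {frozenset(pair) for pair in zip(tokens, tokens[1:])}

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
tokens, index = serialize(edges, "a")
recovered = deserialize(tokens)
```

Because every consecutive pair of tokens is an edge and every edge appears in the walk, the topology can be read back from the sequence alone, up to the node re-indexing stored in `index`.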
Key Contributions
- Introduces Graph Eulerian Transformer (GET), which uses Eulerian or semi-Eulerian paths to serialize graphs and sampled subgraphs into reversible token sequences.
- Represents node identities and node/edge attributes as tokens so a standard Transformer can consume graph structure without a GNN message-passing module.
- Uses next-token prediction and scheduled masked-token prediction as self-supervised graph pretraining objectives.
- Reformats graph-level, edge-level, and node-level downstream tasks as sequence tasks by appending task-specific target tokens.
- Reports strong Open Graph Benchmark results on PCQM4Mv2, ogbl-ppa, ogbl-citation2, ogbn-proteins, and related graph benchmarks.
- Shows scaling experiments up to a 2B-parameter GraphGPT-XXL model on edge-level graph tasks.
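The two pretraining objectives and the task reformatting listed above can be sketched on a toy token sequence. This is a minimal illustration, not the paper's implementation: the function names, the `<mask>` and `<graph_task>` tokens, and the fixed `mask_rate` argument are all hypothetical, and the actual mask-rate schedule is left to the caller.

```python
import random

def next_token_pairs(tokens):
    """Inputs and shift-by-one targets for next-token prediction."""
    return tokens[:-1], tokens[1:]

def mask_tokens(tokens, mask_rate, rng, mask_token="<mask>"):
    """Mask a fraction of positions; labels are None where nothing is predicted."""
    n_mask = round(mask_rate * len(tokens))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [mask_token if i in positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return inputs, labels

def append_task_tokens(tokens, task_tokens):
    """Append task-specific tokens so the task becomes a sequence task."""
    return list(tokens) + list(task_tokens)

sequence = ["<n0>", "<n1>", "<n2>", "<n1>", "<n0>"]
inputs, targets = next_token_pairs(sequence)
masked, labels = mask_tokens(sequence, mask_rate=0.4, rng=random.Random(0))
graph_example = append_task_tokens(sequence, ["<graph_task>"])
```

A scheduled variant would pass a different `mask_rate` at each training step; the shape of that schedule is not specified here.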
Why It Matters For Kubernetes OTEL Control Gym
Kubernetes OTEL Control Gym needs a model interface for service graph structure, graph time series, observations, actions, and control inputs. GraphGPT is relevant because it provides a concrete way to turn a graph or subgraph into a sequence that ordinary Transformer infrastructure can process.
The reusable pattern is:
service graph or sampled subgraph
+ node attributes
+ edge attributes
-> reversible node/edge/attribute token sequence
-> standard Transformer context
For OTEL telemetry, this suggests one possible encoding layer for service topology, node metrics, edge metrics, and service metadata before a downstream passive dynamics model or action-conditioned world model consumes the resulting context. It does not by itself solve temporal forecasting, observability-specific graph time series, action logging, intervention ranking, or control.
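A minimal sketch of that pattern for a toy service graph follows. It assumes a walk over the services has already been computed, treats edges as undirected, and discretizes metrics into buckets; the token formats (`<n0>`, `<v:cpu=0>`, `<e:p99_ms=1>`), metric names, thresholds, and function names are hypothetical and chosen only for illustration.

```python
def bucket(value, upper_bounds):
    """Index of the first upper bound that value does not exceed."""
    for i, upper in enumerate(upper_bounds):
        if value <= upper:
            return i
    return len(upper_bounds)

def service_graph_tokens(walk, node_attrs, edge_attrs, thresholds):
    """Interleave node, edge-attribute, and node-attribute tokens along a walk."""
    index, tokens, seen_edges = {}, [], set()
    prev = None
    for svc in walk:
        if prev is not None:
            key = frozenset((prev, svc))
            if key not in seen_edges:  # edge attributes only on first traversal
                seen_edges.add(key)
                for name, value in sorted(edge_attrs.get(key, {}).items()):
                    tokens.append(f"<e:{name}={bucket(value, thresholds[name])}>")
        first_visit = svc not in index
        index.setdefault(svc, len(index))
        tokens.append(f"<n{index[svc]}>")
        if first_visit:  # node attributes only on first visit
            for name, value in sorted(node_attrs.get(svc, {}).items()):
                tokens.append(f"<v:{name}={bucket(value, thresholds[name])}>")
        prev = svc
    return tokens

walk = ["frontend", "cart", "frontend"]
node_attrs = {"frontend": {"cpu": 0.2}, "cart": {"cpu": 0.9}}
edge_attrs = {frozenset(("frontend", "cart")): {"p99_ms": 120}}
thresholds = {"cpu": [0.5], "p99_ms": [50, 200]}
tokens = service_graph_tokens(walk, node_attrs, edge_attrs, thresholds)
```

Emitting attributes once per node and edge keeps the sequence short while leaving the node tokens themselves enough to recover topology. Whether directed call edges need separate handling is left open in this sketch.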
Limitations
- GraphGPT is evaluated on graph learning benchmarks, not operational telemetry, graph time series, or SRE-style rollouts.
- The paper has no explicit action, control input, intervention, or counterfactual channel; it should not be described as an action-conditioned world model.
- The method depends on dataset-specific semantic tokens and pretraining corpora, so cross-domain transfer from molecular/protein/citation graphs to service graphs remains an open question.
- Eulerian serialization and subgraph sampling introduce ordering and sampling choices that may matter for dynamic service graphs.
- Large-model pretraining is compute-heavy: the paper reports substantial training cost for models with 50M+ parameters and scales to 2B parameters for some tasks.
- The strongest claim is about graph-to-sequence pretraining and graph benchmark performance, not closed-loop decision quality.
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Context interface | adjacent | GraphGPT gives a concrete token interface for graph structure, node attributes, edge attributes, and sampled subgraphs. | Needs a schema for time-varying service topology, observations, events, action history, and control inputs. |
| Native multivariate and graph encoding | adjacent | The GET serialization can preserve graph topology and attributes while using standard Transformer layers. | Needs graph time-series experiments with temporal node/edge metrics rather than static graph tasks. |
| Dynamic tokenization and sequence construction | adjacent | Eulerian serialization, subgraph sampling, node re-indexing, and multi-token node identity encoding are practical graph-to-sequence design choices. | Needs sequence-length, stability, and retrieval tests under real telemetry-scale graphs. |
| Control and counterfactuals | insufficient evidence | The paper does not model actions, control inputs, interventions, rewards, or candidate future trajectories. | Add an explicit evaluation of (observation, graph context, action/control input) -> (next observation, reward). |
| Benchmark level | warning | OGB performance shows graph-learning strength, but those benchmarks do not test action-conditioned rollout or operational utility. | Evaluate on ChronoGraph-like graph time series and eventually on k8s-otel-control-gym. |
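The missing evaluation in the "Control and counterfactuals" row can be made concrete with a small record type and metric. This is a hypothetical sketch: the `Transition` fields, the `predict` signature, and the persistence baseline are assumptions for illustration and do not come from the paper or from any existing benchmark.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class Transition:
    graph_tokens: Sequence[str]        # serialized graph context
    observation: Sequence[float]       # metrics before the action
    action: str                        # action or control input
    next_observation: Sequence[float]  # metrics after the action
    reward: float

def mean_abs_error(predict, transitions):
    """Average absolute error of predicted next observations."""
    total, count = 0.0, 0
    for t in transitions:
        pred = predict(t.graph_tokens, t.observation, t.action)
        total += sum(abs(p - y) for p, y in zip(pred, t.next_observation))
        count += len(t.next_observation)
    return total / count

def persistence(graph_tokens, observation, action):
    """Baseline that ignores the action and repeats the last observation."""
    return list(observation)

data = [Transition(["<n0>", "<n1>", "<n0>"], [1.0, 2.0], "scale_up", [1.5, 2.0], 0.0)]
baseline_error = mean_abs_error(persistence, data)
```

An action-conditioned model would only be interesting here if it beat an action-blind baseline such as `persistence` on transitions where the action actually changes the outcome.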
Links Into The Wiki
- Graph Structure As Transformer Context
- Kubernetes OTEL Control Gym
- Foundation Time-Series Model Research Agenda
- Observability Time Series
- Action-Conditioned Time-Series Datasets
- High-Dimensional Time Series Forecasting
- World Models
- ChronoGraph
Open Questions
- Can GraphGPT-style reversible graph serialization be extended from static graphs to graph time series with time-bucketed node and edge observations?
- Should OTEL service graphs be serialized as whole graphs, ego-subgraphs around impacted services, trace-induced subgraphs, or action-targeted subgraphs?
- How should an action or control input be inserted into the graph-token sequence: as a special action token, as an edge/node attribute update, or as a separate control-input segment?
- Does Eulerian path stochasticity help service-graph transfer, or does it make temporal alignment and incident localization harder?
- What probes would show that a Transformer trained on serialized service graphs preserves action-relevant state rather than only static topology?
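The three insertion options raised in the action question above can be written down as token-level operations, which makes their differences easier to compare. This is only a sketch: the function names and the `<act:...>`, `<sep>`, and `<v:...>` token formats are hypothetical, and nothing here is evaluated in the paper.

```python
def with_action_token(graph_tokens, action):
    """Option 1: append a single special action token."""
    return list(graph_tokens) + [f"<act:{action}>"]

def with_attribute_update(graph_tokens, node_token, attr_token):
    """Option 2: attach the action as an attribute after the node's first occurrence."""
    out, inserted = [], False
    for tok in graph_tokens:
        out.append(tok)
        if tok == node_token and not inserted:
            out.append(attr_token)
            inserted = True
    return out

def with_control_segment(graph_tokens, actions):
    """Option 3: keep the graph intact and add a separate control-input segment."""
    return list(graph_tokens) + ["<sep>"] + [f"<act:{a}>" for a in actions]

graph = ["<n0>", "<n1>", "<n0>"]
opt1 = with_action_token(graph, "scale_up:cart")
opt2 = with_attribute_update(graph, "<n1>", "<v:replicas=3>")
opt3 = with_control_segment(graph, ["scale_up:cart", "restart:frontend"])
```

Options 1 and 3 leave the graph tokens untouched, so the graph can still be recovered from the sequence unchanged; option 2 ties the action to a specific node at the cost of altering the graph segment.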