GraphGPT: Generative Pre-trained Graph Eulerian Transformer

Source

The arXiv record first appeared on 2023-12-31 and was last revised in 2025; the OpenReview venue record lists the paper as an ICML 2025 poster. The wiki uses the requested graphgpt-2025 slug because the curated venue version is ICML 2025.

Core Claim

GraphGPT argues that graph structure can be moved into the sequence-modeling regime by converting a graph or sampled subgraph into a reversible sequence of node, edge, and attribute tokens, then pretraining a standard Transformer-style encoder or decoder on those sequences.

The important interface claim is not “graphs need a new graph-specific backbone”; it is that a graph can be serialized into a token sequence that preserves topology and attributes well enough for generative pretraining and downstream graph-, edge-, and node-level tasks.
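
The reversibility part of this claim can be illustrated with a small sketch. It assumes an undirected, connected graph that is already Eulerian or semi-Eulerian, and it uses Hierholzer's algorithm, a standard way to find a path that walks every edge exactly once. The function names are hypothetical, and the paper's own path construction, handling of non-Eulerian graphs, and token vocabulary are not reproduced here.

```python
from collections import defaultdict


def eulerian_path(edges):
    """Return a node sequence that traverses every undirected edge exactly once.

    Hierholzer's algorithm. Assumes a connected graph with zero or two
    odd-degree nodes (Eulerian or semi-Eulerian).
    """
    adj = defaultdict(list)  # node -> list of (neighbor, edge_id)
    for eid, (u, v) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    odd = sorted(n for n in adj if len(adj[n]) % 2 == 1)
    if len(odd) not in (0, 2):
        raise ValueError("graph is not Eulerian or semi-Eulerian")
    start = odd[0] if odd else min(adj)
    used = [False] * len(edges)
    stack, path = [start], []
    while stack:
        node = stack[-1]
        # Drop edges that were already walked from the other endpoint.
        while adj[node] and used[adj[node][-1][1]]:
            adj[node].pop()
        if adj[node]:
            nxt, eid = adj[node].pop()
            used[eid] = True
            stack.append(nxt)
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(edges) + 1:
        raise ValueError("graph is not connected")
    return path


def edges_from_path(path):
    """Invert the serialization: consecutive tokens in the path are the edges."""
    return sorted(tuple(sorted(pair)) for pair in zip(path, path[1:]))


edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # triangle plus one pendant edge
path = eulerian_path(edges)
recovered = edges_from_path(path)
```

Because each edge appears exactly once as a pair of adjacent tokens, the edge set can be read back from the sequence alone, which is the sense in which the serialization is reversible in this toy setting.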

Key Contributions

  • Introduces Graph Eulerian Transformer (GET), which uses Eulerian or semi-Eulerian paths to serialize graphs and sampled subgraphs into reversible token sequences.
  • Represents node identities and node/edge attributes as tokens so a standard Transformer can consume graph structure without a GNN message-passing module.
  • Uses next-token prediction and scheduled masked-token prediction as self-supervised graph pretraining objectives.
  • Reformats graph-level, edge-level, and node-level downstream tasks as sequence tasks by appending task-specific target tokens.
  • Reports strong Open Graph Benchmark results on PCQM4Mv2, ogbl-ppa, ogbl-citation2, ogbn-proteins, and related graph benchmarks.
  • Shows scaling experiments up to a 2B-parameter GraphGPT-XXL model on edge-level graph tasks.
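
The pretraining objectives and the downstream reformatting listed above can be sketched on a toy token sequence. This is a minimal illustration under stated assumptions: the `<mask>` and `<graph_summary>` token names, the helper names, and the single `ratio` argument are hypothetical, and the paper's actual mask schedule and target-token vocabulary are not reproduced.

```python
import random

MASK = "<mask>"  # hypothetical mask token name


def next_token_pairs(tokens):
    """Shift-by-one (input, target) pairs for next-token prediction."""
    return list(zip(tokens[:-1], tokens[1:]))


def mask_tokens(tokens, ratio, rng):
    """Replace a fraction of tokens with MASK.

    Returns the masked sequence and a {position: original token} dict.
    A schedule would vary `ratio` across training steps.
    """
    k = max(1, round(ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), k))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


def append_targets(tokens, target_tokens):
    """Format a downstream example by appending task-specific target tokens."""
    return list(tokens) + list(target_tokens)


tokens = ["n2", "n0", "n1", "n2", "n3"]  # a serialized toy graph
pairs = next_token_pairs(tokens)
masked, targets = mask_tokens(tokens, ratio=0.4, rng=random.Random(0))

edge_example = append_targets(tokens, ["n0", "n3"])          # edge-level: two endpoints
node_example = append_targets(tokens, ["n1"])                # node-level: one node
graph_example = append_targets(tokens, ["<graph_summary>"])  # graph-level: summary slot
```

In this sketch the model head would read its prediction from the positions of the appended tokens, so all three task levels share one input format.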

Why It Matters For Kubernetes OTEL Control Gym

Kubernetes OTEL Control Gym needs a model interface for service graph structure, graph time series, observations, actions, and control inputs. GraphGPT is relevant because it provides a concrete way to turn a graph or subgraph into a sequence that ordinary Transformer infrastructure can process.

The reusable pattern is:

service graph or sampled subgraph
  + node attributes
  + edge attributes
  -> reversible node/edge/attribute token sequence
  -> standard Transformer context

For OTEL telemetry, this suggests one possible encoding layer for service topology, node metrics, edge metrics, and service metadata before a downstream passive dynamics model or action-conditioned world model consumes the resulting context. It does not by itself solve temporal forecasting, observability-specific graph time series, action logging, intervention ranking, or control.
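
A minimal sketch of such an encoding layer follows. Everything in it is an assumption made for illustration: the service names, the bucketed attribute tokens, the `n:`/`a:`/`e:` token prefixes, the treatment of call edges as undirected, and the fact that the walk over the graph is supplied as input rather than computed. Nodes are re-indexed by first appearance so the sequence does not depend on global service identities.

```python
def serialize(walk, node_attrs, edge_attrs):
    """Emit node-index, node-attribute, and edge-attribute tokens along a walk."""
    index = {}  # service name -> local index by first appearance
    tokens = []
    prev = None
    for name in walk:
        if prev is not None:
            tokens += [f"e:{a}" for a in edge_attrs[frozenset((prev, name))]]
        first_visit = name not in index
        if first_visit:
            index[name] = len(index)
        tokens.append(f"n:{index[name]}")
        if first_visit:
            # Node attributes are emitted once, at the node's first visit.
            tokens += [f"a:{a}" for a in node_attrs[name]]
        prev = name
    return tokens, index


def deserialize(tokens):
    """Recover nodes, edges, and attributes (by local index) from the tokens."""
    nodes, edges = {}, {}
    prev, current, pending_edge = None, None, []
    for tok in tokens:
        kind, value = tok.split(":", 1)
        if kind == "e":
            pending_edge.append(value)
        elif kind == "n":
            current = int(value)
            nodes.setdefault(current, [])
            if prev is not None:
                edges[frozenset((prev, current))] = pending_edge
            pending_edge = []
            prev = current
        else:  # "a": attribute of the most recently emitted node
            nodes[current].append(value)
    return nodes, edges


node_attrs = {
    "frontend": ["lat_hi"],
    "cart": ["lat_lo"],
    "checkout": ["lat_hi", "err_hi"],
    "payments": ["lat_lo"],
}
edge_attrs = {
    frozenset(("frontend", "cart")): ["rps_hi"],
    frozenset(("cart", "checkout")): ["rps_lo"],
    frozenset(("checkout", "frontend")): ["rps_hi"],
    frozenset(("checkout", "payments")): ["rps_lo"],
}
# A walk that uses every edge exactly once (supplied by hand here).
walk = ["checkout", "frontend", "cart", "checkout", "payments"]

tokens, index = serialize(walk, node_attrs, edge_attrs)
nodes, edges = deserialize(tokens)
```

The round trip recovers topology and attributes up to the local re-indexing, which is all a static snapshot needs. Time-varying metrics and actions would require additional structure that this sketch does not attempt.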

Limitations

  • GraphGPT is evaluated on graph learning benchmarks, not operational telemetry, graph time series, or SRE-style rollouts.
  • The paper has no explicit action, control input, intervention, or counterfactual channel; it should not be described as an action-conditioned world model.
  • The method depends on dataset-specific semantic tokens and pretraining corpora, so cross-domain transfer from molecular/protein/citation graphs to service graphs remains an open question.
  • Eulerian serialization and subgraph sampling introduce ordering and sampling choices that may matter for dynamic service graphs.
  • Large-model pretraining is compute-heavy: the paper reports substantial training cost for models of 50M parameters and above, and scales to 2B parameters for some tasks.
  • The strongest claim is about graph-to-sequence pretraining and graph benchmark performance, not closed-loop decision quality.

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
| --- | --- | --- | --- |
| Context interface | adjacent | GraphGPT gives a concrete token interface for graph structure, node attributes, edge attributes, and sampled subgraphs. | Needs a schema for time-varying service topology, observations, events, action history, and control inputs. |
| Native multivariate and graph encoding | adjacent | The GET serialization can preserve graph topology and attributes while using standard Transformer layers. | Needs graph time-series experiments with temporal node/edge metrics rather than static graph tasks. |
| Dynamic tokenization and sequence construction | adjacent | Eulerian serialization, subgraph sampling, node re-indexing, and multi-token node identity encoding are practical graph-to-sequence design choices. | Needs sequence-length, stability, and retrieval tests under real telemetry-scale graphs. |
| Control and counterfactuals | insufficient evidence | The paper does not model actions, control inputs, interventions, rewards, or candidate future trajectories. | Add explicit observation, graph context, action/control input -> next observation/reward evaluation. |
| Benchmark level | warning | OGB performance shows graph-learning strength, but those benchmarks do not test action-conditioned rollout or operational utility. | Evaluate on ChronoGraph-like graph time series and eventually on k8s-otel-control-gym. |

Open Questions

  • Can GraphGPT-style reversible graph serialization be extended from static graphs to graph time series with time-bucketed node and edge observations?
  • Should OTEL service graphs be serialized as whole graphs, ego-subgraphs around impacted services, trace-induced subgraphs, or action-targeted subgraphs?
  • How should an action or control input be inserted into the graph-token sequence: as a special action token, as an edge/node attribute update, or as a separate control-input segment?
  • Does Eulerian path stochasticity help service-graph transfer, or does it make temporal alignment and incident localization harder?
  • What probes would show that a Transformer trained on serialized service graphs preserves action-relevant state rather than only static topology?
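
The three insertion options raised in the action question above can be made concrete with a small sketch. None of these formats comes from the paper, which has no action channel; the token names (`<act:...>`, `<ctrl>`, `a:`), the helper names, and the example action are hypothetical and only illustrate how the alternatives differ at the sequence level.

```python
def as_action_token(graph_tokens, action, target):
    """Option 1: one special action token appended after the graph tokens."""
    return list(graph_tokens) + [f"<act:{action}@{target}>"]


def as_attribute_update(graph_tokens, action, target):
    """Option 2: the action becomes an attribute placed right after the
    first occurrence of the target node token."""
    out = []
    inserted = False
    for tok in graph_tokens:
        out.append(tok)
        if tok == target and not inserted:
            out.append(f"a:{action}")
            inserted = True
    return out


def as_control_segment(graph_tokens, action, target):
    """Option 3: a separate, delimited control-input segment."""
    return list(graph_tokens) + ["<ctrl>", action, target, "</ctrl>"]


graph_tokens = ["n:0", "a:lat_hi", "e:rps_lo", "n:1", "e:rps_hi", "n:0"]

opt1 = as_action_token(graph_tokens, "scale_up", "n:0")
opt2 = as_attribute_update(graph_tokens, "scale_up", "n:0")
opt3 = as_control_segment(graph_tokens, "scale_up", "n:0")
```

Option 2 changes the graph segment itself, so the same topology yields different graph tokens under different actions, while options 1 and 3 leave the graph segment untouched and keep the action separable for probing.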