ChronoGraph: A Real-World Graph-Based Multivariate Time Series Dataset

Source

Core Claim

ChronoGraph is a graph-structured multivariate time-series forecasting dataset built from production microservice telemetry and annotated with incident labels.

Dataset Notes

  • The dataset covers 708 services and 1529 directed service-to-service edges.
  • Each service has five temporal node features, and each edge has eight temporal interaction features.
  • The paper reports six months of telemetry, 8005 aligned time steps, and 17 expert-labeled anomaly segments.
  • The official data layout is graph-native: edges.csv, node_features.json, and edge_features_part{i}.json.
  • The main tasks are service-level forecasting, anomaly detection, and incident-aware evaluation.
  • Reported baselines include Prophet, Chronos-Bolt Base, TabPFN-TS, Autoencoder, Isolation Forest, One-Class SVM, and a Prophet/Isolation Forest/Autoencoder ensemble.
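The graph-native layout can be read with the Python standard library alone. The sketch below is a minimal illustration under loud assumptions: the `source`/`target` column names in edges.csv, the JSON shapes of node_features.json and the edge_features_part{i}.json shards, and the `src->dst` edge keys are all hypothetical placeholders rather than the official schema. The toy inputs are synthetic, not ChronoGraph data.

```python
import csv
import io
import json
from collections import defaultdict

# ASSUMPTION (hypothetical schema): edges.csv has header columns "source" and
# "target" naming service IDs; check the official files for the real names.
def load_edges(edges_csv):
    """Return a list of directed (source, target) service pairs."""
    reader = csv.DictReader(edges_csv)
    return [(row["source"], row["target"]) for row in reader]

# ASSUMPTION: node_features.json maps service ID -> {feature name -> list of
# values, one per aligned time step}.
def load_node_features(node_json):
    return json.load(node_json)

# ASSUMPTION: each edge_features_part{i}.json maps "src->dst" keys to
# {feature name -> list of values}; the shards are merged into one dict.
def load_edge_features(part_files):
    merged = {}
    for f in part_files:
        merged.update(json.load(f))
    return merged

def adjacency(edges):
    """Build a directed adjacency list: service -> list of target services."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    return dict(adj)

# Toy, synthetic in-memory stand-ins for the real files.
toy_edges = io.StringIO("source,target\napi,auth\napi,db\nauth,db\n")
toy_nodes = io.StringIO(json.dumps({
    "api": {"cpu": [0.2, 0.3]},
    "auth": {"cpu": [0.1, 0.1]},
    "db": {"cpu": [0.5, 0.6]},
}))
toy_parts = [
    io.StringIO(json.dumps({"api->auth": {"rps": [10, 12]}})),
    io.StringIO(json.dumps({"api->db": {"rps": [5, 7]}})),
]

edges = load_edges(toy_edges)
nodes = load_node_features(toy_nodes)
edge_feats = load_edge_features(toy_parts)
adj = adjacency(edges)
print(len(edges), len(nodes), len(edge_feats))  # 3 3 2
print(adj)  # {'api': ['auth', 'db'], 'auth': ['db']}
```

Against the real release, the file-like objects would be replaced by `open(...)` handles, and the assumed keys adjusted to the published schema.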

Action-Time-Series Notes

  • It is a genuine time-series dataset, but the source exposes no controllable action or intervention channel.
  • Incident windows are labels or exogenous shocks, not operator actions.
  • It therefore counts as a near-miss: relevant to passive world models, but not a primary action-conditioned dataset.

Gotchas

  • The evaluated baseline suite is mostly topology-agnostic, despite the dataset itself being graph-native.
  • ChronoGraph is the closest public match here to “whole graph plus temporal node/edge features”, but it is still passive telemetry rather than an intervention log.
  • The public repository is licensed under Apache-2.0, but the knowledge base does not mirror the dataset payloads.
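Because the reported baselines ignore topology, one cheap way to expose the graph to a topology-agnostic forecaster is to hand it neighbor aggregates as extra covariates. This is a minimal sketch, not a method from the paper: it assumes an edge (source, target) means the source service calls the target, and for each service it averages its callers' aligned series at every time step. The function name `upstream_mean` and the toy series are illustrative.

```python
from collections import defaultdict

def upstream_mean(edges, series):
    """For each service, average its direct callers' series at every time step.

    ASSUMPTION: an edge (source, target) means `source` calls `target`, so the
    callers of a service are the sources of its incoming edges. All series are
    assumed to share the same aligned length. Services without callers map to None.
    """
    callers = defaultdict(list)
    for src, dst in edges:
        callers[dst].append(src)
    out = {}
    for svc, values in series.items():
        srcs = [s for s in callers.get(svc, []) if s in series]
        if not srcs:
            out[svc] = None
            continue
        out[svc] = [
            sum(series[s][t] for s in srcs) / len(srcs) for t in range(len(values))
        ]
    return out

# Toy, synthetic example (not ChronoGraph data).
edges = [("api", "auth"), ("api", "db"), ("auth", "db")]
series = {"api": [1.0, 2.0], "auth": [3.0, 4.0], "db": [5.0, 6.0]}
features = upstream_mean(edges, series)
print(features)  # {'api': None, 'auth': [1.0, 2.0], 'db': [2.0, 3.0]}
```

The resulting per-service lists could be supplied as extra regressors to a univariate baseline; whether this improves accuracy on ChronoGraph is an open question.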

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Native multivariate encoding and high-channel scaling | partially closes | Provides 708 services, 1529 directed edges, node metrics, edge metrics, and incident labels from production microservices. | The baseline suite is mostly topology-agnostic; no foundation model demonstrates graph-aware scaling here. |
| Benchmarks: what level of modeling is tested? | partially closes | Tests forecasting and anomaly detection during incidents, and reports baseline failures at long horizons and in anomaly detection. | Does not test causal/counterfactual reasoning or action-conditioned control utility. |
| Causal structure, counterfactuals, and control | insufficient evidence | The service graph plus telemetry is close to an observability state substrate for digital-world agents. | No operator action, deployment, rollback, or autoscaling intervention channel is logged. |
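Incident-aware evaluation can be illustrated by splitting a forecast error metric at the labeled anomaly segments. This minimal sketch assumes segments are half-open (start, end) index ranges over the aligned time steps and uses mean absolute error; the paper's actual metrics and segment encoding may differ, and the function names are hypothetical.

```python
def segment_mask(n_steps, segments):
    """Mark time steps that fall inside any labeled segment.

    ASSUMPTION: each segment is a half-open (start, end) pair of step indices.
    """
    mask = [False] * n_steps
    for start, end in segments:
        for t in range(max(0, start), min(n_steps, end)):
            mask[t] = True
    return mask

def _mean(xs):
    return sum(xs) / len(xs) if xs else None

def incident_aware_mae(y_true, y_pred, segments):
    """Return (MAE inside incident segments, MAE outside them).

    A side with no time steps yields None instead of dividing by zero.
    """
    mask = segment_mask(len(y_true), segments)
    inside = [abs(a - b) for a, b, m in zip(y_true, y_pred, mask) if m]
    outside = [abs(a - b) for a, b, m in zip(y_true, y_pred, mask) if not m]
    return _mean(inside), _mean(outside)

# Toy, synthetic example: the forecaster tracks normal traffic but misses a spike.
y_true = [1.0, 1.0, 1.0, 10.0, 10.0, 1.0]
y_pred = [1.0, 1.0, 1.0, 2.0, 2.0, 1.0]
print(incident_aware_mae(y_true, y_pred, [(3, 5)]))  # (8.0, 0.0)
```

Reporting the two numbers side by side shows when a model that looks accurate overall degrades exactly during incidents.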