ChronoGraph: A Real-World Graph-Based Multivariate Time Series Dataset
Source
- Raw Markdown: paper_chronograph-2025.md
- PDF: paper_chronograph-2025.pdf
- Dataset metadata snapshot: chronograph-2025
- arXiv: https://arxiv.org/abs/2509.04449
- Official code and data: https://github.com/bit-ml/ChronoGraph
Core Claim
ChronoGraph is a graph-structured multivariate time-series forecasting dataset from production microservices with incident labels.
Dataset Notes
- The dataset covers 708 services and 1529 directed service-to-service edges.
- Each service has five temporal node features, and each edge has eight temporal interaction features.
- The paper reports six months of telemetry, 8005 aligned time steps, and 17 expert-labeled anomaly segments.
- The official data layout is graph-native: `edges.csv`, `node_features.json`, and `edge_features_part{i}.json`.
- The main tasks are service-level forecasting, anomaly detection, and incident-aware evaluation.
- Reported baselines include Prophet, Chronos-Bolt Base, TabPFN-TS, Autoencoder, Isolation Forest, One-Class SVM, and a Prophet/Isolation Forest/Autoencoder ensemble.
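
To make the scale of these counts concrete, the sketch below multiplies out the reported dimensions. It assumes, purely for illustration, that every node and every edge carries a dense series over all aligned time steps; the class name `GraphSeriesShape` and the constant `CHRONOGRAPH` are hypothetical and not part of the official code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphSeriesShape:
    """Headline dimensions of a graph-structured multivariate time series."""

    num_nodes: int
    num_edges: int
    node_features: int
    edge_features: int
    time_steps: int

    @property
    def node_channels(self) -> int:
        # One univariate series per (node, feature) pair.
        return self.num_nodes * self.node_features

    @property
    def edge_channels(self) -> int:
        # One univariate series per (edge, feature) pair.
        return self.num_edges * self.edge_features

    @property
    def total_channels(self) -> int:
        return self.node_channels + self.edge_channels

    @property
    def total_values(self) -> int:
        # Assumption: dense coverage, i.e. no missing steps for any channel.
        return self.total_channels * self.time_steps


# Counts taken from the notes above: 708 services, 1529 edges,
# 5 node features, 8 edge features, 8005 aligned time steps.
CHRONOGRAPH = GraphSeriesShape(708, 1529, 5, 8, 8005)

print(CHRONOGRAPH.node_channels)   # 3540
print(CHRONOGRAPH.edge_channels)   # 12232
print(CHRONOGRAPH.total_channels)  # 15772
print(CHRONOGRAPH.total_values)    # 126254860
```

Under the dense-coverage assumption, edge channels outnumber node channels by roughly 3.5 to 1, which is one reason a node-only baseline sees only a minority of the available signal.
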
Action-Time-Series Notes
- ChronoGraph is a time-series dataset, but the source exposes no controllable actions or interventions as a channel.
- Incident windows are labels or exogenous shocks, not operator actions.
- It belongs in the knowledge base as a near-miss suited to passive world models, not as a primary action-conditioned dataset.
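
Because incident windows act as labels rather than actions, one simple way to use them is to split forecast error into in-incident and out-of-incident parts. The following is a minimal sketch of that idea, not the paper's evaluation protocol. The `(start, end)` segment encoding with an exclusive end, the function names, and the toy numbers are all assumptions made for illustration.

```python
from statistics import fmean


def segments_to_mask(segments, num_steps):
    """Expand (start, end) index pairs, end exclusive, into a per-step mask."""
    mask = [False] * num_steps
    for start, end in segments:
        # Clip to the valid range so out-of-bounds segments do not raise.
        for t in range(max(start, 0), min(end, num_steps)):
            mask[t] = True
    return mask


def split_mae(y_true, y_pred, mask):
    """Return (MAE inside incidents, MAE outside incidents).

    Either value is None when no time step falls into that group.
    """
    inside = [abs(a - b) for a, b, m in zip(y_true, y_pred, mask) if m]
    outside = [abs(a - b) for a, b, m in zip(y_true, y_pred, mask) if not m]
    return (
        fmean(inside) if inside else None,
        fmean(outside) if outside else None,
    )


# Toy series: a flat forecast misses a spike at steps 2 and 3.
y_true = [1.0, 1.0, 5.0, 6.0, 1.0, 1.0]
y_pred = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
mask = segments_to_mask([(2, 4)], num_steps=len(y_true))

mae_in, mae_out = split_mae(y_true, y_pred, mask)
print(mae_in, mae_out)  # 4.5 0.0
```

Reporting the two numbers separately keeps a model that is accurate only during calm periods from looking better than it is when errors are averaged over the whole series.
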
Gotchas
- The evaluated baseline suite is mostly topology-agnostic, despite the dataset itself being graph-native.
- ChronoGraph is the closest public match here to “whole graph plus temporal node/edge features”, but it is still passive telemetry rather than an intervention log.
- The public repository uses Apache-2.0, but the knowledge base does not mirror dataset payloads.
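
As an illustration of what a topology-aware input could look like, in contrast to the topology-agnostic baselines noted above, the sketch below averages the series of each node's in-neighbors to form one extra covariate per node. This is a hypothetical feature, not a method from the paper. It assumes edges are `(source, target)` pairs and that node series are equal-length lists keyed by node id. The zero fallback for nodes without in-neighbors is also an arbitrary choice.

```python
from collections import defaultdict


def in_neighbors(edges):
    """Map each target node to the list of its source nodes."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
    return preds


def upstream_mean(series, edges):
    """For each node, average its in-neighbors' series at every time step.

    Nodes without in-neighbors (or whose in-neighbors have no series)
    receive an all-zero covariate of the same length.
    """
    preds = in_neighbors(edges)
    out = {}
    for node, values in series.items():
        callers = [p for p in preds.get(node, []) if p in series]
        if not callers:
            out[node] = [0.0] * len(values)
            continue
        out[node] = [
            sum(series[c][t] for c in callers) / len(callers)
            for t in range(len(values))
        ]
    return out


# Toy graph: a -> c, b -> c, c -> d.
edges = [("a", "c"), ("b", "c"), ("c", "d")]
series = {
    "a": [1.0, 2.0],
    "b": [3.0, 4.0],
    "c": [10.0, 20.0],
    "d": [0.0, 0.0],
}

covariate = upstream_mean(series, edges)
print(covariate["c"])  # [2.0, 3.0]
print(covariate["d"])  # [10.0, 20.0]
```

A covariate like this could be fed to any of the per-series baselines without changing the model itself, which makes it a cheap first check on whether the graph structure carries usable signal.
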
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Native multivariate encoding and high-channel scaling | partially closes | Provides 708 services, 1529 directed edges, node metrics, edge metrics, and incident labels from production microservices. | The baseline suite is mostly topology-agnostic; no foundation model demonstrates graph-aware scaling here. |
| Benchmarks: what level of modeling is tested? | partially closes | Tests forecasting and anomaly detection during incidents and shows long-horizon and anomaly-detection failures. | Does not test causal/counterfactual reasoning or action-conditioned control utility. |
| Causal structure, counterfactuals, and control | insufficient evidence | The service graph plus telemetry is close to an observability state substrate for digital-world agents. | No operator action, deployment, rollback, or autoscaling intervention channel is logged. |