GAIA / MicroSS
Source
- Dataset metadata snapshot: gaia-micross-2021
- Official GitHub: https://github.com/CloudWise-OpenSource/GAIA-DataSet
Core Claim
GAIA is an AIOps dataset collection for anomaly detection, log analysis, fault localization, and related tasks. MicroSS is the graph/system-relevant subset: a business-simulation microservice system with metrics, traces, business logs, and anomaly-injection records.
Dataset Notes
- MicroSS contains more than 6,500 metrics, more than 7,000,000 log items, and detailed trace data collected over an initial two-week window.
- The trace schema includes service names, host IPs, trace IDs, span IDs, parent IDs, URLs, status codes, and messages.
- Companion Data contains 406 anomaly-detection and metric-prediction series, including 279 labeled series, plus about 218,736 log records.
- MicroSS is system-level, but the public layout is not a single graph object. Service call structure is reconstructed from traces.
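Because the call structure has to be rebuilt from traces, a minimal sketch of that reconstruction may help. It assumes each span has already been parsed into a dict. The keys `trace_id`, `span_id`, `parent_id`, and `service_name` are hypothetical names for the schema fields listed above, not the dataset's actual column names. The service names in the sample are also invented.

```python
from collections import Counter


def service_call_edges(spans):
    """Count caller -> callee service edges from a flat list of span dicts.

    Key names are assumptions for illustration; map them to the real
    trace columns before use.
    """
    # Index spans by (trace_id, span_id) so parent lookups stay inside one trace.
    service_of = {
        (s["trace_id"], s["span_id"]): s["service_name"] for s in spans
    }
    edges = Counter()
    for s in spans:
        parent_service = service_of.get((s["trace_id"], s.get("parent_id")))
        if parent_service is None:
            continue  # root span, or parent not present in this sample
        child_service = s["service_name"]
        if parent_service != child_service:  # ignore intra-service spans
            edges[(parent_service, child_service)] += 1
    return edges


# Invented sample spans, not taken from the dataset.
spans = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "service_name": "frontend"},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "service_name": "auth"},
    {"trace_id": "t1", "span_id": "c", "parent_id": "b", "service_name": "db"},
    {"trace_id": "t1", "span_id": "d", "parent_id": "b", "service_name": "auth"},
    {"trace_id": "t2", "span_id": "a", "parent_id": None, "service_name": "frontend"},
    {"trace_id": "t2", "span_id": "b", "parent_id": "a", "service_name": "auth"},
    {"trace_id": "t2", "span_id": "e", "parent_id": "zz", "service_name": "db"},
]

edges = service_call_edges(spans)
for (caller, callee), count in sorted(edges.items()):
    print(f"{caller} -> {callee}: {count}")
```

The resulting edge counts can be loaded into any graph library as a weighted directed graph. Spans whose parent is missing are dropped rather than guessed, so a partial trace sample yields a partial graph.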
Why It Matters
GAIA/MicroSS is useful when a benchmark needs AIOps-style metrics, logs, traces, business logs, and anomaly-injection records from a whole microservice scenario. It is less graph-native than ChronoGraph but richer in trace/log context.
Gotchas
- Companion Data should not be treated as a whole-service-graph dataset; it is closer to single-series KPI and log-task data.
- License metadata is inconsistent: the repository LICENSE file is GPL-2.0, while the README license section says Apache 2.0.
- Anomaly injections are exogenous benchmark events, not logged operator remediations.
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Benchmarks: what level of modeling is tested? | partially closes | MicroSS combines metrics, traces, business logs, system logs, and anomaly-injection records from a microservice scenario, making it a small observability benchmark for the digital-world robot north star. | It is not packaged as a clean graph object and does not include remediation action histories. |
| Context interface | adjacent | Trace records include service names, host IPs, trace IDs, spans, parent IDs, URLs, status codes, and messages. | Context must be reconstructed from raw traces/logs; no canonical channel metadata schema is provided. |
| Control and counterfactuals | warning | Fault/anomaly injections are logged benchmark conditions. | Injections are exogenous events, not operator actions or controllable interventions for policy learning. |