# GAIA / MicroSS

Canonical source: <https://github.com/CloudWise-OpenSource/GAIA-DataSet>
Introducing source: [GAIA / MicroSS](../../wiki/sources/gaia-micross-2021.md)

## Dataset Type

GAIA (Generic AIOps Atlas) is a dataset collection for anomaly detection, log analysis, fault localization, and related operations tasks. For graph-level observability work, the relevant part is MicroSS: a business-simulation microservice system with metrics, traces, business logs, and anomaly-injection records. Companion Data is useful context, but it targets single-series KPI and log tasks rather than whole-system graph telemetry.

## Temporal And System Structure

The GAIA README reports that MicroSS contains more than 6,500 metrics, more than 7,000,000 log items, and detailed trace data collected continuously for two weeks. The trace schema includes service names, host IPs, trace IDs, span IDs, parent IDs, URLs, status codes, and messages, so the service call structure can be reconstructed from traces. Unlike ChronoGraph, however, the dataset is not packaged as a single explicit directed graph with edge features.
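
As a rough illustration of that reconstruction, the sketch below counts caller-to-callee service edges by resolving each span's parent within the same trace. The dictionary keys (`trace_id`, `span_id`, `parent_id`, `service_name`) and the service names are hypothetical placeholders, not the dataset's actual column names or values; map them to the real trace columns before use.

```python
from collections import Counter

def service_edges(spans):
    """Count caller -> callee service edges from parent/child span links."""
    # Index spans by (trace_id, span_id) so parents resolve within one trace.
    by_id = {(s["trace_id"], s["span_id"]): s for s in spans}
    edges = Counter()
    for s in spans:
        parent = by_id.get((s["trace_id"], s["parent_id"]))
        if parent is None:
            continue  # root span, or parent not present in the data
        if parent["service_name"] != s["service_name"]:
            edges[(parent["service_name"], s["service_name"])] += 1
    return edges

# Hypothetical spans; field names and values are assumptions for illustration.
spans = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "service_name": "webservice"},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "service_name": "logservice"},
    {"trace_id": "t1", "span_id": "c", "parent_id": "a", "service_name": "dbservice"},
    {"trace_id": "t2", "span_id": "a", "parent_id": None, "service_name": "webservice"},
    {"trace_id": "t2", "span_id": "b", "parent_id": "a", "service_name": "dbservice"},
]
edges = service_edges(spans)
```

The resulting counts can serve as edge weights of a derived call graph. Spans whose parent is missing from the data are skipped rather than guessed.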

## Data Structure

- `MicroSS/metric`: CSV files with `timestamp` and `value`.
- `MicroSS/trace`: trace records with timestamp, host IP, service name, trace/span/parent IDs, start and end times, URL, status code, and message.
- `MicroSS/business`: business logs with `datetime`, `service`, and `message`.
- `MicroSS/run`: system logs and anomaly injection records.
- `Companion_Data/metric_detection`: single-series anomaly detection rows with `timestamp`, `value`, and `label`.
- `Companion_Data/metric_forecast`: single-series forecasting rows with `timestamp` and `value`.
- `Companion_Data/log`: log parsing, semantic anomaly detection, and named entity recognition data.
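
A minimal sketch of reading one metric file, assuming only the two columns listed above (`timestamp` and `value`). The sample rows are invented, and the integer timestamp parsing is an assumption; adjust it if the files use another timestamp format.

```python
import csv
import io

def read_metric(fileobj):
    """Parse a two-column metric CSV into (timestamp, value) pairs."""
    reader = csv.DictReader(fileobj)
    # Assumes integer timestamps; change the conversion for other formats.
    return [(int(row["timestamp"]), float(row["value"])) for row in reader]

# Invented sample data standing in for one file under MicroSS/metric.
sample = io.StringIO("timestamp,value\n1625133600000,0.42\n1625133660000,0.40\n")
series = read_metric(sample)
```

Because each file holds a single series, joining metrics into a system-wide table requires a separate alignment step on the timestamp column.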

## Inputs And Outputs

Inputs are metric time series, traces, business logs, and run/anomaly-injection logs. Outputs depend on the task: anomaly labels, fault-localization targets, metric forecast targets, or log parsing/NER labels.

## Reported Scale

- MicroSS: more than 6,500 metrics and more than 7,000,000 log items.
- MicroSS: detailed trace data, initially described as continuously collected for two weeks.
- Companion Data: 406 anomaly-detection and metric-prediction series, including 279 labeled series.
- Companion Data: about 218,736 log records for parsing, semantic anomaly detection, and named entity recognition.

## Actions Or Interventions

MicroSS includes controlled anomaly injections and user-behavior manipulations used to simulate faults. These are best treated as exogenous events or benchmark conditions. They are not a logged operator-action channel for modeling remediation decisions.
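
One way to use the injections as exogenous events is to turn them into per-timestamp labels for a metric series. The sketch below assumes the injection records have already been parsed into `(start, end, fault_type)` tuples on the same time axis as the metrics; that tuple layout, the half-open window convention, and the fault name are hypothetical, not the dataset's record format.

```python
def label_points(timestamps, windows):
    """Return 1 for timestamps inside any [start, end) window, else 0."""
    return [
        1 if any(start <= t < end for start, end, _fault in windows) else 0
        for t in timestamps
    ]

# Hypothetical injection window and metric timestamps in the same unit.
windows = [(5, 15, "cpu_stress")]
labels = label_points([0, 5, 10, 15, 20], windows)
```

The linear scan over windows keeps the example short; for long series with many injections, sorting the windows and using binary search would scale better.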

## Access And License Notes

The official GitHub repository is public. License metadata is inconsistent: the repository `LICENSE` file is GPL-2.0, while the README license section says Apache 2.0. Treat reuse terms as unresolved until checked directly with the maintainers or a pinned release artifact.
