ops-lite
Source
- Dataset metadata snapshot: ops-lite-2026
- Official Hugging Face dataset: https://huggingface.co/datasets/anon-ops/ops-lite
Core Claim
ops-lite is a curated 500-case RCA evaluation set for microservice systems with manifest-derived causal-graph ground truth.
Dataset Notes
- The benchmark covers three testbeds: Train-Ticket, Hotel Reservation (from DeathStarBench), and OpenTelemetry Demo.
- It has 320 Train-Ticket cases, 142 Hotel Reservation cases, and 38 OpenTelemetry Demo cases.
- Each case includes injection.json, causal_graph.json, env.json, result.json, label.txt, and normal/abnormal parquet metric tables.
- The Hugging Face card reports mean longest path 3.18, mean edge count 4.06, mean service count 3.97, and eight chaos families.
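The per-case artifacts and the graph statistics quoted from the card can be explored with a small standard-library helper. The following is a minimal sketch, not the dataset's official tooling. The file names come from the notes above; everything else is an assumption made for illustration: the function names `load_case` and `graph_stats` are hypothetical, the parquet tables are assumed to sit directly in the case directory, the graph is passed in as a plain list of (source, target) service pairs because the schema of causal_graph.json is not documented here, and "longest path" is counted in edges, which may differ from the card's convention.

```python
import json
from pathlib import Path

# JSON artifacts named in the dataset notes.
JSON_FILES = ("injection.json", "causal_graph.json", "env.json", "result.json")


def load_case(case_dir):
    """Read the JSON artifacts and label of one case directory (hypothetical helper).

    Parquet tables are only listed, not parsed, so no third-party package is needed.
    Assumption: the parquet files sit directly inside the case directory.
    """
    case_dir = Path(case_dir)
    case = {}
    for name in JSON_FILES:
        path = case_dir / name
        if path.exists():
            case[name] = json.loads(path.read_text(encoding="utf-8"))
    label_path = case_dir / "label.txt"
    if label_path.exists():
        case["label.txt"] = label_path.read_text(encoding="utf-8").strip()
    case["parquet_files"] = sorted(p.name for p in case_dir.glob("*.parquet"))
    return case


def graph_stats(edges):
    """Return service count, edge count, and longest path (in edges) of a DAG.

    `edges` is an iterable of (source, target) pairs. Assumption: this is not
    the causal_graph.json schema, which must be converted to pairs first.
    """
    edges = {(src, dst) for src, dst in edges}
    nodes = {node for edge in edges for node in edge}
    children = {node: [] for node in nodes}
    indegree = {node: 0 for node in nodes}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    # Kahn's topological order: a node is processed only after all its parents,
    # so its longest incoming path length is final when it is popped.
    ready = [node for node in nodes if indegree[node] == 0]
    longest = {node: 0 for node in nodes}
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for child in children[node]:
            longest[child] = max(longest[child], longest[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if visited != len(nodes):
        raise ValueError("graph contains a cycle; longest path is undefined")

    return {
        "services": len(nodes),
        "edges": len(edges),
        "longest_path": max(longest.values(), default=0),
    }
```

Averaging `graph_stats` over all cases would give numbers comparable to the card's means only if the card uses the same definitions, which should be checked against the full release once it is available.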
Why It Matters
ops-lite is the most compact graph-grounded RCA evaluation set in this comparison. Unlike RCAEval, it makes a causal service graph a first-class per-case artifact. Unlike ChronoGraph, it is RCA-oriented and built around chaos-injection cases rather than continuous production forecasting.
Gotchas
- The public card says the full paper/artifact release is forthcoming, so source-level detail is thinner than for RCAEval or ChronoGraph.
- It is an evaluation set, not a long passive telemetry corpus.
- The Hugging Face card lists the dataset license as Apache-2.0 but notes that the underlying testbeds retain their own upstream licenses.
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Observability RCA benchmarks | partially closes | The raw dataset card exposes 500 microservice chaos cases with metrics, causal graphs, labels, and normal/abnormal parquet tables. | It is a compact RCA evaluation set, not a large continuous pretraining corpus. |
| Causal structure | partially closes | Each case includes causal_graph.json, making graph-grounded RCA a first-class artifact. | Graphs describe fault propagation, not full intervention or remediation dynamics. |
| Causal and control modeling | warning | Chaos injections are controlled events, but the dataset does not log operator remediation decisions. | Needs candidate actions, policy traces, and future trajectories after fixes. |