ops-lite

Source

Core Claim

ops-lite is a curated 500-case RCA evaluation set for microservice systems with manifest-derived causal-graph ground truth.

Dataset Notes

  • The benchmark covers Train-Ticket, Hotel Reservation / DeathStarBench, and OpenTelemetry Demo.
  • The 500 cases split into 320 Train-Ticket, 142 Hotel Reservation, and 38 OpenTelemetry Demo cases.
  • Each case includes injection.json, causal_graph.json, env.json, result.json, label.txt, and normal/abnormal parquet metric tables.
  • The Hugging Face card reports mean longest path 3.18, mean edge count 4.06, mean service count 3.97, and eight chaos families.
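The per-case layout and graph statistics above can be sanity-checked with a small script. The following is a minimal sketch, not the dataset's own tooling. The helper names `missing_case_files` and `graph_stats` are hypothetical. The schema of causal_graph.json is not described here, so the sketch assumes the graph has already been parsed into (source, target) pairs and that it is acyclic. Whether the card measures "longest path" in edges or in nodes is also an assumption; this sketch counts edges.

```python
from collections import defaultdict
from pathlib import Path

# File names taken from the dataset notes above. The parquet metric tables
# are left out because their exact file names are not stated.
EXPECTED_FILES = (
    "injection.json",
    "causal_graph.json",
    "env.json",
    "result.json",
    "label.txt",
)


def missing_case_files(case_dir):
    """Return the expected per-case files that are absent from case_dir."""
    case_dir = Path(case_dir)
    return [name for name in EXPECTED_FILES if not (case_dir / name).is_file()]


def graph_stats(edges):
    """Return edge count, node count, and longest path length (in edges).

    ASSUMPTION: `edges` is a list of (source, target) pairs extracted from
    causal_graph.json by the caller, and the graph is a DAG.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    # Kahn's topological order: a node is processed only after all of its
    # parents, so its depth is final when it is popped.
    depth = {node: 0 for node in nodes}
    ready = [node for node in nodes if indegree[node] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for child in children[node]:
            depth[child] = max(depth[child], depth[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if visited != len(nodes):
        raise ValueError("graph contains a cycle; longest path is undefined")

    return {
        "edges": len(edges),
        "nodes": len(nodes),
        "longest_path": max(depth.values(), default=0),
    }


if __name__ == "__main__":
    # Toy graph, not taken from the dataset: a -> b -> c -> d plus a -> c.
    toy = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
    print(graph_stats(toy))
```

Averaging these per-case values over all cases would give figures comparable to the card's means only if the card uses the same definitions, which has not been verified here.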

Why It Matters

ops-lite is the most compact graph-grounded RCA evaluation set in this comparison. Unlike RCAEval, it makes a causal service graph a first-class per-case artifact. Unlike ChronoGraph, it is RCA-oriented and built around chaos-injection cases rather than continuous production forecasting.

Gotchas

  • The public card says the full paper/artifact release is forthcoming, so source-level detail is thinner than for RCAEval or ChronoGraph.
  • It is an evaluation set, not a long passive telemetry corpus.
  • The Hugging Face card lists Apache-2.0 but notes that underlying testbeds retain their own upstream licenses.

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Observability RCA benchmarks | partially closes | The raw dataset card exposes 500 microservice chaos cases with metrics, causal graphs, labels, and normal/abnormal parquet tables. | It is a compact RCA evaluation set, not a large continuous pretraining corpus. |
| Causal structure | partially closes | Each case includes causal_graph.json, making graph-grounded RCA a first-class artifact. | Graphs describe fault propagation, not full intervention or remediation dynamics. |
| Causal and control modeling | warning | Chaos injections are controlled events, but the dataset does not log operator remediation decisions. | Needs candidate actions, policy traces, and future trajectories after fixes. |