RCAEval

Source

Core Claim

RCAEval is a reproducible root-cause-analysis benchmark for microservice systems. It contributes datasets, loaders, evaluation metrics, and baseline implementations for metric-based, trace-based, and multi-source RCA.

Dataset Notes

  • RCAEval covers Online Boutique, Sock Shop, and Train Ticket.
  • It organizes nine datasets under RE1, RE2, and RE3, with 735 failure cases and 11 fault types.
  • RE1 is metric-only. RE2 and RE3 include metrics, logs, and traces.
  • Each failure case includes annotated root-cause service and root-cause indicator labels.
  • The Zenodo archives total about 5.16 GB compressed.
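
Because each failure case carries an annotated root-cause service, a ranked list of candidate services can be scored directly against that label. The sketch below assumes top-k accuracy metrics of the AC@k / Avg@k kind, which are common in RCA evaluation; this note does not spell out the exact metric definitions. The function names and the sample services are hypothetical and are not RCAEval's API.

```python
from typing import Sequence


def ac_at_k(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """Fraction of failure cases whose true root cause is in the top-k candidates."""
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not truths:
        raise ValueError("need at least one failure case")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


def avg_at_k(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """Mean of AC@1 through AC@k."""
    return sum(ac_at_k(rankings, truths, j) for j in range(1, k + 1)) / k


# Hypothetical ranked outputs for two failure cases (most suspicious first).
rankings = [
    ["cartservice", "checkoutservice", "frontend"],
    ["frontend", "adservice", "cartservice"],
]
# Hypothetical annotated root-cause services for the same two cases.
truths = ["cartservice", "adservice"]

print(ac_at_k(rankings, truths, 1))
print(avg_at_k(rankings, truths, 3))
```

The same scoring applies to root-cause indicator labels if the ranked items are indicators instead of services.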

Reported Baselines

The framework includes RUN, CausalRCA, CIRCA, RCD, MicroCause, EasyRCA, MSCRED, BARO, epsilon-Diagnosis, TraceRCA, MicroRank, PDiagnose, multi-source BARO, multi-source RCD, multi-source CIRCA, and TORAI.

Why It Matters

Among the sources compared here, RCAEval is the most practical evaluation harness for comparing RCA methods. It is less graph-native than ChronoGraph or ops-lite, but it is the stronger choice when the goal is a reproducible benchmark environment with many baselines.

Gotchas

  • The input surface is often flattened into files or data frames, so graph structure must usually come from system knowledge or traces.
  • Fault injections are benchmark events, not an operator action channel.
  • The dataset record is CC-BY-4.0, while the framework code is MIT.
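
The first gotcha says graph structure usually has to be recovered from traces. A minimal sketch of that step follows, assuming span records that expose a trace ID, a span ID, a parent span ID, and a service name. The field names `traceID`, `spanID`, `parentSpanID`, and `serviceName`, the function `call_edges`, and the sample spans are assumptions for illustration; check them against the actual trace files before use.

```python
from collections import Counter
from typing import Iterable, Mapping


def call_edges(spans: Iterable[Mapping[str, str]]) -> Counter:
    """Count caller -> callee service edges implied by parent/child span links."""
    spans = list(spans)
    # Span IDs are only assumed unique within a trace, so key by (trace, span).
    service_of = {(s["traceID"], s["spanID"]): s["serviceName"] for s in spans}
    edges: Counter = Counter()
    for s in spans:
        parent = service_of.get((s["traceID"], s.get("parentSpanID", "")))
        child = s["serviceName"]
        # Skip root spans, orphaned spans, and calls that stay inside one service.
        if parent is None or parent == child:
            continue
        edges[(parent, child)] += 1
    return edges


# Hypothetical spans from two traces.
spans = [
    {"traceID": "t1", "spanID": "a", "parentSpanID": "", "serviceName": "frontend"},
    {"traceID": "t1", "spanID": "b", "parentSpanID": "a", "serviceName": "checkoutservice"},
    {"traceID": "t1", "spanID": "c", "parentSpanID": "b", "serviceName": "cartservice"},
    {"traceID": "t1", "spanID": "d", "parentSpanID": "b", "serviceName": "checkoutservice"},
    {"traceID": "t2", "spanID": "a", "parentSpanID": "", "serviceName": "frontend"},
    {"traceID": "t2", "spanID": "b", "parentSpanID": "a", "serviceName": "checkoutservice"},
]

edges = call_edges(spans)
for (caller, callee), count in sorted(edges.items()):
    print(f"{caller} -> {callee}: {count}")
```

For metric-only data such as RE1 there are no spans to mine, so the edges would have to come from system knowledge instead.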

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Benchmark level | partially closes | RCAEval packages microservice failure cases with metrics, logs, traces, root-cause service labels, and root-cause indicator labels. | It evaluates diagnosis, not predictive state modeling or action-conditioned rollout. |
| Context interface | adjacent | RE2 and RE3 combine multiple telemetry surfaces from Online Boutique, Sock Shop, and Train Ticket. | Needs explicit topology, deploy context, and event schemas as model inputs. |
| Causal structure, counterfactuals, and control | insufficient evidence | Fault injections are controlled benchmark events. | There is no logged operator action/remediation channel or counterfactual control target. |