OpenRCA
Source
- Dataset metadata snapshot: openrca-2025
- OpenReview: https://openreview.net/forum?id=M4qNIzQYpd
- Official GitHub: https://github.com/microsoft/OpenRCA
Core Claim
OpenRCA reframes root cause analysis (RCA) for microservice and software systems as a benchmark for LLMs and LLM agents. A model receives a natural-language query and must inspect telemetry to produce the root-cause datetime, component, and reason.
Dataset Notes
- The OpenReview paper reports 335 failures from three enterprise software systems and more than 68 GB of telemetry.
- The GitHub README names the systems as Telecom, Bank, and Market.
- The telemetry directory contains logs, metrics, and traces under date-stamped folders.
- Inputs include KPI time series, dependency trace graphs, semi-structured logs, and natural-language queries.
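The date-stamped layout described above could be indexed along the following lines. This is a minimal sketch, not code from the OpenRCA repository: the `<date>/<modality>/<file>` nesting, the sub-folder names in `MODALITIES`, and the function name `index_telemetry` are all assumptions made for illustration, and the real directory names may differ.

```python
from collections import defaultdict
from pathlib import Path

# Assumed sub-folder names; adjust to whatever the downloaded data actually uses.
MODALITIES = ("log", "metric", "trace")


def index_telemetry(root):
    """Map date folder name -> modality -> sorted list of file paths.

    Assumes a hypothetical layout of <root>/<date>/<modality>/<file>.
    Files that do not fit that layout are skipped rather than guessed at.
    """
    index = defaultdict(lambda: defaultdict(list))
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).parts
        # Need at least a date folder, a modality folder, and a file name.
        if len(rel) < 3 or rel[1] not in MODALITIES:
            continue
        index[rel[0]][rel[1]].append(path)
    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {date: dict(by_modality) for date, by_modality in index.items()}
```

An index like this lets an analysis step open only the files for the date named in a query instead of scanning the full telemetry dump.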
Reported Baselines
OpenRCA introduces RCA-agent, which uses Python retrieval and analysis to avoid forcing all telemetry into the LLM context. The repository also includes standard, balanced, and oracle-style evaluation scripts.
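The idea of reducing telemetry in Python before it reaches the model can be illustrated with a small sketch. This is not RCA-agent's implementation: the row format `(timestamp, kpi_name, value)` and the function `summarize_window` are hypothetical, chosen only to show how a time-window filter turns a large series into a compact summary that fits in a prompt.

```python
from datetime import datetime
from statistics import mean


def summarize_window(rows, start, end):
    """Summarize KPI values that fall in the half-open window [start, end).

    rows is an iterable of (datetime, kpi_name, value) tuples, which is a
    hypothetical format for this sketch. Returns a dict keyed by KPI name
    with count, min, max, and mean for the selected window only.
    """
    buckets = {}
    for ts, name, value in rows:
        if start <= ts < end:
            buckets.setdefault(name, []).append(value)
    return {
        name: {
            "n": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for name, values in buckets.items()
    }
```

Only the returned summary, a few numbers per KPI, would be handed to the LLM, while the raw rows stay on disk.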
Why It Matters
Within this group, OpenRCA is the best fit for evaluating how LLM agents investigate large volumes of telemetry. It is not suited to training a purely numeric graph time-series model.
Gotchas
- The scoring is strict: the answer must match all required root-cause elements.
- The README links to the telemetry on Google Drive and does not state a separate license for the telemetry dataset.
- OpenRCA is diagnostic; it does not provide operator remediation actions.
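The all-or-nothing scoring rule in the first bullet above can be sketched as follows. This is an illustration under stated assumptions, not the repository's evaluation script: the function name `strict_match`, the dict-based answer format, the one-minute `time_tolerance` default, and the case-insensitive string comparison are all hypothetical choices.

```python
from datetime import datetime, timedelta


def strict_match(pred, gold, time_tolerance=timedelta(minutes=1)):
    """Return True only if every element required by gold is matched in pred.

    gold lists the required root-cause elements for one query, for example
    {"datetime": ..., "component": ..., "reason": ...}. A single missing or
    wrong element fails the whole answer. The tolerance and the string
    normalization are assumptions of this sketch.
    """
    for field, expected in gold.items():
        actual = pred.get(field)
        if actual is None:
            return False
        if field == "datetime":
            # Accept timestamps within the assumed tolerance of the label.
            if abs(actual - expected) > time_tolerance:
                return False
        elif str(actual).strip().lower() != str(expected).strip().lower():
            return False
    return True
```

Under a rule like this, a prediction with the correct component and reason but the wrong time scores the same as a completely wrong answer, which is why partial progress by an agent does not show up in the headline accuracy.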
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Benchmarks: what level of modeling is tested? | partially closes | Benchmarks LLM/agent RCA over logs, metrics, traces, dependency graphs, and natural-language queries for enterprise systems, close to the observability slice of the digital-world robot north star. | It is diagnostic and answer-scored, not a training corpus for action-conditioned system control. |
| Context interface | partially closes | Combines natural-language incident queries with KPI time series, trace graphs, semi-structured logs, and record metadata. | Context is consumed through an agent workflow, not a standardized TSFM input schema. |
| Control and counterfactuals | warning | Root-cause outputs identify datetime, component, and reason. | No logged remediation actions, intervention choices, or counterfactual rollout labels. |