LEMMA-RCA

Source

Core Claim

LEMMA-RCA is a large, multi-modal, multi-domain dataset collection for root cause analysis (RCA). It spans two domains: IT operations (microservices) and OT (water treatment and distribution systems).

Dataset Notes

  • The four public dataset families are Product Review, Cloud Computing, SWaT, and WADI.
  • Product Review and Cloud Computing are the microservice-relevant subsets.
  • The website reports Product Review at 765G, with 4 faults and an average of 216 entities per fault.
  • The website reports Cloud Computing at 540G, with 6 faults and an average of 168 entities per fault.
  • The paper reports more than 100,000 timestamps, millions of log-event records, fault timestamps, and root-cause entity labels.
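
The notes above list the kinds of artifacts each fault carries (metric timestamps, log events, a fault timestamp, root-cause entity labels). A minimal sketch of how one such case might be held in memory follows. The class `FaultCase`, its field names, and the entity names `pod-a` and `pod-b` are hypothetical and chosen for illustration; they are not the dataset's actual file layout or schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FaultCase:
    """Hypothetical container for one labeled fault; not the dataset's real schema."""
    family: str                      # dataset family, e.g. "Product Review"
    fault_timestamp: int             # index of the fault in the metric series
    root_cause_entities: List[str]   # labeled root-cause entities
    # entity name -> metric series (assumed aligned on a shared time index)
    metrics: Dict[str, List[float]] = field(default_factory=dict)
    # (time index, entity name, log message)
    log_events: List[Tuple[int, str, str]] = field(default_factory=list)

    def window(self, before: int, after: int) -> Dict[str, List[float]]:
        """Slice every entity's metric series around the fault timestamp."""
        start = max(0, self.fault_timestamp - before)
        stop = self.fault_timestamp + after
        return {name: series[start:stop] for name, series in self.metrics.items()}


# Toy values only; they do not come from the dataset.
case = FaultCase(
    family="Product Review",
    fault_timestamp=5,
    root_cause_entities=["pod-a"],
    metrics={
        "pod-a": [0.1, 0.1, 0.1, 0.2, 0.2, 0.9, 0.9, 0.8],
        "pod-b": [0.3] * 8,
    },
)
sliced = case.window(before=2, after=2)
```

Keeping the fault timestamp and labels next to the per-entity series makes it straightforward to cut evaluation windows without re-reading the raw files, though the real loaders may organize things differently.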

Reported Baselines

The paper reports eight baselines: PC, Dynotears, C-LSTM, GOLEM, REASON, Nezha, MULAN, and CORAL. Repository text mentions six baseline methods in places, so prefer the paper's list when citing the count.
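
Because the dataset ships root-cause entity labels, a baseline's output can be scored as a ranking problem. The sketch below assumes each method ultimately yields a ranked list of candidate entities and uses top-k hit and reciprocal rank, which are common choices for ranked outputs; this note does not state which metrics the paper actually uses, and the function and entity names are illustrative.

```python
from typing import Sequence, Set


def hit_at_k(ranked: Sequence[str], truth: Set[str], k: int) -> bool:
    """True if any labeled root cause appears among the top-k ranked entities."""
    return any(entity in truth for entity in ranked[:k])


def reciprocal_rank(ranked: Sequence[str], truth: Set[str]) -> float:
    """1 / rank of the first labeled root cause, or 0.0 if none is ranked."""
    for position, entity in enumerate(ranked, start=1):
        if entity in truth:
            return 1.0 / position
    return 0.0


# Toy ranking; entity names are hypothetical.
ranked = ["pod-b", "pod-a", "node-1"]
truth = {"pod-a"}

top1 = hit_at_k(ranked, truth, k=1)   # labeled entity is not ranked first
top3 = hit_at_k(ranked, truth, k=3)   # labeled entity is within the top three
rr = reciprocal_rank(ranked, truth)   # labeled entity sits at rank 2
```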

Why It Matters

LEMMA-RCA is the largest multi-domain RCA collection in this comparison. It is especially relevant when testing whether a method transfers across IT and OT operations and across single-modal versus multi-modal RCA settings.

Gotchas

  • The benchmark is entity-centric and causal-graph-oriented, but it is not packaged as one ChronoGraph-style topology plus temporal edge-feature tensor.
  • License notes conflict: website/README License text says CC BY-ND 4.0, while Hugging Face metadata and one README paragraph say CC BY-NC 4.0.
  • Fault scenarios are diagnostic events, not logged operator actions.
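
The license conflict listed above can be made explicit by grouping the recorded claims by license string. This is a small sketch over the values stated in this note only; the source labels are informal and the code does not query the website or Hugging Face.

```python
from collections import defaultdict
from typing import Dict, List

# License claims as recorded in this note; keys are informal source labels.
LICENSE_CLAIMS: Dict[str, str] = {
    "website/README License text": "CC BY-ND 4.0",
    "Hugging Face metadata": "CC BY-NC 4.0",
    "README paragraph": "CC BY-NC 4.0",
}


def group_claims(claims: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each license string to the sources that state it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for source, license_name in claims.items():
        grouped[license_name].append(source)
    return dict(grouped)


grouped = group_claims(LICENSE_CLAIMS)
conflict = len(grouped) > 1  # more than one distinct license means the sources disagree

for license_name, sources in sorted(grouped.items()):
    print(f"{license_name}: {', '.join(sources)}")
```

The two licenses differ in what they restrict (derivatives versus commercial use), so which one applies matters for any repackaging of the data.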

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
| --- | --- | --- | --- |
| Observability RCA benchmarks | partially closes | The raw dataset snapshot covers IT microservices and OT water systems with metrics, logs, traces, fault timestamps, and root-cause entity labels. | RCA labels do not by themselves define forecasting, remediation, or control rollouts. |
| Native multivariate encoding and high-channel scaling | adjacent | Entity-centric telemetry and graph/causal baselines make multi-entity system structure part of the task. | Not packaged as one topology-plus-temporal-edge-feature schema for foundation TSFM training. |
| Context interface | adjacent | System entities, graph structure, logs, traces, metrics, and fault metadata provide operational context around the time series. | Needs a standardized context schema rather than benchmark-specific artifacts. |
| Causal and control modeling | warning | Fault scenarios are diagnostic events, not logged operator actions or remediation choices. | Needs interventions, candidate fixes, and observed outcomes under those actions. |