Additional Tennessee Eastman Process Simulation Data

Source

Core Claim

The Rieth et al. 2017 Tennessee Eastman Process data is a synthetic multivariate industrial process-control benchmark for anomaly detection and fault diagnosis. It extends the classic Downs and Vogel Tennessee Eastman Process simulator with fault-free and faulty training/testing runs.

Dataset Notes

  • The provided walkthrough describes four RData objects: fault_free_training, fault_free_testing, faulty_training, and faulty_testing.
  • Each dataframe has faultNumber, simulationRun, sample, and 52 process-variable columns.
  • Process variables are sampled every 3 minutes.
  • Training runs have 500 samples over 25 hours; testing runs have 960 samples over 48 hours.
  • Fault labels include normal operation (faultNumber 0) and 20 fault classes.
  • Faults are introduced after 1 hour in faulty training runs and after 8 hours in faulty testing runs.
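The run lengths and fault-onset positions above follow directly from the 3-minute sampling interval. The sketch below reproduces that arithmetic. The helper names are hypothetical, and it assumes the `sample` column is 1-based and that the fault is active from the first sample after the stated onset time. Both assumptions should be checked against the actual data.

```python
# Timing arithmetic for the run layout described in the notes above.
SAMPLE_INTERVAL_MIN = 3  # from the notes: one sample every 3 minutes


def samples_in(hours, interval_min=SAMPLE_INTERVAL_MIN):
    """Number of sampling intervals that fit in the given number of hours."""
    return int(hours * 60 // interval_min)


def is_post_onset(sample, onset_hours):
    """True if a sample index falls after the fault onset.

    Assumption: `sample` starts at 1 and the fault is active from the first
    sample recorded after `onset_hours` have elapsed.
    """
    return sample > samples_in(onset_hours)


train_len = samples_in(25)   # samples per training run
test_len = samples_in(48)    # samples per testing run
train_onset = samples_in(1)  # last pre-fault sample in faulty training runs
test_onset = samples_in(8)   # last pre-fault sample in faulty testing runs
```

Under these assumptions, samples 1 to 20 of a faulty training run and samples 1 to 160 of a faulty testing run are still normal operation, which matters when deriving per-sample labels from the per-run faultNumber.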

Action-Time-Series Notes

The 52 process variables comprise 41 measured variables and 11 manipulated variables. The manipulated variables are control-input-like channels, which makes them usable as the action signal in action-conditioned next-state modeling.

The distinction matters:

measured variables + manipulated variables -> process state evolution
fault labels / fault injections -> benchmark condition or exogenous disturbance

The dataset should not be treated as a clean offline RL corpus because it does not expose policy rewards, candidate remediation actions, or operator decisions. It is better viewed as industrial process dynamics plus fault-regime labels.
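One way to make this distinction concrete is a state-space style formulation. The notation below is an assumption introduced for illustration and is not defined by the dataset: y_t denotes the measured variables at sample t, u_t the manipulated variables, d_t the active disturbance, k the run's faultNumber, and t_onset the last pre-fault sample.

```latex
y_{t+1} = f\bigl(y_{1:t},\, u_{1:t},\, d_t\bigr),
\qquad
d_t =
\begin{cases}
0, & t \le t_{\mathrm{onset}} \\
k, & t > t_{\mathrm{onset}}
\end{cases}
```

In this reading, u_t is the action-like input a next-state model can condition on, while d_t is an exogenous condition set by the benchmark. Nothing in the formulation supplies a reward or a choice among remediation actions, which is why the data does not amount to an offline RL corpus.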

Gotchas

  • The data is synthetic simulator output, not live plant data.
  • The canonical Dataverse page could not be loaded from this environment because it required JavaScript verification, so license and access details should be rechecked before reuse.
  • Fault injections are not logged operator actions.
  • For action-conditioned world-model experiments, preprocessing must separate measured observations, manipulated control inputs, labels, and run boundaries.
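The preprocessing separation in the last point can be sketched as follows. This is a minimal illustration, not a reference pipeline. It assumes rows are available as plain dictionaries and that measured and manipulated columns are named with `xmeas_` and `xmv_` prefixes. That naming is an assumption to verify against the actual dataframes, and the function names are hypothetical.

```python
from itertools import groupby

# Assumed column naming: "xmeas_*" for measured and "xmv_*" for manipulated
# variables. Verify against the actual dataframe before relying on it.
MEASURED_PREFIX = "xmeas_"
MANIPULATED_PREFIX = "xmv_"
RUN_KEYS = ("faultNumber", "simulationRun")  # columns named in the notes


def run_id(row):
    """Identify the trajectory a row belongs to."""
    return tuple(row[k] for k in RUN_KEYS)


def split_row(row):
    """Split one record into (observation, action, label) parts."""
    obs = [v for k, v in row.items() if k.startswith(MEASURED_PREFIX)]
    act = [v for k, v in row.items() if k.startswith(MANIPULATED_PREFIX)]
    return obs, act, row["faultNumber"]


def transitions(rows):
    """Yield (obs_t, act_t, obs_next, label) without crossing run boundaries."""
    ordered = sorted(rows, key=lambda r: (run_id(r), r["sample"]))
    for _, run in groupby(ordered, key=run_id):
        run = list(run)
        # Pair each sample with its successor inside the same run only.
        for cur, nxt in zip(run, run[1:]):
            obs, act, label = split_row(cur)
            next_obs, _, _ = split_row(nxt)
            yield obs, act, next_obs, label


# Toy rows with one measured and one manipulated channel (made-up values).
toy = [
    {"faultNumber": 0, "simulationRun": 1, "sample": 1, "xmeas_1": 0.1, "xmv_1": 5.0},
    {"faultNumber": 0, "simulationRun": 1, "sample": 2, "xmeas_1": 0.2, "xmv_1": 5.5},
    {"faultNumber": 0, "simulationRun": 2, "sample": 1, "xmeas_1": 0.9, "xmv_1": 4.0},
    {"faultNumber": 0, "simulationRun": 2, "sample": 2, "xmeas_1": 1.0, "xmv_1": 4.2},
]
pairs = list(transitions(toy))
```

Grouping by (faultNumber, simulationRun) before pairing consecutive samples keeps the last sample of one run from being treated as the predecessor of the first sample of the next. The label carried here is the run-level faultNumber, so it marks the benchmark condition of the run rather than whether the fault is already active at that sample.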

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Native multivariate encoding and high-channel scaling | partially closes | Provides 52 coupled industrial process variables with normal and faulty regimes. | Channel count is modest by HDTSF standards and no graph/topology context is exposed in the dataset artifact. |
| Causal structure, counterfactuals, and control | adjacent | Includes manipulated variables that can be treated as control-input channels under careful preprocessing. | No rewards, candidate interventions, remediation logs, or counterfactual rollout protocol. |
| Benchmarks: what level of modeling is tested? | partially closes | Long-running standard benchmark for industrial anomaly detection and fault diagnosis. | The default task is fault detection/classification, not action-conditioned planning. |
| Time representation and irregular event streams | adjacent | Regular 3-minute samples over run-indexed trajectories with known fault onset timing. | Does not stress irregular sampling or heterogeneous event streams. |