Additional Tennessee Eastman Process Simulation Data

Source

Dataset metadata snapshot: tennessee-eastman-process-2017
Metadata JSON: metadata.json
Canonical DOI / Harvard Dataverse: https://doi.org/10.7910/DVN/6C3JR1
Provided Medium walkthrough: https://medium.com/@mrunal68/tennessee-eastman-process-simulation-data-for-anomaly-detection-evaluation-d719dc133a7f
Original Tennessee Eastman Process benchmark paper DOI: https://doi.org/10.1016/0098-1354(93)80018-I
FDDBenchmark reference wrapper: https://github.com/AIRI-Institute/fddbenchmark

Core Claim

The Rieth et al. 2017 Tennessee Eastman Process data is a synthetic multivariate industrial process-control benchmark for anomaly detection and fault diagnosis. It supplements the classic Downs and Vogel Tennessee Eastman Process benchmark with additional simulated runs, both fault-free and faulty, split into training and testing sets.

Dataset Notes

The provided walkthrough describes four RData objects: fault_free_training, fault_free_testing, faulty_training, and faulty_testing.
Each dataframe has faultNumber, simulationRun, sample, and 52 process-variable columns.
Process variables are sampled every 3 minutes.
Training runs have 500 samples over 25 hours; testing runs have 960 samples over 48 hours.
Fault labels cover normal operation (faultNumber 0) and 20 fault classes (faultNumber 1 through 20).
Faults are introduced after 1 hour in faulty training runs and after 8 hours in faulty testing runs.
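
The run lengths and fault onsets above follow from the 3-minute sampling interval. The sketch below reproduces that arithmetic and derives a per-sample "fault active" flag. It assumes 1-based sample indices with sample k taken k x 3 minutes into the run; the notes do not state the indexing convention, so verify it against the data. The helper names are hypothetical.

```python
SAMPLE_MINUTES = 3  # sampling interval stated in the notes

# Samples per run and fault onset (in hours), as stated in the notes.
SPLITS = {
    "training": {"samples": 500, "fault_onset_hours": 1},
    "testing": {"samples": 960, "fault_onset_hours": 8},
}


def run_hours(split: str) -> float:
    """Duration of one run in hours."""
    return SPLITS[split]["samples"] * SAMPLE_MINUTES / 60


def onset_sample(split: str) -> int:
    """Index of the last sample taken before the fault is introduced.

    Assumption: 1-based sample indices, sample k taken at k * 3 minutes.
    """
    return SPLITS[split]["fault_onset_hours"] * 60 // SAMPLE_MINUTES


def fault_active(split: str, fault_number: int, sample: int) -> bool:
    """True if the row belongs to a faulty run and lies after the assumed onset."""
    return fault_number != 0 and sample > onset_sample(split)
```

If faultNumber is constant within a run (worth confirming on the actual dataframes), the early rows of a faulty run carry a nonzero faultNumber but precede the injection. A flag like fault_active keeps those rows from being counted as faulty when building per-sample detection labels.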

Action-Time-Series Notes

The 52 process variables comprise 41 measured variables and 11 manipulated variables. The manipulated variables act as control inputs, which makes them useful for action-conditioned next-state modeling.

The distinction matters:

measured variables + manipulated variables -> process state evolution
fault labels / fault injections -> benchmark condition or exogenous disturbance

The dataset should not be treated as a clean offline RL corpus because it exposes no rewards, candidate remediation actions, or operator decisions. It is better viewed as industrial process dynamics plus fault-regime labels.
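
The separation between process state, control inputs, and benchmark condition can be sketched as a per-row split. This example assumes the 52 process columns are named xmeas_1 through xmeas_41 and xmv_1 through xmv_11. Those names are an assumption not stated in these notes, and split_row is a hypothetical helper, so check both against the actual dataframes.

```python
# Assumed column names; verify against the actual dataframes before use.
MEASURED = [f"xmeas_{i}" for i in range(1, 42)]    # 41 measured variables
MANIPULATED = [f"xmv_{i}" for i in range(1, 12)]   # 11 manipulated variables


def split_row(row: dict) -> dict:
    """Separate one dataframe row into observation, action, and condition parts."""
    return {
        # A run is identified by its fault class and simulation run number.
        "run_key": (row["faultNumber"], row["simulationRun"]),
        "sample": row["sample"],
        "observation": [row[c] for c in MEASURED],
        "action": [row[c] for c in MANIPULATED],
        # The fault label is a benchmark condition, not an action.
        "condition": row["faultNumber"],
    }
```

Keeping faultNumber in a separate condition field, rather than appending it to the action vector, reflects the point above: fault injections are exogenous disturbances, not control inputs.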

Gotchas

The data is synthetic simulator output, not live plant data.
The canonical Dataverse page required JavaScript verification when accessed from this environment, so license and access details could not be confirmed and should be rechecked before reuse.
Fault injections are simulator-side disturbances, not logged operator actions.
For action-conditioned world-model experiments, preprocessing must separate measured observations, manipulated control inputs, labels, and run boundaries.
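
One way to respect run boundaries is to group rows by run before pairing consecutive samples. The sketch below is an assumption-laden illustration, not a reference pipeline: each row is a hypothetical dict with run_key, sample, observation, and action fields, and build_transitions is a hypothetical helper.

```python
from collections import defaultdict


def build_transitions(rows):
    """Return (obs_t, action_t, obs_t_plus_1) triples that never cross a run boundary.

    Each row is a hypothetical dict with keys: run_key, sample, observation, action.
    """
    runs = defaultdict(list)
    for row in rows:
        runs[row["run_key"]].append(row)

    transitions = []
    for run_rows in runs.values():
        run_rows.sort(key=lambda r: r["sample"])
        for cur, nxt in zip(run_rows, run_rows[1:]):
            # Skip gaps so every pair is exactly one sampling step apart.
            if nxt["sample"] - cur["sample"] != 1:
                continue
            transitions.append((cur["observation"], cur["action"], nxt["observation"]))
    return transitions
```

Pairing rows on a flat, concatenated table would instead link the last sample of one run to the first sample of the next, producing transitions that never occurred in the simulator.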

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Native multivariate encoding and high-channel scaling	partially closes	Provides 52 coupled industrial process variables with normal and faulty regimes.	Channel count is modest by HDTSF standards and no graph/topology context is exposed in the dataset artifact.
Causal structure, counterfactuals, and control	adjacent	Includes manipulated variables that can be treated as control-input channels under careful preprocessing.	No rewards, candidate interventions, remediation logs, or counterfactual rollout protocol.
Benchmarks: what level of modeling is tested?	partially closes	Long-running standard benchmark for industrial anomaly detection and fault diagnosis.	The default task is fault detection/classification, not action-conditioned planning.
Time representation and irregular event streams	adjacent	Regular 3-minute samples over run-indexed trajectories with known fault onset timing.	Does not stress irregular sampling or heterogeneous event streams.
