Additional Tennessee Eastman Process Simulation Data
Source
- Dataset metadata snapshot: tennessee-eastman-process-2017
- Metadata JSON: metadata.json
- Canonical DOI / Harvard Dataverse: https://doi.org/10.7910/DVN/6C3JR1
- Provided Medium walkthrough: https://medium.com/@mrunal68/tennessee-eastman-process-simulation-data-for-anomaly-detection-evaluation-d719dc133a7f
- Original Tennessee Eastman Process benchmark paper DOI: https://doi.org/10.1016/0098-1354(93)80018-I
- FDDBenchmark reference wrapper: https://github.com/AIRI-Institute/fddbenchmark
Core Claim
The Rieth et al. 2017 Tennessee Eastman Process data is a synthetic multivariate industrial process-control benchmark for anomaly detection and fault diagnosis. It extends the classic Downs and Vogel Tennessee Eastman Process simulator with fault-free and faulty training/testing runs.
Dataset Notes
- The provided walkthrough describes four RData objects: `fault_free_training`, `fault_free_testing`, `faulty_training`, and `faulty_testing`.
- Each dataframe has `faultNumber`, `simulationRun`, `sample`, and 52 process-variable columns.
- Process variables are sampled every 3 minutes.
- Training runs have 500 samples over 25 hours; testing runs have 960 samples over 48 hours.
- Fault labels include normal operation (`faultNumber` 0) and 20 fault classes.
- Faults are introduced after 1 hour in faulty training runs and after 8 hours in faulty testing runs.
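The run lengths and fault-onset times above follow from the 3-minute sampling interval. A small arithmetic sketch makes the relationship explicit; the function and constant names are illustrative and not part of the dataset:

```python
SAMPLE_INTERVAL_MIN = 3  # sampling interval stated in the notes above


def run_hours(n_samples: int) -> float:
    """Hours covered by n_samples at the fixed sampling interval."""
    return n_samples * SAMPLE_INTERVAL_MIN / 60


def samples_before_onset(onset_hours: float) -> int:
    """Number of samples that elapse before a fault introduced at onset_hours."""
    return int(onset_hours * 60 / SAMPLE_INTERVAL_MIN)


training_hours = run_hours(500)         # 25.0
testing_hours = run_hours(960)          # 48.0
train_onset = samples_before_onset(1)   # 20
test_onset = samples_before_onset(8)    # 160
```

Whether the first faulty row carries `sample` value 20 or 21 depends on how the `sample` column is indexed, so the exact boundary should be checked against the data before labeling pre-fault and post-fault segments.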
Action-Time-Series Notes
The 52 process variables include measured variables and manipulated variables. The manipulated variables are control-input-like channels and can be useful for action-conditioned next-state modeling.
The distinction matters:
measured variables + manipulated variables -> process state evolution
fault labels / fault injections -> benchmark condition or exogenous disturbance

The dataset should not be treated as a clean offline RL corpus because it does not expose policy rewards, candidate remediation actions, or operator decisions. It is better viewed as industrial process dynamics plus fault-regime labels.
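One way to make the measured/manipulated distinction concrete is to partition column names by role. The sketch below assumes the process-variable columns use `xmeas_` and `xmv_` prefixes (41 measured and 11 manipulated, totaling 52); that naming is an assumption to verify against the loaded dataframes, and `split_columns` is a hypothetical helper:

```python
META_COLUMNS = {"faultNumber", "simulationRun", "sample"}


def split_columns(columns):
    """Group column names into metadata, measured, and manipulated roles.

    Assumes 'xmeas_' / 'xmv_' prefixes; anything else lands in 'unknown'
    so that a naming mismatch is visible rather than silently dropped.
    """
    groups = {"meta": [], "measured": [], "manipulated": [], "unknown": []}
    for name in columns:
        if name in META_COLUMNS:
            groups["meta"].append(name)
        elif name.startswith("xmeas_"):
            groups["measured"].append(name)
        elif name.startswith("xmv_"):
            groups["manipulated"].append(name)
        else:
            groups["unknown"].append(name)
    return groups


# Example with the assumed naming scheme.
example_columns = (
    ["faultNumber", "simulationRun", "sample"]
    + [f"xmeas_{i}" for i in range(1, 42)]
    + [f"xmv_{i}" for i in range(1, 12)]
)
roles = split_columns(example_columns)
```

Keeping `faultNumber` in the metadata group rather than the action group reflects the point above: the fault label describes the benchmark condition and is not a control input.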
Gotchas
- The data is synthetic simulator output, not live plant data.
- The canonical Dataverse page required JavaScript verification from this environment, so license/access details should be rechecked before reuse.
- Fault injections are not logged operator actions.
- For action-conditioned world-model experiments, preprocessing must separate measured observations, manipulated control inputs, labels, and run boundaries.
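The run-boundary point in the last item can be sketched as follows. This minimal example builds (state, action, next state) tuples without pairing the last sample of one run with the first sample of the next. It assumes that a run is identified by the pair (`faultNumber`, `simulationRun`) and that rows are plain dictionaries; `build_transitions` is a hypothetical helper, not part of any published loader:

```python
from itertools import groupby


def run_key(row):
    """Identify a run by (faultNumber, simulationRun); an assumption to verify."""
    return (row["faultNumber"], row["simulationRun"])


def build_transitions(rows, state_keys, action_keys):
    """Return (state_t, action_t, state_t_plus_1) tuples within each run only."""
    ordered = sorted(rows, key=lambda r: (*run_key(r), r["sample"]))
    transitions = []
    for _, run in groupby(ordered, key=run_key):
        run = list(run)
        # zip over consecutive samples of the same run, so no pair spans two runs
        for cur, nxt in zip(run, run[1:]):
            state = [cur[k] for k in state_keys]
            action = [cur[k] for k in action_keys]
            next_state = [nxt[k] for k in state_keys]
            transitions.append((state, action, next_state))
    return transitions


# Toy data: two runs of three samples each, one measured and one manipulated column.
toy_rows = [
    {"faultNumber": 0, "simulationRun": r, "sample": s,
     "xmeas_1": float(r * 10 + s), "xmv_1": float(s)}
    for r in (1, 2) for s in (1, 2, 3)
]
toy_transitions = build_transitions(toy_rows, ["xmeas_1"], ["xmv_1"])
```

Each run of n samples contributes n - 1 transitions, so the toy data yields four transitions rather than the five that a naive shift over the concatenated rows would produce.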
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Native multivariate encoding and high-channel scaling | partially closes | Provides 52 coupled industrial process variables with normal and faulty regimes. | Channel count is modest by HDTSF standards and no graph/topology context is exposed in the dataset artifact. |
| Causal structure, counterfactuals, and control | adjacent | Includes manipulated variables that can be treated as control-input channels under careful preprocessing. | No rewards, candidate interventions, remediation logs, or counterfactual rollout protocol. |
| Benchmarks: what level of modeling is tested? | partially closes | Long-running standard benchmark for industrial anomaly detection and fault diagnosis. | The default task is fault detection/classification, not action-conditioned planning. |
| Time representation and irregular event streams | adjacent | Regular 3-minute samples over run-indexed trajectories with known fault onset timing. | Does not stress irregular sampling or heterogeneous event streams. |