# Additional Tennessee Eastman Process Simulation Data

Canonical source: <https://doi.org/10.7910/DVN/6C3JR1>
Provided walkthrough: <https://medium.com/@mrunal68/tennessee-eastman-process-simulation-data-for-anomaly-detection-evaluation-d719dc133a7f>
Original benchmark paper: <https://doi.org/10.1016/0098-1354(93)80018-I>
Wiki source: [Additional Tennessee Eastman Process Simulation Data](../../wiki/sources/tennessee-eastman-process-2017.md)

## Dataset Type

The Rieth et al. Tennessee Eastman Process data is synthetic multivariate industrial process-control time-series data for anomaly detection and fault diagnosis. It extends the classic Tennessee Eastman Process benchmark introduced by Downs and Vogel in 1993.

## Temporal Structure

The provided Medium walkthrough describes four RData objects: `fault_free_training`, `fault_free_testing`, `faulty_training`, and `faulty_testing`. Each dataframe contains 55 columns: `faultNumber`, `simulationRun`, `sample`, and 52 process variables.
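The layout above can be checked before any modeling. This is a minimal sketch that assumes the three identifier columns come first, in the order listed, followed by the 52 process variables. The function name `check_header` is hypothetical, and the process-variable names are deliberately not validated.

```python
# Identifier columns named in the walkthrough; the remaining 52 columns are
# process variables whose exact names are not checked here.
ID_COLUMNS = ("faultNumber", "simulationRun", "sample")
N_PROCESS_VARIABLES = 52


def check_header(columns):
    """Validate the assumed 55-column layout and return the process-variable names.

    Assumption: identifiers come first, then the 52 process variables.
    """
    columns = list(columns)
    expected = len(ID_COLUMNS) + N_PROCESS_VARIABLES
    if len(columns) != expected:
        raise ValueError(f"expected {expected} columns, got {len(columns)}")
    if tuple(columns[: len(ID_COLUMNS)]) != ID_COLUMNS:
        raise ValueError(f"unexpected identifier columns: {columns[:3]}")
    return columns[len(ID_COLUMNS):]
```

In practice the header would come from `list(df.columns)` on a dataframe loaded with an RData reader such as `pyreadr`.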

The process variables are sampled every 3 minutes. Training runs have 500 samples, corresponding to 25 hours. Testing runs have 960 samples, corresponding to 48 hours. Faults are introduced after 1 hour in faulty training runs and after 8 hours in faulty testing runs.
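The timing figures translate into sample indices: at one sample every 3 minutes, the 1-hour and 8-hour onsets fall at samples 20 and 160. One consequence is that the early part of every faulty run is normal operation, so labeling each row by `faultNumber` alone would mislabel those pre-onset samples. The sketch below assumes `sample` is 1-indexed with sample k taken at 3k minutes, and treats only samples strictly after the onset as faulty; that boundary convention is an assumption, and the helper names are hypothetical.

```python
SAMPLE_MINUTES = 3  # sampling interval stated above

# Fault onset in hours and run length in samples, per split.
FAULT_ONSET_HOURS = {"train": 1, "test": 8}
RUN_LENGTH = {"train": 500, "test": 960}


def run_hours(split):
    """Simulated time covered by one run, in hours."""
    return RUN_LENGTH[split] * SAMPLE_MINUTES / 60


def onset_sample(split):
    """Index of the sample taken at the fault onset time.

    Assumes `sample` is 1-indexed and sample k is taken at 3 * k minutes.
    """
    return FAULT_ONSET_HOURS[split] * 60 // SAMPLE_MINUTES


def is_faulty(split, fault_number, sample):
    """Per-sample label: True only for faulty runs after the onset sample."""
    return fault_number != 0 and sample > onset_sample(split)
```

For example, `onset_sample("train")` is 20 and `onset_sample("test")` is 160, while `run_hours` reproduces the 25-hour and 48-hour run durations.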

## Process Variables

The 52 process variables comprise 41 measured variables and 11 manipulated variables from the Tennessee Eastman Process simulator. Downstream tooling often labels them `xmeas` (measured) and `xmv` (manipulated).
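Separating the two variable groups is usually the first step toward state/input modeling. This sketch splits process-variable columns by name prefix, assuming hypothetical column names `xmeas_1` through `xmeas_41` and `xmv_1` through `xmv_11`; the patterns should be adjusted to the actual header. Columns matching neither pattern, such as the identifiers, are ignored.

```python
import re

# Hypothetical naming convention; adjust to the real column names.
MEASURED = re.compile(r"^xmeas_(\d+)$")
MANIPULATED = re.compile(r"^xmv_(\d+)$")


def _suffix(name):
    """Numeric suffix of a name such as 'xmeas_12' -> 12."""
    return int(name.rsplit("_", 1)[1])


def split_variables(columns):
    """Return (measured, manipulated) column lists, each sorted numerically."""
    measured, manipulated = [], []
    for name in columns:
        if MEASURED.match(name):
            measured.append(name)
        elif MANIPULATED.match(name):
            manipulated.append(name)
    return sorted(measured, key=_suffix), sorted(manipulated, key=_suffix)
```

Sorting by the numeric suffix keeps `xmeas_2` ahead of `xmeas_10`, which plain string sorting would not.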

## Actions Or Interventions

The manipulated variables are control-input-like channels and can be useful for action-conditioned next-state modeling. They should not be conflated with fault labels. Fault numbers and fault injections are benchmark conditions or exogenous disturbance events, not logged operator remediation actions.
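For action-conditioned next-state modeling, one possible construction pairs consecutive samples within a run. This is a minimal sketch, not a prescribed pipeline: it assumes rows are dicts already sorted by `faultNumber`, `simulationRun`, and `sample`, and it identifies a run by (`faultNumber`, `simulationRun`) so that pairs never span two runs. No reward is produced because the data contains none, and the fault number is used only to separate runs, never as an action. Since the manipulated variables come from closed-loop controllers, treating them as actions is itself a modeling assumption. The function name `transitions` is hypothetical.

```python
from itertools import groupby


def transitions(rows, state_cols, action_cols):
    """Yield (state, action, next_state) tuples from consecutive samples.

    Assumes `rows` is sorted by (faultNumber, simulationRun, sample).
    """
    run_key = lambda r: (r["faultNumber"], r["simulationRun"])
    for _, group in groupby(rows, key=run_key):
        group = list(group)
        for cur, nxt in zip(group, group[1:]):
            if nxt["sample"] != cur["sample"] + 1:
                continue  # skip gaps, e.g. after subsampling
            yield (
                tuple(cur[c] for c in state_cols),
                tuple(cur[c] for c in action_cols),
                tuple(nxt[c] for c in state_cols),
            )
```

The gap check matters if rows have been subsampled, since non-adjacent samples would otherwise be treated as one 3-minute step.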

## Reported Composition

The dataset includes normal operation (`faultNumber` 0) and 20 fault classes. The Medium walkthrough reports that the fault classes are balanced in both the training and testing splits, with totals of 5,250,000 training rows and 10,080,000 testing rows as loaded, before the walkthrough subsamples them.
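The reported totals can be reproduced from the run lengths. This sketch assumes 500 simulation runs for the fault-free condition and for each of the 20 faults; that figure is not stated above, but it yields the reported row counts exactly. The function name `row_counts` is hypothetical.

```python
# Assumption: 500 simulation runs per condition (normal plus each fault).
RUNS_PER_CONDITION = 500
N_FAULTS = 20
RUN_LENGTH = {"train": 500, "test": 960}  # samples per run


def row_counts(split):
    """Return (fault_free_rows, faulty_rows, total_rows) for one split."""
    fault_free = RUNS_PER_CONDITION * RUN_LENGTH[split]
    faulty = N_FAULTS * RUNS_PER_CONDITION * RUN_LENGTH[split]
    return fault_free, faulty, fault_free + faulty


print(row_counts("train"))  # (250000, 5000000, 5250000)
print(row_counts("test"))   # (480000, 9600000, 10080000)
```

Equivalently, each split holds 21 conditions × 500 runs × run length, so every condition contributes the same number of rows.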

FDDBenchmark describes its `rieth_tep` variant as having 52 sensors, 21 states (normal operation plus 20 faults), run lengths of 500 and 960 samples, and a size of about 1.84 GB.

## Suitability Note

This is a strong industrial process-monitoring source for multivariate dynamics, abnormal regimes, and control-input-like channels. For world-model work, it is best treated as a process-dynamics and anomaly benchmark with manipulated-variable conditioning, not as a clean `s,a,r,s'` offline RL dataset.

## Access And License Notes

The canonical DOI points to Harvard Dataverse. The direct Dataverse landing page required JavaScript verification in this environment, so license/access details should be rechecked before reuse. This knowledge base records metadata only and does not mirror RData payloads.
