Action-Conditioned Time-Series Datasets

Scope

Terminology on this page follows the Terminology page.

This page compares non-vision-heavy datasets that can support world models with actions or interventions. Here, “time series” is broad: it includes regular sensor streams, irregular medical/event logs, control trajectories, recommender decision logs, tutoring interaction sequences, graph telemetry, and any ordered sequence where a model can condition on an action or intervention at time t to predict later observations.

The strongest candidates expose a transition-like channel: observation_t, action_t, optional reward_t, and observation_{t+1}. Weaker candidates expose logged decisions or treatments but have thin next-state observations or strong observational confounding.

For datasets used to train policies in imagined rollouts, the s,a,r,s' contract should also record whether transition data and reward annotations come from different streams, with separate costs, noise, and bias risks. On Training in Imagination is the local source for this data-economics split.
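One way to make the s,a,r,s' contract and its reward provenance concrete is a small record type. The sketch below is illustrative only: the class names, field names, and the `reward_is_separately_sourced` helper are hypothetical and are not taken from any dataset or library on this page.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RewardSource:
    """Provenance of a reward annotation. All field names are hypothetical."""
    stream: str                  # e.g. "simulator" or "human_label"
    cost_per_label: float = 0.0  # relative labeling cost, arbitrary units
    bias_note: str = ""          # free-text noise or bias caveat


@dataclass(frozen=True)
class Transition:
    """One s,a,r,s' record; reward stays optional because some sources omit it."""
    observation: Sequence[float]
    action: Sequence[float]
    next_observation: Sequence[float]
    transition_stream: str       # where the (s, a, s') part was logged
    reward: Optional[float] = None
    reward_source: Optional[RewardSource] = None


def reward_is_separately_sourced(t: Transition) -> bool:
    """True when a reward exists and was annotated by a different stream."""
    if t.reward is None or t.reward_source is None:
        return False
    return t.reward_source.stream != t.transition_stream
```

Keeping the provenance next to each tuple lets a pipeline audit separately sourced rewards (cost, noise, bias) without touching the transition data itself.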

Latent Action Models are complementary to this dataset view. They infer action-like codes from observation transitions when the action channel is missing, but they do not turn a dataset into a typed action-conditioned dataset until those codes are aligned with real actions, control inputs, interventions, and outcomes.

This page intentionally excludes vision-heavy trajectory datasets. Those datasets also contain time series, and they can be very important for action-conditioned world models, but they require image/video encoders and belong in a separate embodied/visual world-model comparison. Excluded examples include V-D4RL, MineRL, Atari DQN Replay, Open X-Embodiment, DROID, BridgeData V2, RoboNet, CALVIN, RoboTurk, and RoAM. For that embodied/visual branch, use Robotics Time-Series Modeling and World Model for Robot Learning Survey rather than expanding this non-vision dataset table.

Most remaining datasets are still not pure univariate time series. The Modalities Needed column lists the non-temporal modalities or structured data types that a training pipeline must understand in addition to temporal order.

Selection Tiers

Tier 1: direct world-model datasets provide explicit sequential observations and actions and are immediately usable for action-conditioned dynamics learning.
Tier 2: longitudinal intervention datasets provide real interventions/treatments over time but require careful causal handling because actions are often confounded by state.
Tier 3: logged action-response datasets provide actions and rewards/outcomes, but temporal state dynamics are weaker than in trajectory datasets.
Near-miss: passive time-series datasets are useful for passive world-model pretraining or forecasting, but do not expose controllable actions.
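The tier definitions above can be read as a rough decision rule. The sketch below is a simplification under stated assumptions: the three boolean flags and the `selection_tier` function are hypothetical, and the tables on this page also assign mixed tiers (for example Tier 1/3) that a single rule like this cannot express.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetProfile:
    """Coarse, hypothetical flags describing a candidate dataset."""
    has_action_channel: bool     # explicit actions, treatments, or interventions
    has_next_observation: bool   # sequential next-state observations are available
    actions_confounded: bool     # actions were chosen in response to observed state


def selection_tier(profile: DatasetProfile) -> str:
    """Map a profile to one of the four buckets defined in this section."""
    if not profile.has_action_channel:
        return "Near-miss"   # passive time series only
    if not profile.has_next_observation:
        return "Tier 3"      # logged action-response, thin state dynamics
    if profile.actions_confounded:
        return "Tier 2"      # longitudinal interventions needing causal care
    return "Tier 1"          # direct world-model dataset
```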

Offline RL And Numeric Control Trajectories

Dataset	Time-Series Structure	Modalities Needed	Action Channel	World-Model Fit	Caveat
Minari D4RL	Episodic offline RL transitions across MuJoCo, AntMaze, Adroit, Kitchen, and related tasks	Numeric state vectors; rewards; terminals; task IDs for mixed datasets; sometimes goal/state annotations	Environment control action at each step	Tier 1; clean `s,a,r,s'` benchmark for latent/state dynamics	Some tasks are benchmark-specific, and this page excludes visual variants
RL Unplugged	Replayed transitions from multiple RL domains	Numeric states for control tasks; rewards/discounts; action labels; domain metadata	Discrete or continuous environment actions	Tier 1 for non-visual subsets; diverse offline RL source for action-conditioned dynamics	Some RL Unplugged domains are visual and should be filtered out for this non-vision page
NeoRL-2	Offline RL transitions across seven near-real-world simulated tasks	Numeric state vectors; continuous actions; rewards; terminals; task IDs; delay/exogenous/safety context by task	Continuous action at each step	Tier 1; clean `observation_t, action_t, reward_t, observation_{t+1}` interface with practical constraints	Simulated tasks; artifact users should pin exact HF configs and resolve license metadata mismatch
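Offline RL datasets in this table are usually stored as per-episode arrays rather than as explicit tuples. The sketch below turns one episode into `s,a,r,s'` records. It assumes the episode holds one more observation than it has actions, which should be verified against the specific loader and dataset version in use. The function name and argument layout are hypothetical.

```python
from typing import List, Sequence, Tuple

# (observation_t, action_t, reward_t, observation_{t+1})
Step = Tuple[Sequence[float], Sequence[float], float, Sequence[float]]


def episode_to_transitions(
    observations: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
    rewards: Sequence[float],
) -> List[Step]:
    """Slice per-episode arrays into transition tuples.

    Assumes len(observations) == len(actions) + 1, i.e. the final
    observation is stored even though no action follows it.
    """
    if len(observations) != len(actions) + 1:
        raise ValueError("expected one more observation than actions")
    if len(rewards) != len(actions):
        raise ValueError("expected one reward per action")
    return [
        (observations[t], actions[t], rewards[t], observations[t + 1])
        for t in range(len(actions))
    ]
```

Slicing per episode, rather than over a concatenated buffer, avoids creating a spurious transition from the last state of one episode to the first state of the next.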

Energy And Industrial Control

Dataset	Time-Series Structure	Modalities Needed	Action / Control-Input Channel	World-Model Fit	Caveat
CityLearn	Simulator-backed building-energy trajectories over virtual districts	Numeric building, storage, device, weather, pricing, carbon-intensity, and schema context	Continuous storage charge/discharge and device power controls	Tier 1 environment; strong non-vision control testbed for energy demand response	Not a single immutable dataset payload; schema, version, reward/cost, and source-data provenance must be pinned
Grid2Op	Simulator-backed power-grid operation episodes with graph-structured observations, forecasts, contingencies, cooldowns, and safety constraints	Numeric grid observations; graph topology; asset limits; scenario context; exogenous load, generation, weather, maintenance, and line-disconnection events	Topology actions, redispatching, curtailment, and storage controls	Tier 1 environment; strong non-vision graph/time-series control testbed for energy-grid operations, now with RL2Grid and MARL2Grid-TR benchmark anchors	Challenge ecosystem rather than one fixed dataset; Grid2Op version, backend, chronics, action masks, reward/cost, train/test scenarios, simulator access, simulator-call budget, proxy-versus-physical-simulator fidelity, and expert-action baselines must be pinned
Tennessee Eastman Process Simulation Data	Regular 3-minute industrial process runs with normal and faulty regimes	Measured process variables; manipulated variables; run IDs; fault labels; sample indices	Manipulated variables are control-input-like channels; fault injections are benchmark disturbances	Useful industrial dynamics and anomaly benchmark with control-input conditioning potential	Not clean offline RL; no rewards, remediation actions, or counterfactual intervention protocol
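For a source like Tennessee Eastman, control-input conditioning depends on keeping measured variables, manipulated variables, and run boundaries apart. The sketch below builds (measured_t, manipulated_t, measured_{t+1}) pairs without crossing run boundaries. The row keys `run_id`, `sample`, `measured`, and `manipulated` are hypothetical and would need mapping from the column names of the actual release. Fault labels are left out here and would travel as metadata.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Pair = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def control_conditioned_pairs(rows: List[Row]) -> List[Pair]:
    """Pair each sample with the next one from the same run.

    A pair is emitted only when both rows share a run ID and their
    sample indices are consecutive, so gaps and run changes are skipped.
    """
    ordered = sorted(rows, key=lambda r: (r["run_id"], r["sample"]))
    pairs: List[Pair] = []
    for prev, nxt in zip(ordered, ordered[1:]):
        same_run = prev["run_id"] == nxt["run_id"]
        consecutive = nxt["sample"] == prev["sample"] + 1
        if same_run and consecutive:
            pairs.append((prev["measured"], prev["manipulated"], nxt["measured"]))
    return pairs
```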

Healthcare And Physiology

Dataset	Time-Series Structure	Modalities Needed	Action / Intervention Channel	World-Model Fit	Caveat
MIMIC-IV	Irregular hospital/ICU EHR time series	Numeric vitals/labs; categorical codes; medication/procedure tables; demographics; clinical notes if used	Medications, fluids, procedures, ventilation-related events, orders	Tier 2; strong for treatment-conditioned patient dynamics	Observational, confounded, credentialed access
eICU-CRD	Multi-center ICU longitudinal records	Numeric vitals/labs; categorical diagnoses/treatments; medication/infusion records; care-plan tables	Medications, infusion drugs, treatments, procedures	Tier 2; strong multi-hospital treatment-response source	Heterogeneous schema and confounding
HiRID	High-resolution ICU records	High-frequency numeric physiology; labs; medication/event tables; patient metadata	ICU treatments, medications, interventions, clinical events	Tier 2; good for high-frequency physiology dynamics	Access and preprocessing complexity
AmsterdamUMCdb	European ICU observation/event series	Numeric vitals/labs; medication/infusion tables; device/ventilation records; demographics	Medications, fluids, feeding, transfusions, procedures	Tier 2; strong ICU dynamics dataset	Observational and access-controlled
OhioT1DM	Continuous glucose and patient event streams	Continuous glucose monitor values; insulin logs; meal/carbohydrate records; exercise/sleep/stress event features	Insulin, meals/carbs, exercise, sleep, stress	Tier 1/2; small but clean physiology-control source	Small participant count and per-person variability
HeartSteps	Participant decision points and activity outcomes over weeks	Mobile-sensing/context features; step-count/activity outcomes; survey/context variables; intervention messages	Micro-randomized activity suggestions	Tier 2; cleaner causal interventions than routine care logs	Small behavioral domain

Recommender, Bandit, And Marketing Logs

Dataset	Time-Series Structure	Modalities Needed	Action / Intervention Channel	World-Model Fit	Caveat
KuaiRand	Sequential user-video interactions with random exposure	User IDs/features; item/video IDs and metadata; categorical feedback events; watch/click/like signals; timestamps	Video/item exposure and feedback signals	Tier 1/3; strong for user-response dynamics without requiring video pixels	State is user-behavioral, not physical-world dynamics
Open Bandit Dataset	Logged fashion recommendation decisions	User/context features; item/category IDs; logged propensities; clicks/conversions/rewards	Recommended item/action, reward, propensity	Tier 3; strong for off-policy action-response modeling	Thin next-state dynamics
Yahoo! Webscope R6 line	News recommendation decision logs	User/context features; article IDs/features; click rewards; randomized serving logs	Article action and click reward under randomized traffic	Tier 3; classic contextual bandit benchmark	Weak sequential state compared with world-model trajectories
Criteo Uplift	Marketing treatment records	User/ad context features; treatment flag; visit/conversion outcomes	Binary treatment/control with visit/conversion outcomes	Tier 3; useful for treatment-effect modeling	Mostly one-step, not rich temporal dynamics
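Logged propensities are what make Tier 3 sources usable for off-policy action-response modeling. A minimal sketch of the standard inverse propensity scoring (IPS) estimator is shown below, in plain Python rather than through any dataset's own tooling. The log layout `(context, action, reward, propensity)` and the `target_prob` callable are assumptions made for illustration.

```python
from typing import Any, Callable, List, Tuple

# (context, action, reward, logging_propensity)
LogRow = Tuple[Any, int, float, float]


def ips_value(logs: List[LogRow], target_prob: Callable[[Any, int], float]) -> float:
    """Estimate the value of a target policy from logged bandit feedback.

    Each logged reward is reweighted by target_prob(context, action)
    divided by the probability the logging policy assigned to that action.
    """
    if not logs:
        raise ValueError("need at least one logged interaction")
    total = 0.0
    for context, action, reward, propensity in logs:
        if propensity <= 0.0:
            raise ValueError("logging propensity must be positive")
        total += target_prob(context, action) / propensity * reward
    return total / len(logs)
```

The estimator only needs one-step records, which is why thin next-state dynamics do not block this kind of evaluation even though they limit rollout-style world modeling.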

Education And Tutoring Logs

Dataset	Time-Series Structure	Modalities Needed	Action / Intervention Channel	World-Model Fit	Caveat
EdNet	Large-scale student activity sequences	Student IDs; question/skill IDs; correctness; timestamps; lecture/purchase/platform event categories	Question solving, lecture consumption, purchases, platform events	Tier 2/3; useful for student-state dynamics	Actions mix student behavior and platform interventions
ASSISTments 2009-2010	Student problem-solving sequences	Student/problem/skill IDs; correctness; hint counts; attempt metadata; timestamps	Attempts, hints, first-action type, problem assignments	Tier 2/3; useful for knowledge tracing and pedagogical dynamics	Action granularity varies by release
KDD Cup 2010	Cognitive Tutor student-step logs	Student/problem/step/knowledge-component IDs; correctness; opportunity counts; hint/attempt features	Responses, opportunities, problem steps, hint/attempt-related fields	Tier 2/3; useful for educational sequence modeling	Not a clean controllable intervention benchmark
PSLC DataShop	Repository of many learning-science event logs	Dataset-specific student/tutor event tables; skill/problem IDs; correctness; hints; timestamps	Student actions, tutor responses, hints, instructional events	Tier 2/3; broad source for education action-time-series	Requires dataset-by-dataset curation

Causal And Interventional Validation

Dataset	Time-Series Structure	Modalities Needed	Action / Intervention Channel	World-Model Fit	Caveat
CausalWorld	Simulated robot manipulation episodes	Numeric simulator state; robot/object poses; task/intervention metadata; optional visual observations are ignored for this page	Robot actions plus causal/environment interventions	Tier 1/validation; good for causal generalization under interventions	Benchmark/environment more than fixed real-world dataset
Causal Chambers	Real physical-system measurements and interventional data	Numeric sensor streams; actuator/control settings; known causal graphs; experiment metadata	Controlled interventions over physical variables	Tier 2/validation; useful for intervention fidelity tests	Not always a sequential control dataset in RL format

Passive Time-Series Near-Miss

Dataset	Time-Series Structure	Modalities Needed	Action Channel	World-Model Fit	Caveat
ChronoGraph	Graph-structured multivariate microservice telemetry over time	Graph topology; node metrics; edge metrics; incident/anomaly labels; service/dependency metadata	No explicit controllable action channel in the paper	Useful for passive graph/time-series world-model pretraining	Incident windows are labels/exogenous shocks, not operator interventions
TelecomTS	5G observability KPI windows with anomaly labels, root-cause labels, natural-language descriptions, troubleshooting tickets, and Q&A	Numeric/categorical KPIs; telecom labels; text descriptions/tickets; Q&A	No operator-action channel; controlled jamming and synthetic anomaly injections are events/benchmark conditions	Useful for passive/multimodal observability pretraining and diagnosis evaluation	Lab/testbed data; synthetic anomalies and generated tickets need artifact checks

Modality Takeaways

Mostly numeric temporal control: Minari D4RL, OhioT1DM, CausalWorld, Causal Chambers, and non-visual parts of RL Unplugged can be approached with multivariate time-series models.
Simulator-backed energy and industrial control: CityLearn, Grid2Op, NeoRL-2, and Tennessee Eastman Process Simulation Data are useful when the goal is explicit control-input conditioning outside vision. CityLearn and NeoRL-2 expose cleaner action/transition interfaces; L2RPN/Grid2Op adds graph topology, combinatorial topology actions, multi-agent benchmark variants, learned action-ranking/risk-surrogate papers, and safety-critical contingencies; Tennessee Eastman needs careful separation of measured variables, manipulated variables, fault labels, and run boundaries.
Irregular event/EHR data: MIMIC-IV, eICU-CRD, HiRID, and AmsterdamUMCdb require event-table modeling, coding systems, missingness handling, and often irregular-time encodings.
Structured relational data: KuaiRand, Open Bandit Dataset, the Yahoo! Webscope R6 line, and Criteo Uplift require user-item/action-response structure rather than image/video understanding.
Education event logs: EdNet, ASSISTments, KDD Cup 2010, and PSLC DataShop require student/problem/skill identifiers, correctness, hints, and timestamps.
Graph and telecom observability: ChronoGraph requires graph topology plus temporal node/edge metrics, while TelecomTS requires scale-preserving KPI streams plus operational text. Both remain passive unless action logs are joined.
Action-free trajectories with hidden controls: Genie shows that a latent action model can recover action-like codes from image/video transitions, but those codes need alignment before they count as typed actions or control inputs. OTF-LAM sharpens the caveat: inferred codes can be mixed observed effects from the agent, camera, distractors, or background, so factorization and typed-action alignment are both needed before treating them as controls.
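For the irregular event/EHR entry in the list above, one common form of irregular-time encoding is to keep every event and attach the elapsed time since the previous event, instead of resampling onto a fixed grid. The sketch below shows that idea only. The `(timestamp, channel, value)` layout and the function name are hypothetical and do not come from any of the listed datasets.

```python
from typing import List, Tuple

Event = Tuple[float, str, float]          # (timestamp_seconds, channel, value)
EncodedEvent = Tuple[float, str, float]   # (seconds_since_previous, channel, value)


def encode_irregular_events(events: List[Event]) -> List[EncodedEvent]:
    """Sort events by time and replace absolute timestamps with time deltas.

    The first event gets a delta of 0.0. Observations and interventions
    share one stream and are distinguished only by their channel name.
    """
    ordered = sorted(events, key=lambda e: e[0])
    encoded: List[EncodedEvent] = []
    previous_time = None
    for timestamp, channel, value in ordered:
        delta = 0.0 if previous_time is None else timestamp - previous_time
        encoded.append((delta, channel, value))
        previous_time = timestamp
    return encoded
```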

Practical Recommendations

For a first non-vision action-conditioned world-model baseline, start with Minari D4RL, NeoRL-2, non-visual RL Unplugged tasks, CityLearn, Grid2Op, or OhioT1DM, depending on whether the desired domain is clean control, realistic offline RL constraints, building energy management, graph-structured power-grid operation, or physiology.
For industrial process monitoring, Tennessee Eastman Process Simulation Data is useful for manipulated-variable-conditioned dynamics and fault regimes, but it should not be evaluated as if it had rewards or logged remediation actions.
For real treatment/intervention modeling, use MIMIC-IV, eICU-CRD, HiRID, AmsterdamUMCdb, OhioT1DM, and HeartSteps, but model confounding explicitly.
For user-response and logged decision modeling, KuaiRand is the strongest sequential candidate; Open Bandit Dataset, the Yahoo! Webscope R6 line, and Criteo Uplift are better treated as contextual action-response datasets.
ChronoGraph and TelecomTS should stay in the passive/near-miss bucket unless external deployment, remediation, autoscaling, rollback, or operator-action logs are joined to them.
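Joining an external action log to passive telemetry can be sketched as an as-of join: each telemetry row receives the most recent operator action that precedes it within a lag window. The example below is a minimal illustration under assumptions. Neither ChronoGraph nor TelecomTS ships such a log according to this page, and the function name, row layouts, and `max_lag` parameter are hypothetical.

```python
from bisect import bisect_right
from typing import Any, List, Optional, Tuple

TelemetryRow = Tuple[float, Any]   # (timestamp, metrics)
ActionRow = Tuple[float, str]      # (timestamp, action_label)


def join_latest_action(
    telemetry: List[TelemetryRow],
    actions: List[ActionRow],
    max_lag: float,
) -> List[Tuple[float, Any, Optional[str]]]:
    """Attach to each telemetry row the latest action at or before it.

    Actions older than max_lag, or absent, yield None so that passive
    stretches stay distinguishable from action-conditioned ones.
    """
    ordered_actions = sorted(actions, key=lambda a: a[0])
    action_times = [t for t, _ in ordered_actions]
    joined = []
    for timestamp, metrics in telemetry:
        index = bisect_right(action_times, timestamp) - 1
        label = None
        if index >= 0 and timestamp - action_times[index] <= max_lag:
            label = ordered_actions[index][1]
        joined.append((timestamp, metrics, label))
    return joined
```

Rows that come back with `None` remain passive observations; only rows with an attached label can be treated as action-conditioned, and even those still need the exogenous-versus-operator distinction raised in the caveats above.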

Relation To Foundation TSFM Agenda

This page supports the Foundation Time-Series Model Research Agenda at the dataset-interface layer: it identifies which corpora can test action-conditioned rollout and which are only passive pretraining or diagnosis sources.

Agenda slot	Verdict	Evidence	Missing pieces
Control and counterfactuals	partially closes	Separates clean state-action trajectories, longitudinal interventions, logged action-response data, and passive near-misses.	Many real-world sources are confounded, one-step, or missing next-state dynamics.
Time representation and event streams	adjacent	Healthcare, education, recommender, and observability entries expose irregular events, treatments, hints, exposures, and telemetry.	Needs a shared event/action schema and benchmark protocol across domains.

Observability and recommender logs are especially relevant to the digital-world robot north star because they show non-robotic systems with observations and possible intervention surfaces. Public datasets still rarely join telemetry with typed operator actions and outcomes.

Open Questions

Which non-vision dataset family should anchor Alex’s first action-conditioned world-model experiment: clean RL transitions, irregular healthcare interventions, recommender logs, education logs, or graph telemetry?
Should the first industrial-control experiment use a clean simulator interface such as NeoRL-2 or CityLearn, or a fault-monitoring source such as Tennessee Eastman where manipulated variables need additional preprocessing into control inputs?
Should passive datasets like ChronoGraph be included in a separate pretraining pool for representation learning before action-conditioned finetuning?
How should the wiki distinguish controllable actions from exogenous events, treatments, platform decisions, and observed human behavior?
For real operator logs, when is a topology change a clean remedial-action label rather than maintenance, testing, or an unrelated maneuver, and what metadata or counterfactual filtering is required before using it as action-conditioned training data?
How should action-conditioned datasets record reward-source provenance, label cost, fidelity/noise assumptions, and bias audits alongside transition tuples?
Which non-vision modality stack should be prioritized first: multivariate time series, irregular EHR event streams, recommender user-item events, education event logs, or graph-temporal observability data?
