Action-Conditioned Time-Series Datasets

Scope

Terminology on this page follows the Terminology page.

This page compares non-vision-heavy datasets that can support world models with actions or interventions. Here, “time series” is broad: it includes regular sensor streams, irregular medical/event logs, control trajectories, recommender decision logs, tutoring interaction sequences, graph telemetry, and any ordered sequence where a model can condition on an action or intervention at time t to predict later observations.

The strongest candidates expose a transition-like channel: observation_t, action_t, optional reward_t, and observation_{t+1}. Weaker candidates expose logged decisions or treatments but have thin next-state observations or strong observational confounding.
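The transition-like channel described above can be sketched as a small record type. This is a minimal illustration, not a format used by any listed dataset: the names `Transition` and `to_transitions` are hypothetical, and the sketch assumes an episode stores one more observation than actions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Transition:
    """One (observation_t, action_t, reward_t, observation_{t+1}) record."""
    observation: Sequence[float]
    action: Sequence[float]
    next_observation: Sequence[float]
    reward: Optional[float] = None  # optional: many logs carry no reward


def to_transitions(observations, actions, rewards=None):
    """Turn one ordered episode into transition records.

    Assumption: len(observations) == len(actions) + 1, i.e. the episode
    also stores the observation reached after the last action.
    """
    if len(observations) != len(actions) + 1:
        raise ValueError("expected one more observation than actions")
    if rewards is not None and len(rewards) != len(actions):
        raise ValueError("expected one reward per action")
    return [
        Transition(
            observation=observations[t],
            action=actions[t],
            next_observation=observations[t + 1],
            reward=None if rewards is None else rewards[t],
        )
        for t in range(len(actions))
    ]


# Toy episode with made-up numbers, for illustration only.
episode = to_transitions(
    observations=[[0.0], [0.5], [0.9]],
    actions=[[1.0], [0.8]],
    rewards=[0.1, 0.2],
)
```

A weaker candidate in this framing is one where `next_observation` is thin or missing, so the record degenerates to an action and an outcome.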

For datasets used to train policies in imagined rollouts, the s,a,r,s' contract should also record whether transition data and reward annotations come from different streams, with separate costs, noise, and bias risks. On Training in Imagination is the local source for this data-economics split.
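One possible way to record that split is a per-stream provenance entry attached to the dataset contract. The sketch below is an assumption about what such a record could hold; the class names, field names, and example values are hypothetical and do not come from On Training in Imagination.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StreamProvenance:
    """Where one stream comes from, with its cost, noise, and bias notes."""
    source: str                            # e.g. "simulator_rollouts"
    cost_per_record: Optional[float] = None
    noise_note: str = ""
    bias_note: str = ""


@dataclass(frozen=True)
class DatasetContract:
    """s,a,r,s' contract with separate provenance per stream."""
    name: str
    transitions: StreamProvenance
    rewards: Optional[StreamProvenance]    # None when rewards are absent

    def reward_stream_is_separate(self) -> bool:
        """True when reward annotations come from a different stream."""
        return (
            self.rewards is not None
            and self.rewards.source != self.transitions.source
        )


# Hypothetical example: cheap transitions, costly human reward labels.
contract = DatasetContract(
    name="hypothetical_control_logs",
    transitions=StreamProvenance(source="simulator_rollouts"),
    rewards=StreamProvenance(
        source="human_annotations",
        cost_per_record=0.05,              # made-up number
        noise_note="annotator disagreement not yet measured",
        bias_note="labels collected only on successful episodes",
    ),
)
```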

Latent Action Models are complementary to this dataset view. They infer action-like codes from observation transitions when the action channel is missing, but they do not turn a dataset into a typed action-conditioned dataset until those codes are aligned with real actions, control inputs, interventions, and outcomes.
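A rough sketch of one alignment check follows. It assumes a small labeled subset where both an inferred latent code and the real action are known, maps each code to its most frequent real action, and reports how often that mapping agrees with the labels. This majority-vote check is an illustrative assumption, not the procedure of any particular latent action model; `align_latent_codes` and the toy pairs are hypothetical.

```python
from collections import Counter, defaultdict


def align_latent_codes(pairs):
    """Map each latent code to its most frequent real action.

    pairs: iterable of (latent_code, real_action) from a labeled subset.
    Returns (mapping, purity), where purity is the fraction of pairs
    whose real action matches the mapped action for their code.
    """
    counts = defaultdict(Counter)
    for code, action in pairs:
        counts[code][action] += 1

    mapping = {code: c.most_common(1)[0][0] for code, c in counts.items()}
    total = sum(sum(c.values()) for c in counts.values())
    agree = sum(c[mapping[code]] for code, c in counts.items())
    purity = agree / total if total else 0.0
    return mapping, purity


# Toy labeled subset with made-up codes and actions.
labeled_pairs = [
    (0, "left"), (0, "left"), (0, "right"),
    (1, "right"), (1, "right"),
]
mapping, purity = align_latent_codes(labeled_pairs)
```

Low purity under a check like this would suggest the codes are not yet usable as typed actions, which is the gap the paragraph above describes.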

This page intentionally excludes vision-heavy trajectory datasets. Those datasets also contain time series, and they can be very important for action-conditioned world models, but they require image/video encoders and belong in a separate embodied/visual world-model comparison. Excluded examples include V-D4RL, MineRL, Atari DQN Replay, Open X-Embodiment, DROID, BridgeData V2, RoboNet, CALVIN, RoboTurk, and RoAM. For that embodied/visual branch, use Robotics Time-Series Modeling and World Model for Robot Learning Survey rather than expanding this non-vision dataset table.

Most remaining datasets are still not pure univariate time series. The Modalities Needed column lists the non-temporal modalities or structured data types that a training pipeline must understand in addition to temporal order.

Selection Tiers

  • Tier 1: direct world-model datasets provide explicit sequential observations and actions and are immediately usable for action-conditioned dynamics learning.
  • Tier 2: longitudinal intervention datasets provide real interventions/treatments over time but require careful causal handling because actions are often confounded by state.
  • Tier 3: logged action-response datasets provide actions and rewards/outcomes, but temporal state dynamics are weaker than in trajectory datasets.
  • Near-miss: passive time-series datasets are useful for passive world-model pretraining or forecasting, but do not expose controllable actions.
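The tier definitions can be read as a simple decision rule. The sketch below is a simplification under stated assumptions: the three boolean traits and the names `DatasetTraits` and `selection_tier` are hypothetical, and the tables on this page assign mixed tiers (for example Tier 1/2) by judgment rather than by a rule like this one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetTraits:
    has_action_channel: bool     # controllable actions or interventions logged
    has_next_state: bool         # rich observations follow each action
    actions_confounded: bool     # actions chosen in response to state


def selection_tier(traits: DatasetTraits) -> str:
    """Assign a coarse tier from three traits (illustrative heuristic)."""
    if not traits.has_action_channel:
        return "Near-miss"       # passive time series, no controllable action
    if not traits.has_next_state:
        return "Tier 3"          # logged action-response, weak state dynamics
    if traits.actions_confounded:
        return "Tier 2"          # longitudinal interventions, causal care needed
    return "Tier 1"              # direct world-model dataset


examples = {
    "clean offline RL transitions": DatasetTraits(True, True, False),
    "ICU treatment records": DatasetTraits(True, True, True),
    "bandit decision log": DatasetTraits(True, False, False),
    "passive telemetry": DatasetTraits(False, True, False),
}
tiers = {name: selection_tier(t) for name, t in examples.items()}
```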

Offline RL And Numeric Control Trajectories

| Dataset | Time-Series Structure | Modalities Needed | Action Channel | World-Model Fit | Caveat |
| --- | --- | --- | --- | --- | --- |
| Minari D4RL | Episodic offline RL transitions across MuJoCo, AntMaze, Adroit, Kitchen, and related tasks | Numeric state vectors; rewards; terminals; task IDs for mixed datasets; sometimes goal/state annotations | Environment control action at each step | Tier 1; clean s,a,r,s' benchmark for latent/state dynamics | Some tasks are benchmark-specific, and the page excludes visual variants |
| RL Unplugged | Replayed transitions from multiple RL domains | Numeric states for control tasks; rewards/discounts; action labels; domain metadata | Discrete or continuous environment actions | Tier 1 for non-visual subsets; diverse offline RL source for action-conditioned dynamics | Some RL Unplugged domains are visual and SHOULD be filtered out for this non-vision page |
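Filtering out visual variants can be approximated by inspecting observation shapes. The sketch below assumes each candidate exposes a flat `observation_shape` tuple; the catalog entries and names are hypothetical placeholders, not real dataset identifiers, and datasets with dictionary observations would need a richer check.

```python
def is_vector_observation(shape):
    """True for flat numeric state vectors such as (17,).

    Image-like observations such as (84, 84, 3) have more than one axis
    and are treated as visual in this simplified check.
    """
    return len(shape) == 1


# Hypothetical catalog entries; names and shapes are placeholders.
catalog = [
    {"name": "hypothetical_locomotion_states", "observation_shape": (17,)},
    {"name": "hypothetical_pixel_task", "observation_shape": (84, 84, 3)},
    {"name": "hypothetical_manipulation_states", "observation_shape": (39,)},
]

non_visual = [
    entry["name"]
    for entry in catalog
    if is_vector_observation(entry["observation_shape"])
]
```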

Healthcare And Physiology

| Dataset | Time-Series Structure | Modalities Needed | Action / Intervention Channel | World-Model Fit | Caveat |
| --- | --- | --- | --- | --- | --- |
| MIMIC-IV | Irregular hospital/ICU EHR time series | Numeric vitals/labs; categorical codes; medication/procedure tables; demographics; clinical notes if used | Medications, fluids, procedures, ventilation-related events, orders | Tier 2; strong for treatment-conditioned patient dynamics | Observational, confounded, credentialed access |
| eICU-CRD | Multi-center ICU longitudinal records | Numeric vitals/labs; categorical diagnoses/treatments; medication/infusion records; care-plan tables | Medications, infusion drugs, treatments, procedures | Tier 2; strong multi-hospital treatment-response source | Heterogeneous schema and confounding |
| HiRID | High-resolution ICU records | High-frequency numeric physiology; labs; medication/event tables; patient metadata | ICU treatments, medications, interventions, clinical events | Tier 2; good for high-frequency physiology dynamics | Access and preprocessing complexity |
| AmsterdamUMCdb | European ICU observation/event series | Numeric vitals/labs; medication/infusion tables; device/ventilation records; demographics | Medications, fluids, feeding, transfusions, procedures | Tier 2; strong ICU dynamics dataset | Observational and access-controlled |
| OhioT1DM | Continuous glucose and patient event streams | Continuous glucose monitor values; insulin logs; meal/carbohydrate records; exercise/sleep/stress event features | Insulin, meals/carbs, exercise, sleep, stress | Tier 1/2; small but clean physiology-control source | Small participant count and per-person variability |
| HeartSteps | Participant decision points and activity outcomes over weeks | Mobile-sensing/context features; step-count/activity outcomes; survey/context variables; intervention messages | Micro-randomized activity suggestions | Tier 2; cleaner causal interventions than routine care logs | Small behavioral domain |
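For irregular records like these, one common preprocessing step is to merge observation events and intervention events into a single time-ordered sequence with explicit time gaps. The sketch below shows that idea under simplifying assumptions: events are plain tuples, times are in hours, and the variable names and values are made up rather than taken from any of the datasets above.

```python
def build_irregular_steps(observations, interventions):
    """Merge observation and intervention events into ordered steps.

    observations: list of (time, variable, value)
    interventions: list of (time, name, amount)
    Each step keeps the elapsed time since the previous step, so a model
    can use the gap itself as an input. Variables not measured at a step
    are left absent instead of being imputed.
    """
    by_time = {}
    for t, name, value in observations:
        by_time.setdefault(t, {"obs": {}, "actions": {}})["obs"][name] = value
    for t, name, amount in interventions:
        by_time.setdefault(t, {"obs": {}, "actions": {}})["actions"][name] = amount

    steps = []
    previous = None
    for t in sorted(by_time):
        steps.append({
            "time": t,
            "delta_t": 0.0 if previous is None else t - previous,
            "obs": by_time[t]["obs"],
            "actions": by_time[t]["actions"],
        })
        previous = t
    return steps


# Made-up example values, for illustration only.
steps = build_irregular_steps(
    observations=[
        (0.0, "heart_rate", 88.0),
        (1.5, "heart_rate", 92.0),
        (1.5, "lactate", 2.1),
    ],
    interventions=[(0.5, "fluid_bolus_ml", 500.0)],
)
```

A merge like this only aligns events in time. It does nothing about the confounding noted in the caveats, which needs separate causal handling.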

Recommender, Bandit, And Marketing Logs

| Dataset | Time-Series Structure | Modalities Needed | Action / Intervention Channel | World-Model Fit | Caveat |
| --- | --- | --- | --- | --- | --- |
| KuaiRand | Sequential user-video interactions with random exposure | User IDs/features; item/video IDs and metadata; categorical feedback events; watch/click/like signals; timestamps | Video/item exposure and feedback signals | Tier 1/3; strong for user-response dynamics without requiring video pixels | State is user-behavioral, not physical-world dynamics |
| Open Bandit Dataset | Logged fashion recommendation decisions | User/context features; item/category IDs; logged propensities; clicks/conversions/rewards | Recommended item/action, reward, propensity | Tier 3; strong for off-policy action-response modeling | Thin next-state dynamics |
| Webscope R6 line | News recommendation decision logs | User/context features; article IDs/features; click rewards; randomized serving logs | Article action and click reward under randomized traffic | Tier 3; classic contextual bandit benchmark | Weak sequential state compared with world-model trajectories |
| Criteo Uplift | Marketing treatment records | User/ad context features; treatment flag; visit/conversion outcomes | Binary treatment/control with visit/conversion outcomes | Tier 3; useful for treatment-effect modeling | Mostly one-step, not rich temporal dynamics |
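Logged propensities are what make off-policy action-response modeling possible on logs like these. As an illustration, the sketch below implements a plain inverse propensity scoring (IPS) estimate of a target policy's mean reward. The row layout, the `ips_value` name, and the toy log are assumptions made for this example and do not mirror the schema of any dataset in the table.

```python
def ips_value(logs, target_policy):
    """Inverse propensity scoring estimate of a policy's mean reward.

    logs: iterable of dicts with keys "context", "action", "reward",
          and "propensity" (probability the logging policy chose the action).
    target_policy(context, action): probability that the policy being
          evaluated would choose that action in that context.
    """
    total = 0.0
    count = 0
    for row in logs:
        if row["propensity"] <= 0:
            raise ValueError("logged propensity must be positive")
        weight = target_policy(row["context"], row["action"]) / row["propensity"]
        total += weight * row["reward"]
        count += 1
    if count == 0:
        raise ValueError("no logged rows")
    return total / count


# Toy log: two actions served uniformly at random (propensity 0.5 each).
toy_logs = [
    {"context": "a", "action": 0, "reward": 1.0, "propensity": 0.5},
    {"context": "a", "action": 1, "reward": 0.0, "propensity": 0.5},
    {"context": "b", "action": 0, "reward": 1.0, "propensity": 0.5},
    {"context": "b", "action": 1, "reward": 0.0, "propensity": 0.5},
]


def always_action_zero(context, action):
    """Deterministic target policy that always picks action 0."""
    return 1.0 if action == 0 else 0.0


estimate = ips_value(toy_logs, always_action_zero)
```

The estimate covers one-step rewards only, which matches the thin next-state caveat: it says nothing about how user state evolves after the action.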

Education And Tutoring Logs

| Dataset | Time-Series Structure | Modalities Needed | Action / Intervention Channel | World-Model Fit | Caveat |
| --- | --- | --- | --- | --- | --- |
| EdNet | Large-scale student activity sequences | Student IDs; question/skill IDs; correctness; timestamps; lecture/purchase/platform event categories | Question solving, lecture consumption, purchases, platform events | Tier 2/3; useful for student-state dynamics | Actions mix student behavior and platform interventions |
| ASSISTments 2009-2010 | Student problem-solving sequences | Student/problem/skill IDs; correctness; hint counts; attempt metadata; timestamps | Attempts, hints, first-action type, problem assignments | Tier 2/3; useful for knowledge tracing and pedagogical dynamics | Action granularity varies by release |
| KDD Cup 2010 | Cognitive Tutor student-step logs | Student/problem/step/knowledge-component IDs; correctness; opportunity counts; hint/attempt features | Responses, opportunities, problem steps, hint/attempt-related fields | Tier 2/3; useful for educational sequence modeling | Not a clean controllable intervention benchmark |
| PSLC DataShop | Repository of many learning-science event logs | Dataset-specific student/tutor event tables; skill/problem IDs; correctness; hints; timestamps | Student actions, tutor responses, hints, instructional events | Tier 2/3; broad source for education action-time-series | Requires dataset-by-dataset curation |

Causal And Interventional Validation

| Dataset | Time-Series Structure | Modalities Needed | Action / Intervention Channel | World-Model Fit | Caveat |
| --- | --- | --- | --- | --- | --- |
| CausalWorld | Simulated robot manipulation episodes | Numeric simulator state; robot/object poses; task/intervention metadata; optional visual observations SHOULD be ignored for this page | Robot actions plus causal/environment interventions | Tier 1/validation; good for causal generalization under interventions | Benchmark/environment more than fixed real-world dataset |
| Causal Chambers | Real physical-system measurements and interventional data | Numeric sensor streams; actuator/control settings; known causal graphs; experiment metadata | Controlled interventions over physical variables | Tier 2/validation; useful for intervention fidelity tests | Not always a sequential control dataset in RL format |

Passive Time-Series Near-Miss

| Dataset | Time-Series Structure | Modalities Needed | Action Channel | World-Model Fit | Caveat |
| --- | --- | --- | --- | --- | --- |
| ChronoGraph | Graph-structured multivariate microservice telemetry over time | Graph topology; node metrics; edge metrics; incident/anomaly labels; service/dependency metadata | No explicit controllable action channel in the paper | Useful for passive graph/time-series world-model pretraining | Incident windows are labels/exogenous shocks, not operator interventions |
| TelecomTS | 5G observability KPI windows with anomaly labels, root-cause labels, natural-language descriptions, troubleshooting tickets, and Q&A | Numeric/categorical KPIs; telecom labels; text descriptions/tickets; Q&A | No operator-action channel; controlled jamming and synthetic anomaly injections are events/benchmark conditions | Useful for passive/multimodal observability pretraining and diagnosis evaluation | Lab/testbed data; synthetic anomalies and generated tickets need artifact checks |

Modality Takeaways

  • Mostly numeric temporal control: D4RL, OhioT1DM, CausalWorld, Causal Chambers, and non-visual parts of RL Unplugged can be approached with multivariate time-series models.
  • Irregular event/EHR data: MIMIC-IV, eICU-CRD, HiRID, and AmsterdamUMCdb require event-table modeling, coding systems, missingness handling, and often irregular-time encodings.
  • Structured relational data: KuaiRand, Open Bandit Dataset, the Yahoo! Webscope R6 line, and Criteo Uplift require user-item/action-response structure rather than image/video understanding.
  • Education event logs: EdNet, ASSISTments, KDD Cup 2010, and PSLC DataShop require student/problem/skill identifiers, correctness, hints, and timestamps.
  • Graph and telecom observability: ChronoGraph requires graph topology plus temporal node/edge metrics, while TelecomTS requires scale-preserving KPI streams plus operational text. Both remain passive unless action logs are joined.
  • Action-free trajectories with hidden controls: Genie shows that a latent action model can recover action-like codes from image/video transitions, but those codes need alignment before they count as typed actions or control inputs.
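The observability bullet above notes that passive telemetry stays passive unless action logs are joined. A minimal sketch of such a join follows, assuming telemetry is cut into half-open time windows and a separate operator action log exists. Neither ChronoGraph nor TelecomTS is described here as shipping such a log; the function name, window bounds, and action names are hypothetical.

```python
from bisect import bisect_left


def attach_actions(windows, action_log):
    """Attach operator actions to telemetry windows by timestamp.

    windows: list of (start, end) half-open intervals [start, end)
    action_log: list of (time, action_name), in any order
    Returns one dict per window listing the actions that fall inside it.
    """
    log = sorted(action_log)                 # order by time, then name
    times = [t for t, _ in log]
    joined = []
    for start, end in windows:
        lo = bisect_left(times, start)       # first action at or after start
        hi = bisect_left(times, end)         # first action at or after end
        joined.append({
            "start": start,
            "end": end,
            "actions": [name for _, name in log[lo:hi]],
        })
    return joined


# Hypothetical windows (seconds) and operator actions.
windows = [(0, 60), (60, 120), (120, 180)]
action_log = [(75, "restart_service"), (10, "scale_up")]
joined = attach_actions(windows, action_log)
```

Windows with an empty `actions` list remain passive observations, while windows with actions become candidates for action-conditioned modeling once outcomes are also recorded.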

Practical Recommendations

Relation To Foundation TSFM Agenda

This page supports the Foundation Time-Series Model Research Agenda at the dataset-interface layer: it identifies which corpora can test action-conditioned rollout and which are only passive pretraining or diagnosis sources.

| Agenda slot | Verdict | Evidence | Missing pieces |
| --- | --- | --- | --- |
| Control and counterfactuals | partially closes | Separates clean state-action trajectories, longitudinal interventions, logged action-response data, and passive near-misses. | Many real-world sources are confounded, one-step, or missing next-state dynamics. |
| Time representation and event streams | adjacent | Healthcare, education, recommender, and observability entries expose irregular events, treatments, hints, exposures, and telemetry. | Needs a shared event/action schema and benchmark protocol across domains. |
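To make the missing shared event/action schema concrete, the sketch below shows one shape it could take: every record carries a time, a kind, a name, and a payload, and the kind separates controllable actions from treatments, platform decisions, human behavior, and exogenous events. This is an illustrative assumption, not an agreed schema; all names, the choice of which kinds count as conditionable, and the example events are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Mapping


class EventKind(Enum):
    OBSERVATION = "observation"
    CONTROLLABLE_ACTION = "controllable_action"
    TREATMENT = "treatment"
    PLATFORM_DECISION = "platform_decision"
    HUMAN_BEHAVIOR = "human_behavior"
    EXOGENOUS_EVENT = "exogenous_event"


# Assumption: kinds a model could condition on as an intervention.
CONDITIONABLE = {
    EventKind.CONTROLLABLE_ACTION,
    EventKind.TREATMENT,
    EventKind.PLATFORM_DECISION,
}


@dataclass(frozen=True)
class Event:
    time: float
    kind: EventKind
    name: str
    payload: Mapping[str, Any]


def split_streams(events):
    """Split time-ordered events into intervention and context streams."""
    ordered = sorted(events, key=lambda e: e.time)
    actions = [e for e in ordered if e.kind in CONDITIONABLE]
    context = [e for e in ordered if e.kind not in CONDITIONABLE]
    return actions, context


# Hypothetical tutoring-style events, for illustration only.
events = [
    Event(2.0, EventKind.PLATFORM_DECISION, "show_hint", {"hint_id": 1}),
    Event(1.0, EventKind.HUMAN_BEHAVIOR, "attempt", {"correct": False}),
    Event(3.0, EventKind.EXOGENOUS_EVENT, "session_timeout", {}),
]
actions, context = split_streams(events)
```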

Observability and recommender logs are especially relevant to the digital-world robot north star because they show non-robotic systems with observations and possible intervention surfaces. Public datasets still rarely join telemetry with typed operator actions and outcomes.

Open Questions

  • Which non-vision dataset family should anchor Alex’s first action-conditioned world-model experiment: clean RL transitions, irregular healthcare interventions, recommender logs, education logs, or graph telemetry?
  • Should passive datasets like ChronoGraph be included in a separate pretraining pool for representation learning before action-conditioned finetuning?
  • How should the wiki distinguish controllable actions from exogenous events, treatments, platform decisions, and observed human behavior?
  • How should action-conditioned datasets record reward-source provenance, label cost, fidelity/noise assumptions, and bias audits alongside transition tuples?
  • Which non-vision modality stack should be prioritized first: multivariate time series, irregular EHR event streams, recommender user-item events, education event logs, or graph-temporal observability data?