Time-Series Generation

Summary

Time-series generation is not one task. The current TimeCraft batch separates at least eight interfaces:

Interface	Representative sources	Generated object	Main conditioning signal	Local interpretation
Cross-domain synthetic generation	TimeDP, TimeCraft	Fixed-window time-series samples	Few target-domain examples converted into prototype weights	Useful for low-resource synthetic data, but not action-conditioned.
Text-controlled generation	BRIDGE, T2S	Time-series samples	Natural-language descriptions, sometimes plus prototypes	A context interface for generation; evaluation must distinguish caption alignment from numeric utility.
Target-aware augmentation	TarDiff, OATS	Synthetic training samples	Downstream loss, influence scores, or valuable training samples	Shifts the objective from realism to downstream utility; needs leakage and overfitting audits.
Causal/interventional generation	CaTSG	Observational, interventional, and counterfactual samples	Causal conditions plus latent environment estimates	Closest TimeCraft branch to counterfactual modeling, but real-world counterfactual validation remains weak.
Irregular/continuous generation	Diff-MN	Continuous-time trajectories from irregular observations	Irregular observation context plus generated MoE-NCDE dynamics weights	Directly relevant to continuous latent-state modeling and arbitrary-time generation.
Forecast-generation via diffusion	MG-TSD, Sundial	Forecast sample paths	Numeric history plus denoising or flow objectives	Probabilistic forecasting, not unconditional synthetic data generation.
Reconstruction / missing-data infilling	SensorFM	Missing wearable sensor values or segments	Observed wearable window plus missingness mask	Passive reconstruction and metric recovery, not synthetic population generation or intervention rollout.
Financial market simulation	DiGA, MarS	Order-flow or market trajectories	Scenario targets, injected orders, matching rules, market state	World-model-adjacent because generated futures are used for what-if analysis and agent training.

The important axis is the conditioning contract. A generator conditioned on text, examples, downstream gradients, causal interventions, irregular observations, or candidate orders should not be evaluated as if it solved the same problem.

TimeCraft Lineage

TimeCraft is best read as a Microsoft Research framework and repository that packages several related generation lines:

TimeDP supplies the prototype/domain-prompt branch.
BRIDGE adds text-to-series data preparation and hybrid text/prototype conditioning.
TarDiff adds task-aware diffusion guidance through influence functions.
CaTSG adds observational, interventional, and counterfactual time-series generation.
OATS makes synthetic generation part of the TSFM training loop.
Diff-MN targets irregular-to-continuous generation through diffusion-parameterized MoE-NCDE dynamics.

That lineage matters because it moves from "generate realistic samples" toward "generate samples for a purpose": match a target domain, satisfy a text description, improve a downstream model, respect causal interventions, support TSFM pretraining, or produce a continuous trajectory.

Evaluation Boundary

Generation papers often report MMD, KL, discriminative score, predictive score, J-FTSD, human preference, downstream AUROC/AUPRC, or trading-agent utility. These metrics answer different questions:

Fidelity metrics test whether generated samples resemble a reference distribution.
Text-alignment and human-ranking metrics test whether generated samples match a condition.
Downstream utility metrics test whether synthetic samples improve another model.
Causal metrics test interventional or counterfactual behavior, but real-world counterfactual labels are usually absent.
Market-simulation metrics test stylized facts, market impact, and agent-training usefulness.
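
As a concrete instance of the fidelity category only, the following is a minimal sketch of a squared-MMD estimate with an RBF kernel over flattened fixed-length windows. The function name rbf_mmd2, the median-heuristic bandwidth, and the flattening of each window into one vector are assumptions made for illustration; individual papers may use different kernels, features, or estimators. A low value says the two sample sets look alike under this kernel. It says nothing about condition alignment, downstream utility, or causal behavior.

```python
import numpy as np


def rbf_mmd2(x, y, bandwidth=None):
    """Biased estimate of squared MMD between two sample sets.

    x: array of shape (n, d), y: array of shape (m, d), where each row is one
    flattened time-series window (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.concatenate([x, y], axis=0)

    # Pairwise squared Euclidean distances over the pooled samples.
    sq = np.sum(z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (z @ z.T), 0.0)

    if bandwidth is None:
        # Median heuristic over off-diagonal distances (illustrative choice).
        off_diag = d2[~np.eye(len(z), dtype=bool)]
        bandwidth = max(np.sqrt(0.5 * np.median(off_diag)), 1e-12)

    k = np.exp(-d2 / (2.0 * bandwidth ** 2))
    n = len(x)
    k_xx, k_yy, k_xy = k[:n, :n], k[n:, n:], k[:n, n:]
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())
```

Because the bandwidth is recomputed from the pooled samples, values are comparable only between runs that share the reference set and the bandwidth rule.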

For this wiki, a time-series generator becomes world-model-relevant only when the generated future remains conditioned on state, context, and explicit actions, control inputs, interventions, or candidate orders. Most TimeCraft branches are still passive or condition-controlled generators rather than full action-conditioned world models.

DMax is not a time-series generator, but it adds a decoding caveat for diffusion-style generation: parallel generation should keep tentative positions revisable through self-correction and soft intermediate states until convergence or confidence justifies commitment. A TSFM analogue would need numeric sample-path, event-stream, and action-conditioned rollout tests under matched wall-clock budgets before language TPF gains count as generation evidence.

iLLaDA adds the upstream training-scale counterpart: masked diffusion language models can now be trained from scratch at 8B scale with 12T pre-training tokens and variable-length block generation. For time-series generation, this is a reason to track diffusion sequence models seriously, but not a reason to count language benchmark gains as numeric-horizon fidelity, calibrated uncertainty, or action-conditioned rollout evidence.

The Flexibility Trap adds an order-of-commitment warning. A confidence-driven parallel generator can improve one-sample local consistency while narrowing Pass@$k$ proposal coverage by deferring uncertain forks. A time-series analogue should compare strict temporal training order, entropy-first decision points, and revisable parallel future blocks under matched compute, while measuring rare-trajectory coverage, calibration, numeric fidelity, and downstream control utility.
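
To make the coverage concern concrete, the sketch below uses the standard combinatorial Pass@$k$ estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c succeed. Treating "success" as "a sampled trajectory lands in the target regime" is an assumption of this sketch for the time-series analogue, not a definition taken from the source, and the helper names are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Probability that at least one of k samples, drawn without replacement
    from n samples containing c successes, is a success."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(success_counts, n, k):
    """Average Pass@k over several contexts, each with n samples."""
    return sum(pass_at_k(n, c, k) for c in success_counts) / len(success_counts)
```

With made-up counts over four contexts and n = 20 samples each, a generator that succeeds 18 times on two contexts and never on the other two has the higher mean Pass@1 (0.45 against 0.30) but the lower mean Pass@10 (0.50 against about 0.99) than a generator that succeeds 6 times on every context. That is the pattern the warning describes: better one-sample behavior with narrower coverage.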

Diffusion Consistency RMT adds a cross-training-run reproducibility diagnostic. Under deterministic matched-noise sampling, disjoint image-training splits and different architectures can produce aligned outputs because mean/covariance structure is stable across splits; finite data simultaneously overshrink lower-variance directions. For time-series generation, the useful experiment is paired base-noise rollout across independent entity/episode splits, reported by frequency band, channel or latent-state direction, regime, and horizon. Same-noise consistency should stay separate from fidelity, diversity, calibration, memorization, and action-conditioned utility because all runs can agree on the same biased or over-smoothed future.
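
One way to report the paired base-noise experiment by frequency band is sketched below. It assumes two arrays of univariate rollouts of shape (n_paths, horizon), produced by two independently trained generators from the same base noise, and it splits the real-FFT bins into equal-width bands. The function name and the normalization are hypothetical choices for illustration: 0 means the two runs agree exactly in a band, and values near 2 are what uncorrelated outputs of equal energy would give.

```python
import numpy as np


def bandwise_discrepancy(run_a, run_b, n_bands=4):
    """Per-frequency-band discrepancy between paired same-noise rollouts.

    run_a, run_b: arrays of shape (n_paths, horizon); row i of each array is
    assumed to come from the same base-noise draw.
    """
    a = np.asarray(run_a, dtype=float)
    b = np.asarray(run_b, dtype=float)
    if a.ndim != 2 or a.shape != b.shape:
        raise ValueError("expected two arrays of identical shape (n_paths, horizon)")

    fa = np.fft.rfft(a, axis=1)
    fb = np.fft.rfft(b, axis=1)
    n_bins = fa.shape[1]
    if not 1 <= n_bands <= n_bins:
        raise ValueError("n_bands must be between 1 and the number of FFT bins")

    edges = np.linspace(0, n_bins, n_bands + 1).astype(int)
    out = np.full(n_bands, np.nan)
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        diff = np.sum(np.abs(fa[:, lo:hi] - fb[:, lo:hi]) ** 2)
        ref = 0.5 * (np.sum(np.abs(fa[:, lo:hi]) ** 2)
                     + np.sum(np.abs(fb[:, lo:hi]) ** 2))
        if ref > 0:
            out[i] = diff / ref
    return out
```

A low discrepancy in every band shows only that the two runs agree with each other. Both can still be biased or over-smoothed relative to held-out data, so the number belongs next to fidelity and calibration results rather than in place of them.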

Irregular and continuous generation should report whether irregularity is naturally observed or simulated by dropping points from regular series. Diff-MN tests random dropping at several observation rates, so transfer to real sampling policies remains open.
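
The gap between simulated and natural irregularity can be illustrated with a small sketch. The first function drops points independently of the signal, which is the simulated setting described above. The second is a hypothetical value-dependent sampling policy, invented here only as a contrast, in which points far from the series mean are more likely to be observed. Both function names, the weighting rule, and the rates in the usage note are assumptions of this sketch, not Diff-MN's protocol.

```python
import numpy as np


def drop_at_random(times, values, observation_rate, rng):
    """Keep each point independently with probability observation_rate."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    keep = rng.random(len(times)) < observation_rate
    return times[keep], values[keep], keep


def drop_by_magnitude(times, values, observation_rate, rng):
    """Hypothetical informative policy: unusual values are observed more often.

    The keep probability is proportional to 1 + |z-score| and rescaled so the
    average rate is close to observation_rate.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    scale = values.std() or 1.0  # avoid division by zero on constant series
    weight = 1.0 + np.abs(values - values.mean()) / scale
    prob = np.clip(observation_rate * weight / weight.mean(), 0.0, 1.0)
    keep = rng.random(len(times)) < prob
    return times[keep], values[keep], keep
```

Running both functions at a few illustrative rates, for example 0.3, 0.5, and 0.7, on the same regular series gives similar observation counts but different value distributions among the retained points. A generator validated only under the first mask has not been tested against the second.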

Bias and spurious-correlation metrics should be separated from fidelity metrics. InvDiff is mostly text-to-image evidence, with a limited AusElec/TimeGrad OOD forecasting experiment; use it as a shortcut-auditing pattern rather than broad time-series generation evidence.

Relation To Foundation TSFM Agenda

Time-series generation maps most directly to the generation/editing, context interface, dense numeric fidelity, causal/counterfactual, and benchmark slots in the Foundation Time-Series Model Research Agenda. The TimeCraft batch strengthens the generation/editing branch, but it also shows why the agenda must separate observational generation, text-controlled generation, utility-guided augmentation, and intervention-aware rollout.

Open Questions

Which synthetic time-series generators improve downstream models under strict train/validation/test separation rather than by tuning to the evaluation set?
Can text-controlled generation use operationally meaningful context such as incidents, exogenous variables, and constraints rather than only morphology captions?
Can CaTSG-style causal generation scale beyond predefined SCMs and synthetic counterfactual labels?
Can Diff-MN-style continuous generation become a reusable latent-state interface for irregular clinical, industrial, or observability data?
Which generation metrics predict utility for forecasting, anomaly detection, representation learning, and action-conditioned planning?
For market simulators, which combination of stylized-fact fidelity, scenario-control error, market-impact validity, and downstream trading-agent transfer predicts real utility?
How should time-series generators define invariant temporal features so debiasing removes shortcut dependence without erasing rare regimes or meaningful domain shifts?
Can training expose decision-critical timestamps or action consequences sequentially while inference generates low-entropy horizon regions in parallel without future-target leakage?
Can disjoint-split time-series generators preserve matched-noise rollout identity without erasing rare regimes, weak channels, or lower-variance decision-relevant state?

Alex Open Research Wiki
