Synthetic Data For Time Series

Summary

Synthetic data is used in this corpus for several different jobs: scaling pretraining volume, creating label coverage for classification, simulating causal or template structure, teaching covariate and grouped-forecasting behavior, fitting learned inference priors, bootstrapping annotation layers over real data, and aligning models to reasoning or generation tasks. These are related, but they are not one method.

What The Wiki Currently Believes

Data-Volume Scaling

TimesFM, Timer, MOMENT, Sundial, Kairos, TiRex, Tiny Time Mixers, and Reverso all make pretraining-corpus design central to zero-shot or few-shot transfer. In this use, synthetic data usually fills underrepresented frequencies, seasonalities, irregularities, spikes, or long-horizon regimes that are sparse in public real data.

Sundial is the counterweight here: its synthetic component is only 0.05% of TimeBench and is described as pattern-diversity support. The main claim rests on the scale of a trillion-point mixed real-world corpus, not on synthetic-primary pretraining.

Label And Classification Coverage

CauKer uses synthetic causally coherent time series for classification TSFM pretraining, and MantisV2 uses CauKer-style synthetic classification data plus test-time strategies to close zero-shot gaps. This is a label-coverage story: the synthetic process supplies many labeled classification tasks that real archives cannot provide at the same scale.

Iterative Label Bootstrapping

Florence-2 is not a time-series paper, but it is an important data-engine analogy. It shows a real-observation plus generated-annotation path: start with a first annotation pass, train a model, use the model to improve and extend the annotations, filter the results, and repeat. For time series, this pattern targets the labeled-dataset bottleneck directly: real multivariate time series and event streams may be abundant, while labels for regimes, anomalies, events, or temporal segments are scarce.
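The annotate, train, re-annotate, filter, repeat loop can be sketched with a deliberately toy model. This is a minimal illustration under stated assumptions, not Florence-2's data engine: the threshold "model", the names `fit_threshold` and `bootstrap_labels`, and the `margin` confidence filter are all hypothetical and chosen only for the example.

```python
from statistics import mean

def fit_threshold(values, labels):
    """'Train' a toy model: the midpoint between the two class means."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return (mean(pos) + mean(neg)) / 2.0

def bootstrap_labels(values, seed_labels, rounds=3, margin=0.5):
    """Grow a label set over real observations.

    values      : real observations (for example, one metric per time step)
    seed_labels : dict {index: 0 or 1}, the first annotation pass;
                  it must contain at least one example of each class
    margin      : hypothetical confidence filter; predictions closer than
                  this to the decision threshold are discarded
    """
    labels = dict(seed_labels)
    thr = None
    for _ in range(rounds):
        idx = sorted(labels)
        thr = fit_threshold([values[i] for i in idx], [labels[i] for i in idx])
        for i, v in enumerate(values):
            if i in seed_labels:
                continue  # never overwrite the seed annotations
            if abs(v - thr) >= margin:
                labels[i] = int(v > thr)  # confident: accept the model label
            else:
                labels.pop(i, None)  # ambiguous: filter it out
    return labels, thr

values = [0.1, 0.2, 0.0, 5.0, 5.2, 0.3, 4.9, 2.6]
labels, thr = bootstrap_labels(values, {0: 0, 3: 1})
```

In this toy run the ambiguous point at index 7 stays unlabeled in every round, which is the filter step doing its job. The same structure also shows the failure mode named under Risks And Caveats: if the seed labels are wrong, every later round inherits the mistake.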

Causal And Template Generation

Causal or template generators appear when a paper wants controllable structure rather than only more samples. CauKer composes Gaussian-process kernels and structural causal mechanisms; TempoPFN combines ForecastPFN-style components, KernelSynth, CauKer-style causal structure, spike and regime-switching generators, and augmentation cascades; Reverso uses Gaussian-process, spike, trapezoidal, trend, seasonality, and irregularity sequences.
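A template generator of this kind can be sketched as a sum of controllable components. The example below is an assumption-laden illustration, not the generator used by Reverso, TempoPFN, or CauKer: it composes only a linear trend, one sinusoidal seasonality, random spikes, and Gaussian noise, and the function name `synth_series` and all parameter names are hypothetical.

```python
import math
import random

def synth_series(n, period, slope=0.01, amp=1.0,
                 spike_prob=0.02, spike_scale=5.0, noise=0.1, seed=0):
    """Return (series, spike_mask) built from trend + seasonality + spikes + noise.

    Every component has its own knob, so the generator gives controllable
    structure and an exact per-step spike label, not just more samples.
    """
    rng = random.Random(seed)  # local RNG keeps the output reproducible
    series, spike_mask = [], []
    for t in range(n):
        trend = slope * t
        season = amp * math.sin(2.0 * math.pi * t / period)
        is_spike = rng.random() < spike_prob
        spike = spike_scale * rng.random() if is_spike else 0.0
        series.append(trend + season + spike + rng.gauss(0.0, noise))
        spike_mask.append(is_spike)
    return series, spike_mask

series, spike_mask = synth_series(n=200, period=24, seed=1)
```

Because the spike mask is emitted alongside the values, the same generator can supply labels for free. That is also where the shortcut risk comes from: a model can learn the exact shape of these spikes, not the concept of an anomaly.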

Aionoscope adds a diagnostic-generator branch rather than a pretraining-data branch. It separates process generation from observation rendering to emit exact categorical and dense latent-state labels, so the synthetic data is used to debug representation accessibility rather than to inflate training volume or claim real-domain transfer.
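The separation between process generation and observation rendering can be sketched as two functions. This is a minimal sketch of the idea, not Aionoscope's implementation: the two-state switching process, the names `generate_process` and `render_observation`, and the level and noise parameters are assumptions made for illustration, and only a categorical label is produced here.

```python
import random

def generate_process(n, switch_prob, rng):
    """Latent process: a two-state regime that flips with probability switch_prob."""
    state, states = 0, []
    for _ in range(n):
        if rng.random() < switch_prob:
            state = 1 - state
        states.append(state)
    return states  # exact categorical latent-state label for every step

def render_observation(states, levels, noise, rng):
    """Observation rendering: map each latent state to a noisy measured value."""
    return [levels[s] + rng.gauss(0.0, noise) for s in states]

rng = random.Random(0)
states = generate_process(n=100, switch_prob=0.1, rng=rng)
observed = render_observation(states, levels=(0.0, 3.0), noise=0.2, rng=rng)
# A probe can now be scored against `states`, which is known exactly.
```

Keeping the two stages apart means the same latent trajectory can be rendered under different observation models, so a failed probe can be traced to either the process or the rendering.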

Covariate And Grouped Forecasting Behavior

Chronos-2 uses synthetic multivariate and covariate-informed examples to teach grouped forecasting behavior across related series, targets, past-only covariates, known future covariates, and categorical covariates. This is different from pure univariate data scaling because the synthetic data must teach how variables relate inside a forecasting context.
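The distinction between covariate roles can be made concrete with a small synthetic task. The sketch below is hypothetical and does not reproduce Chronos-2's generators: the function `synth_covariate_task`, the binary known-future flag, the lagged past-only driver, and the coefficients are all assumptions chosen to show how the roles differ in what the model is allowed to see.

```python
import random

def synth_covariate_task(n_context, horizon, effect=2.0, noise=0.1, seed=0):
    """Build one forecasting task whose target depends on two covariates."""
    rng = random.Random(seed)
    n = n_context + horizon
    # Known-future covariate: visible over context AND horizon (e.g. a calendar flag).
    known_future = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
    # Past-only covariate: observed only up to the forecast origin.
    past_only = [rng.gauss(0.0, 1.0) for _ in range(n)]
    target = []
    for t in range(n):
        lagged = past_only[t - 1] if t > 0 else 0.0
        target.append(effect * known_future[t] + 0.5 * lagged + rng.gauss(0.0, noise))
    return {
        "target_context": target[:n_context],
        "target_future": target[n_context:],       # what the model must predict
        "known_future_cov": known_future,           # full length: context + horizon
        "past_only_cov": past_only[:n_context],     # truncated at the forecast origin
    }

task = synth_covariate_task(n_context=48, horizon=12)
```

Both covariates here are exogenous inputs. Nothing in the generator makes either one a decision the forecaster controls, which is the caveat raised later about actions and interventions.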

PFN-Style Learned Inference Priors

TabPFN-v2, TabPFN-3, and TabICL are static tabular-data references, but they are important because they learn in-context inference from synthetic structural-causal tabular tasks. TabPFN-3 also reports TabPFN-TS-3, a specialized time-series checkpoint trained through the TabPFN ecosystem and evaluated on fev-bench. TempoPFN is the local open time-series analogue: it asks whether a learned inference prior trained only on synthetic temporal generators can become a zero-shot forecaster. See Tabular Foundation Models for the static-tabular side.
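Training on a synthetic prior means that each training example is a whole task drawn from a generator, not a row from a fixed dataset. The sketch below illustrates that sampling step only. It assumes a toy AR(1) prior, which is not the prior used by TempoPFN or the TabPFN family, and the names `sample_prior_task` and `make_batch` are hypothetical.

```python
import random

def sample_prior_task(context_len, horizon, rng):
    """Draw latent parameters from the prior, then a series from those parameters."""
    phi = rng.uniform(-0.9, 0.9)     # AR(1) coefficient, kept stationary
    sigma = rng.uniform(0.05, 0.5)   # innovation scale
    x = rng.gauss(0.0, 1.0)
    series = []
    for _ in range(context_len + horizon):
        x = phi * x + rng.gauss(0.0, sigma)
        series.append(x)
    return {
        "context": series[:context_len],   # what the model conditions on
        "target": series[context_len:],    # what it is trained to predict
        "latent": {"phi": phi, "sigma": sigma},  # never shown to the model
    }

def make_batch(n_tasks, context_len, horizon, seed=0):
    """Each batch element is a different task from the same prior."""
    rng = random.Random(seed)
    return [sample_prior_task(context_len, horizon, rng) for _ in range(n_tasks)]

batch = make_batch(n_tasks=4, context_len=32, horizon=8, seed=3)
```

The model never sees `phi` or `sigma`; it has to infer the behavior of each task from the context alone. The open question for a zero-shot forecaster is whether real series fall inside the family the prior covers.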

Observability And Benchmark Hygiene

Toto 2.0 trains on observability and synthetic time-series data while excluding public forecasting datasets during pretraining. That makes it a useful synthetic-data reference even though the source is an announcement article: it explicitly connects synthetic data to benchmark-leakage control, scaling, and observability-domain coverage.
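One simple form of benchmark-leakage control is to fingerprint benchmark series and drop any pretraining series with a matching fingerprint. The sketch below is an illustration under assumptions, not Toto 2.0's procedure, which this page does not describe: the rounding-then-hashing scheme, the function names, and the dataset names in the usage lines are all hypothetical.

```python
import hashlib

def fingerprint(values, decimals=3):
    """Hash a series after rounding, so tiny float differences still match."""
    text = ",".join(f"{v:.{decimals}f}" for v in values)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def exclude_benchmark_series(pretrain, benchmark, decimals=3):
    """Drop pretraining series whose fingerprint matches any benchmark series.

    pretrain, benchmark : dict {series_name: list of floats}
    Returns (kept, removed_names).
    """
    blocked = {fingerprint(s, decimals) for s in benchmark.values()}
    kept = {name: s for name, s in pretrain.items()
            if fingerprint(s, decimals) not in blocked}
    removed = sorted(set(pretrain) - set(kept))
    return kept, removed

# Hypothetical names, for illustration only.
pretrain = {
    "cpu_a": [0.1, 0.2, 0.3],
    "copy_of_bench": [1.0, 2.0, 3.0],
    "synthetic_1": [5.0, 4.0],
}
benchmark = {"public_eval_series": [1.0, 2.0, 3.0004]}
kept, removed = exclude_benchmark_series(pretrain, benchmark)
```

This check only catches whole-series matches at the chosen rounding. Rescaled, shifted, resampled, or partially overlapping copies pass through it, so it is a floor for leakage control and not a guarantee.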

Reasoning And Alignment Data

ChatTS uses synthetic time-series attributes and Q&A generation for time-series/LLM alignment. TimeOmni-1 combines curated reasoning samples with TSR-Suite-style reasoning tasks, and TimeOmni-VL builds time-series understanding and generation data around TS-image representations and CoT-conditioned generation. In this use, the bottleneck is not only numeric realism; it is whether annotations and prompts represent the reasoning behavior the model should learn.

Natural language guidance of high-fidelity TTS is an audio source rather than a time-series forecasting source, but it is a useful synthetic-annotation pattern: derive structured labels from real temporal data, convert them into natural-language descriptions, and use a small high-fidelity slice to steer generation quality.

T2S is the time-series counterpart to that pattern. It segments real time series into fragments, generates natural-language captions for local morphology, filters candidate captions with embedding similarity, and trains a text-to-series diffusion model on the resulting TSFragment-600K dataset. This is not synthetic data in the “simulate from a prior” sense; it is a synthetic annotation and conditional-generation loop over real temporal fragments.
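The caption-filtering step can be sketched as a cosine-similarity threshold. This page does not say which embeddings T2S compares, so the sketch assumes a single reference embedding per fragment and pre-computed caption embeddings; `cosine`, `filter_captions`, the 0.8 threshold, and the example vectors are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def filter_captions(reference_emb, candidates, min_sim=0.8):
    """Keep candidate captions whose embedding is close to the reference.

    candidates : list of (caption_text, embedding) pairs
    Returns the surviving captions, most similar first.
    """
    scored = [(cosine(reference_emb, emb), text) for text, emb in candidates]
    kept = [(s, t) for s, t in scored if s >= min_sim]
    return [t for s, t in sorted(kept, reverse=True)]

# Toy two-dimensional embeddings; a real pipeline would use an encoder.
reference = [1.0, 0.0]
candidates = [
    ("rising then flat", [0.9, 0.1]),
    ("noisy oscillation", [0.0, 1.0]),
    ("sharp rise", [1.0, 0.0]),
]
survivors = filter_captions(reference, candidates)
```

A similarity filter removes captions that disagree with the reference, but it cannot tell whether the surviving captions describe operationally meaningful structure. That gap is the caption-artifact risk listed under Risks And Caveats.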

TimeCraft-Style Generation Programs

TimeCraft makes synthetic time-series generation a program rather than one architecture. TimeDP learns reusable time-series prototypes and turns a few target-domain examples into domain prompts. BRIDGE adds text-controlled generation through LLM-generated and refined descriptions. TarDiff steers EHR generation with influence estimates so generated samples are useful for downstream clinical models, while OATS makes synthetic augmentation part of the TSFM pretraining loop. Diff-MN targets irregular-to-continuous generation through diffusion-parameterized MoE-NCDE dynamics, and CaTSG moves the generation objective toward observational, interventional, and counterfactual time series. DiGA and MarS add the financial-market simulation branch, where generated order flow or market trajectories are judged by control-target matching, stylized facts, downstream trading-agent utility, and simulator assumptions rather than generic sample realism alone.

This family sharpens the taxonomy: generation can target fidelity, text controllability, downstream utility, causal validity, online pretraining benefit, continuous-time reconstruction, or financial simulation. Those targets should not be collapsed into one synthetic-data score.

Evidence

Synthetic data recurs across these sources because it answers different bottlenecks, not because one method fits them all. Data-volume scaling, label generation, iterative label bootstrapping, causal/template coverage, diagnostic latent-state labels, covariate behavior, PFN-style inference priors, language alignment, and reasoning supervision each call for a different audit.

Risks And Caveats

Synthetic templates can create unrealistic coupling, overly clean seasonality, or artifacts that a model memorizes as shortcuts.
Model-generated annotation loops can amplify seed-model mistakes, especially for rare regimes or minority event classes.
Text-to-series generators can learn caption artifacts or generic morphology words rather than operationally meaningful regimes, events, or interventions.
Target-aware generators such as TarDiff need guidance-set isolation, downstream-model diversity checks, and metric-overfitting audits; improving one clinical prediction metric is not automatically broad synthetic-data quality.
Online augmentation methods such as OATS should report reference-set provenance, strict train/validation/test isolation, and whether guidance samples come from benchmark-like distributions.
Adjacent generator-debiasing sources such as InvDiff are useful for shortcut and unknown-bias audits, but should not be counted as time-series synthetic-data evidence unless the temporal experiment and metric are named.
Generated samples can look high quality before a model later enters a memorization regime; synthetic time-series generators should therefore audit exact-duplicate, nearest-neighbor, subsequence-overlap, membership-inference, and downstream-leakage risk by checkpoint age, not only by sample fidelity.
Adjacent language-model evidence from Synthetic Data for any Differentiable Target shows that generated examples can be optimized for hidden downstream training effects, not only surface realism or task labels. For time-series synthetic data, this suggests auditing representation drift, rare-state retention, and metric overfitting after training, not only simulator realism, numeric artifacts, or caption quality.
Pretraining corpora can leak public benchmark train or test structure even when the paper labels an evaluation as zero-shot.
Covariates in synthetic forecasting data are usually exogenous variables or known future features; they should not be described as actions, control inputs, or interventions unless the generator and evaluation actually encode controllable decisions.
Synthetic causal structure is not enough by itself for counterfactual validity on real systems with confounding, delayed effects, missingness, or policy-driven interventions.
Diagnostic-generator success, such as Aionoscope’s controlled latent-state probes, is not automatically real-task transfer. It should be treated as a unit test that names which process variables a representation exposes under a specified readout.
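The nearest-neighbor and subsequence audit mentioned above can be sketched as a distance search between generated windows and every same-length subsequence of the training data. This is a minimal brute-force illustration, assuming z-normalized Euclidean distance and a hypothetical tolerance; the function names are not from any cited source, and a real audit would need an indexed search and would be rerun at each generator checkpoint.

```python
import math

def znorm(window):
    """Z-normalize so that offset and scale changes do not hide a copy."""
    m = sum(window) / len(window)
    sd = math.sqrt(sum((x - m) ** 2 for x in window) / len(window))
    return [(x - m) / sd for x in window] if sd > 0 else [0.0] * len(window)

def windows(series, length, stride=1):
    """All subsequences of the given length."""
    return [series[i:i + length] for i in range(0, len(series) - length + 1, stride)]

def nearest_train_distance(sample, train_corpus):
    """Smallest z-normalized distance from sample to any training subsequence."""
    q = znorm(sample)
    best = math.inf
    for series in train_corpus:
        for w in windows(series, len(sample)):
            best = min(best, math.dist(q, znorm(w)))
    return best

def flag_near_copies(generated, train_corpus, tol=1e-6):
    """Indices of generated samples that are (near-)copies of training data."""
    return [i for i, g in enumerate(generated)
            if nearest_train_distance(g, train_corpus) <= tol]

train_corpus = [[0, 1, 4, 9, 16, 25, 36]]
generated = [
    [12, 18, 28, 42],  # 2 * [1, 4, 9, 16] + 10: a rescaled training subsequence
    [5, 1, 7, 2],      # unrelated shape
]
flagged = flag_near_copies(generated, train_corpus)
```

The first generated sample is flagged even though none of its raw values appear in the training series, which is why raw-value duplicate checks alone are not enough. Tracking the fraction of flagged samples per checkpoint gives one way to see when a generator starts memorizing.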

Relation To Foundation TSFM Agenda

Synthetic data maps to several slots in the Foundation Time-Series Model Research Agenda, but mostly as support rather than direct closure. It can improve data diversity, rare-regime coverage, context/generation alignment, and causal-template coverage. It becomes a warning when synthetic artifacts, decorative captions, benchmark leakage, or exogenous covariates are mistaken for real state, context, or controllable interventions.

Open Questions

Which synthetic-generation assumptions survive transfer to real-world temporal domains?
How should synthetic data be audited for causal and numerical artifacts?
When does success on synthetic latent-state accessibility diagnostics predict real diagnostic or control utility?
When do synthetic covariates remain exogenous variables, and when should they be modeled as actions, control inputs, or interventions?
How should benchmark reports separate synthetic-only pretraining, mixed real/synthetic pretraining, fine-tuning, and ensemble entries?
Which generator families best transfer to high-cardinality observability metrics and event streams?
Which temporal labels should be bootstrapped from real data instead of generated by a simulator?
When should text-to-series generation be evaluated by downstream utility rather than only reconstruction, retrieval, or caption-alignment metrics?
How should synthetic time-series data be audited for hidden downstream training effects, representation drift, or metric overfitting, not only surface realism and numeric artifacts?
Which TimeCraft-style control signal is most useful for downstream TSFMs: prototype prompts, text prompts, influence-guided generation, causal interventions, or online sample selection?
