CauKer: Classification Time Series Foundation Models Can Be Pretrained On Synthetic Data Only

Source

Raw Markdown: paper_cauker-2025.md
PDF: paper_cauker-2025.pdf

Core Claim

CauKer argues that classification time-series foundation models can be pretrained sample-efficiently on synthetic data generated from Gaussian-process kernel composition and structural causal models.

Key Contributions

Generates diverse, causally coherent synthetic time series with trend, seasonality, and nonlinear interaction structure.
Targets pretraining for classification TSFMs rather than forecasting alone.
Reports scaling laws over synthetic dataset size and model capacity.

Method Notes

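CauKer bridges Synthetic Data For Time Series, Causal Time Series, and Time-Series Foundation Models: it combines Gaussian-process kernel composition with structural causal models, propagating series through a DAG with nonlinear edge functions, to generate pretraining data for classification TSFMs.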
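
The generation idea can be illustrated with a minimal sketch. It is not the paper's implementation: the kernel choices, the activation set, the number of nodes, the noise level, and every function name below (`rbf`, `periodic`, `linear`, `sample_gp`, `sample_scm_series`) are assumptions made for illustration. Root nodes are drawn from a Gaussian process whose kernel is a sum or product of base kernels, and each non-root node is a nonlinear function of a weighted mix of its parents in a random DAG.

```python
import numpy as np

def rbf(t, length=0.2):
    # Smooth, trend-like covariance.
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def periodic(t, period=0.25, length=1.0):
    # Seasonal covariance with a fixed period.
    d = np.abs(t[:, None] - t[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length ** 2)

def linear(t, offset=0.0):
    # Linear kernel; multiplying by it lets amplitude grow over time.
    return (t[:, None] - offset) * (t[None, :] - offset)

def sample_gp(kernel, rng):
    # Draw one zero-mean GP sample via an eigendecomposition, clipping
    # tiny negative eigenvalues caused by floating-point error.
    w, v = np.linalg.eigh(kernel)
    return v @ (np.sqrt(np.clip(w, 0.0, None)) * rng.standard_normal(len(w)))

def sample_scm_series(n_nodes=6, n_roots=2, length=128, seed=0):
    """Return an (n_nodes, length) array; row i is the series at DAG node i."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, length)
    # Composed kernels (hypothetical choices): sums and products of base kernels.
    kernels = [rbf(t) + periodic(t), periodic(t) * linear(t) + rbf(t, 0.05)]
    activations = [np.tanh, np.sin, lambda x: np.maximum(x, 0.0)]
    nodes = []
    for i in range(n_nodes):
        if i < n_roots:
            # Root node: GP sample from a randomly chosen composed kernel.
            nodes.append(sample_gp(kernels[rng.integers(len(kernels))], rng))
        else:
            # Child node: parents come only from earlier nodes, so the graph
            # is acyclic by construction.
            n_par = rng.integers(1, min(i, 3) + 1)
            parents = rng.choice(i, size=n_par, replace=False)
            weights = rng.standard_normal(n_par)
            mix = sum(w * nodes[p] for w, p in zip(weights, parents))
            f = activations[rng.integers(len(activations))]
            nodes.append(f(mix) + 0.01 * rng.standard_normal(length))
    return np.stack(nodes)

series = sample_scm_series()
```

Calling `sample_scm_series` with different seeds yields different DAGs and kernels, which is one simple way to obtain an arbitrarily large synthetic corpus. How CauKer assigns class labels to the generated series is not covered by this sketch.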

Evidence And Results

The abstract reports scaling experiments over synthetic dataset size (10K to 10M samples) and model capacity (1M to 783M parameters), and states that CauKer-generated data yields clearer scaling behavior than real-world datasets do.
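
One common way to check whether a scaling trend is "clear" is to fit a power law, error ≈ a · N^(-b), by linear regression in log-log space. The sketch below assumes that functional form; `fit_power_law` is a hypothetical helper, and the error values are placeholders generated from a known power law, not results from the paper.

```python
import numpy as np

def fit_power_law(sizes, errors):
    """Fit errors ~ a * sizes**(-b) by least squares on log-log values."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), deg=1)
    return float(np.exp(intercept)), float(-slope)

# Dataset sizes spanning the range named in the abstract (10K to 10M).
sizes = np.array([1e4, 1e5, 1e6, 1e7])
# Placeholder errors from a made-up power law (a=2.0, b=0.3), NOT paper results.
errors = 2.0 * sizes ** -0.3

a, b = fit_power_law(sizes, errors)
```

On real measurements, a small residual around the fitted line would indicate regular scaling, while large or erratic residuals would correspond to the irregular behavior attributed to real-world datasets.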

Limitations

The source focuses on classification TSFMs. It does not settle whether synthetic causal generation transfers to reasoning, forecasting, or high-fidelity generation tasks.

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Data diversity, curriculum, and long tail	partially closes	Generates arbitrarily large classification pretraining data with GP kernels plus SCM structure and reports data/model scaling laws.	Synthetic factors are hand-designed and classification-only; rare real regimes and interventions are not validated.
Augmentation-free or dataset-aware self-supervision	partially closes	Pretrains Mantis and MOMENT-style encoders on synthetic causal-kernel series rather than relying on a real classification corpus.	Does not cover forecasting, generation, editing, or action-conditioned objectives.
Causal and counterfactual structure	adjacent	Uses DAG propagation and nonlinear edge functions to create causally coherent synthetic series.	No logged actions, interventions, or counterfactual rollout benchmark.

Links Into The Wiki

Open Questions

Which synthetic causal assumptions produce robust out-of-distribution transfer?
Can CauKer-style data help reasoning models such as TimeOmni-1?
