ChatTS: Aligning Time Series With LLMs Via Synthetic Data For Enhanced Understanding And Reasoning

Source

Core Claim

ChatTS treats time series as a modality for multimodal LLMs and uses synthetic time-series/text data to train understanding and reasoning over multivariate series.

Key Contributions

  • Proposes attribute-based synthetic time-series generation with detailed textual descriptions.
  • Introduces Time Series Evol-Instruct for diverse time-series Q&A data.
  • Builds a context-aware time-series encoder for variable-length multivariate inputs.
  • Reports strong gains over vision-based, text-based, and agent-based baselines on alignment and reasoning tasks.
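
The attribute-based generation idea in the first contribution can be illustrated with a minimal sketch: pick a set of attributes, synthesize the series from them, and derive the text description from the same attributes so that the text is correct by construction. Everything named here (`SeriesAttributes`, `generate_series`, `describe`, and the specific attribute fields) is a hypothetical simplification for illustration, not the paper's actual generator or attribute pool.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class SeriesAttributes:
    """Hypothetical attribute set; assumed here purely for illustration."""
    length: int = 64
    trend_slope: float = 0.05      # change per time step
    season_period: int = 16        # 0 disables seasonality
    season_amplitude: float = 1.0
    noise_std: float = 0.1
    spike_index: int = -1          # -1 means no local spike
    spike_height: float = 0.0


def generate_series(attrs, seed=0):
    """Compose trend + seasonality + Gaussian noise + an optional spike."""
    rng = random.Random(seed)
    values = []
    for t in range(attrs.length):
        v = attrs.trend_slope * t
        if attrs.season_period > 0:
            v += attrs.season_amplitude * math.sin(2 * math.pi * t / attrs.season_period)
        v += rng.gauss(0.0, attrs.noise_std)
        if t == attrs.spike_index:
            v += attrs.spike_height
        values.append(v)
    return values


def describe(attrs):
    """Build the paired text description from the same attributes."""
    if attrs.trend_slope > 0:
        trend = "an increasing trend"
    elif attrs.trend_slope < 0:
        trend = "a decreasing trend"
    else:
        trend = "no overall trend"
    parts = [f"The series has {attrs.length} points and shows {trend}."]
    if attrs.season_period > 0:
        parts.append(f"It has a seasonal pattern with period {attrs.season_period}.")
    else:
        parts.append("It has no seasonal pattern.")
    if attrs.spike_index >= 0:
        direction = "upward" if attrs.spike_height >= 0 else "downward"
        parts.append(f"There is an {direction} spike near point {attrs.spike_index}.")
    return " ".join(parts)
```

Because the description is generated from the attributes rather than inferred from the values, each (series, text) pair can serve as an alignment example without manual labeling. How well this transfers depends on how much of real-world behavior the attribute set covers.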

Method Notes

ChatTS is linked to Time-Series Foundation Models and Synthetic Data For Time Series. It is also a precursor to the reasoning-oriented TimeOmni-1 and generation-oriented TimeOmni-VL.

Evidence And Results

The abstract reports a 46.0% improvement on alignment tasks and a 25.8% improvement on reasoning tasks over the compared vision-based, text-based, and agent-based baselines. Training uses synthetic data; evaluation uses benchmarks built on real-world data.

Limitations

The approach depends heavily on synthetic attribute coverage and evaluation design. The paper does not by itself prove general-purpose time-series reasoning beyond its task suite.

Foundation TSFM Relevance

Agenda slot | Verdict | Evidence | Missing pieces
Context interface | partially closes | Token-level concatenation preserves the position of multivariate series inside the surrounding text query and context. | Context is mostly textual/query context, not system topology, events, or actions.
Representation quality: semantic state vs dense numeric detail | partially closes | Value-preserved normalization adds scaling and offset information so the LLM can answer numerical queries after normalization. | Does not prove dense reconstruction, editing, or calibrated future generation.
Benchmarks: what level of modeling is tested? | partially closes | Evaluates alignment and reasoning tasks including trend, seasonality, local fluctuation, correlation, clustering, inductive, deductive, and causal QA. | Evaluation is synthetic-heavy and excludes domain-specific classification and etiological reasoning.
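
The value-preserved normalization noted in the table can be sketched as follows: normalize the series for the encoder, but keep the offset and scale as text so that original magnitudes remain recoverable. This is a minimal sketch that assumes min-max normalization and a made-up metadata string format; the function names and the exact text layout are assumptions, not the paper's implementation.

```python
def normalize_with_metadata(values):
    """Min-max normalize to [0, 1] and return the offset and scale used."""
    offset = min(values)
    scale = max(values) - offset
    if scale == 0:
        scale = 1.0  # constant series: avoid division by zero
    normalized = [(v - offset) / scale for v in values]
    return normalized, offset, scale


def metadata_text(offset, scale):
    """Hypothetical text prefix that carries the numeric context to the LLM."""
    return f"[value offset: {offset:.4f}, value scale: {scale:.4f}]"


def denormalize(normalized, offset, scale):
    """Invert the normalization; shows that no value information is lost."""
    return [x * scale + offset for x in normalized]


values = [10.0, 20.0, 15.0]
normalized, offset, scale = normalize_with_metadata(values)
prefix = metadata_text(offset, scale)
restored = denormalize(normalized, offset, scale)
```

The design point is that the normalized values alone cannot answer a question such as "what is the peak value?", whereas the normalized values together with the offset and scale can.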

Open Questions

  • Which attributes are sufficient for robust time-series-language grounding?
  • How does ChatTS compare against reasoning-specific RL/post-training approaches?