GIFT-Eval: General Time Series Forecasting Model Evaluation

Source

Core Claim

GIFT-Eval is a general-purpose forecasting benchmark for comparing time-series foundation models across domains, frequencies, variate counts, and prediction lengths.

Dataset Notes

  • The Hugging Face card describes 144,000 time series, roughly 177 million data points, and 97 forecasting configurations.
  • The suite includes a separate pretraining dataset that excludes the evaluation data, so zero-shot results are not contaminated by test leakage.
  • Public summaries and papers use slightly different dataset counts, so exact counts should be tied to a specific artifact version.
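
One way to follow the version-pinning advice above is to carry the artifact and revision alongside every count that gets quoted. The following is a minimal sketch only: the `BenchmarkSnapshot` record, the `comparable` and `describe` helpers, and the revision strings are hypothetical names introduced for illustration, not part of any GIFT-Eval tooling. The counts are the ones quoted from the Hugging Face card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkSnapshot:
    """Counts as read from one specific artifact (hypothetical record type)."""
    artifact: str     # where the numbers came from, e.g. a dataset card or a paper
    revision: str     # commit hash, tag, or paper version (placeholders below)
    num_series: int
    num_configs: int


def comparable(a: BenchmarkSnapshot, b: BenchmarkSnapshot) -> bool:
    """Treat two reports as comparable only if they cite the same artifact and revision."""
    return (a.artifact, a.revision) == (b.artifact, b.revision)


def describe(s: BenchmarkSnapshot) -> str:
    """Render a citation string that keeps the counts tied to their source."""
    return (f"GIFT-Eval ({s.artifact} @ {s.revision}): "
            f"{s.num_series:,} series, {s.num_configs} configurations")


# Placeholder revisions; substitute the real commit hash or version actually used.
card_a = BenchmarkSnapshot("hf-dataset-card", "rev-A", 144_000, 97)
card_b = BenchmarkSnapshot("hf-dataset-card", "rev-B", 144_000, 97)

print(describe(card_a))
print(comparable(card_a, card_b))  # same counts, different revision
```

The comparison deliberately ignores the counts themselves: two reports can quote identical numbers and still refer to different artifact versions.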

Why It Matters

GIFT-Eval is a central benchmark for Toto, Toto 2.0, and many other time-series foundation-model sources in this repository. It is useful for benchmark hygiene because it keeps pretraining data separate from test data and exposes public leaderboard protocols.
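
The separation of training and test data can be spot-checked on the consumer side before a result is reported as zero-shot. The sketch below is a coarse, name-level check under stated assumptions: `find_overlap` is a hypothetical helper and the dataset identifiers are made-up placeholders, not real GIFT-Eval component names. It does not detect series-level duplicates that appear under different names.

```python
def find_overlap(pretrain_ids, eval_ids):
    """Return identifiers present in both collections, ignoring case and surrounding whitespace."""
    def norm(name):
        return name.strip().lower()

    return sorted({norm(x) for x in pretrain_ids} & {norm(x) for x in eval_ids})


# Placeholder identifiers for illustration only.
pretrain = ["dataset_a", "dataset_b", "Dataset_C "]
evaluation = ["dataset_c", "dataset_d"]

overlap = find_overlap(pretrain, evaluation)
if overlap:
    print("possible leakage, inspect:", overlap)
else:
    print("no name-level overlap found")
```

An empty result only rules out the obvious case; it is not evidence that the two corpora are disjoint.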

Limitations

  • It is not an observability-specific benchmark.
  • It usually does not stress the hundreds-to-thousands channel regime the way BOOM or Time-HD do.
  • Component dataset licenses and terms should be checked for downstream use.
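
To make the channel-regime limitation concrete, evaluation configurations can be bucketed by variate count so that the share of high-channel cases is visible. This is a sketch under assumptions: the 100-variate cutoff for "high-channel" is an arbitrary illustrative threshold, the helper names are hypothetical, and the example counts are invented rather than taken from GIFT-Eval.

```python
from collections import Counter


def channel_regime(num_variates: int) -> str:
    """Bucket a configuration by variate count (threshold of 100 is an assumption)."""
    if num_variates < 1:
        raise ValueError("num_variates must be at least 1")
    if num_variates == 1:
        return "univariate"
    if num_variates < 100:
        return "low-channel"
    return "high-channel"


def regime_counts(variate_counts):
    """Count how many configurations fall into each regime."""
    return Counter(channel_regime(n) for n in variate_counts)


# Invented variate counts, one per hypothetical configuration.
example = [1, 1, 7, 21, 321, 2000]
counts = regime_counts(example)
print(dict(counts))
```

Reporting these buckets next to an aggregate score shows how much of the result rests on univariate or low-channel configurations.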

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
| --- | --- | --- | --- |
| Forecasting benchmark coverage | partially closes | The raw dataset card covers broad domains, frequencies, variate counts, prediction lengths, and a non-leaking pretraining/evaluation split. | Passive forecasting only; not an observability, control, causal, or generation benchmark. |
| Benchmark hygiene | partially closes | Public leaderboard protocols and separated pretraining data make it a useful zero-shot comparison surface for TSFMs. | Counts and configs vary across artifacts; downstream users still need version-pinned reporting and license checks. |
| Native high-channel modeling | insufficient evidence | The dataset suite includes varying variate counts but is not designed around hundreds-to-thousands-channel telemetry. | Needs explicit high-channel and topology-aware evaluation slices. |