GIFT-Eval: General Time Series Forecasting Model Evaluation

Source

Dataset metadata snapshot: source.md
Metadata JSON: metadata.json
Official Hugging Face: https://huggingface.co/datasets/Salesforce/GiftEval
Official leaderboard: https://huggingface.co/spaces/Salesforce/GIFT-Eval
Official code: https://github.com/SalesforceAIResearch/gift-eval
Paper: https://arxiv.org/abs/2410.10393

Core Claim

GIFT-Eval is a broad general-purpose forecasting benchmark for comparing time-series foundation models across domains, frequencies, variate counts, and prediction lengths.

Dataset Notes

The Hugging Face card describes 144,000 time series, roughly 177 million data points, and 97 forecasting configurations.
The suite includes a pretraining dataset built to avoid overlap with the test data, so that zero-shot evaluation is not contaminated by leakage.
Public summaries and papers use slightly different dataset counts, so exact counts should be tied to a specific artifact version.
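
Because counts differ between artifacts, one way to keep reports comparable is to attach the artifact identity to every count. The sketch below is a minimal illustration and not part of GIFT-Eval's tooling: the `CountReport` class, its field names, and the placeholder revision string are assumptions made for this example. The numbers are the ones quoted from the Hugging Face card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountReport:
    """Dataset counts tied to the artifact they were read from (hypothetical helper)."""

    artifact: str     # e.g. "hf-card", "paper", "leaderboard"
    revision: str     # commit hash, arXiv version, or snapshot date
    num_series: int
    num_configs: int

    def __post_init__(self):
        # Refuse unpinned reports: a count without a revision cannot be traced.
        if not self.revision.strip():
            raise ValueError("revision must be set so the counts can be traced")

    def cite(self) -> str:
        # Render the counts together with their provenance.
        return (
            f"{self.num_series:,} series, {self.num_configs} configurations "
            f"({self.artifact} @ {self.revision})"
        )


# Counts as quoted from the Hugging Face card; the revision is a placeholder
# to be replaced with the commit hash that was actually inspected.
report = CountReport("hf-card", "<commit-hash>", 144_000, 97)
print(report.cite())
```

Rejecting an empty revision at construction time is a deliberate choice here: it makes an unpinned count fail loudly rather than circulate without provenance.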

Why It Matters

GIFT-Eval is a central benchmark for Toto, Toto 2.0, and many other time-series foundation-model sources in this repository. It supports benchmark hygiene because it keeps pretraining data separate from test data and publishes its leaderboard protocols.

Limitations

It is not an observability-specific benchmark.
It does not generally stress the hundreds-to-thousands-channel regime the way BOOM or Time-HD do.
Component dataset licenses and terms should be checked for downstream use.

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Forecasting benchmark coverage	partially closes	The raw dataset card covers broad domains, frequencies, variate counts, prediction lengths, and a non-leaking pretraining/evaluation split.	Passive forecasting only; not an observability, control, causal, or generation benchmark.
Benchmark hygiene	partially closes	Public leaderboard protocols and separated pretraining data make it a useful zero-shot comparison surface for TSFMs.	Counts and configs vary across artifacts; downstream users still need version-pinned reporting and license checks.
Native high-channel modeling	insufficient evidence	The dataset suite includes varying variate counts but is not designed around hundreds-to-thousands-channel telemetry.	Needs explicit high-channel and topology-aware evaluation slices.
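
To judge whether a benchmark exercises the high-channel regime, it can help to bucket its configurations by variate count. The following is a rough sketch only: the `bucket_by_variates` function, the bucket boundaries (1, 2 to 99, 100 and above), and the sample records are all hypothetical and do not reflect actual GIFT-Eval configurations.

```python
from collections import Counter


def bucket_by_variates(configs, high_channel_threshold=100):
    """Count configurations per variate-count bucket.

    `configs` is an iterable of (name, num_variates) pairs. The default
    threshold of 100 is an assumed cut-off for the
    "hundreds-to-thousands" channel regime.
    """
    buckets = Counter()
    for _name, num_variates in configs:
        if num_variates < 1:
            raise ValueError("num_variates must be at least 1")
        if num_variates == 1:
            buckets["univariate"] += 1
        elif num_variates < high_channel_threshold:
            buckets["multivariate"] += 1
        else:
            buckets["high_channel"] += 1
    return dict(buckets)


# Made-up records for illustration; not real benchmark entries.
sample = [("cfg_a", 1), ("cfg_b", 1), ("cfg_c", 7), ("cfg_d", 21)]
print(bucket_by_variates(sample))
```

Run against a real, version-pinned configuration list, an empty or very small `high_channel` bucket would be consistent with the "insufficient evidence" verdict for native high-channel modeling.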
