TIME Benchmark

Source

Core Claim

TIME is a task-centric zero-shot forecasting benchmark built from fresh datasets to reduce benchmark contamination. Toto 2.0 uses it as part of its scaling-era benchmark evidence.

Dataset Notes

  • The Hugging Face card describes 50 fresh datasets and 98 forecasting tasks.
  • The artifact includes task-level data and window-level prediction results for benchmark visualization.
  • The benchmark is positioned around strict zero-shot and contamination-resistance claims.
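
As a rough illustration of how window-level prediction results could be rolled up into task-level scores for visualization, here is a minimal sketch. The record layout (task id, window index, error value), the function name aggregate_by_task, and the numbers are all hypothetical; they are not taken from the TIME artifact, whose actual schema and metrics should be checked on the Hugging Face card.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_task(window_results):
    """Average window-level errors into one score per task.

    window_results: iterable of (task_id, window_index, error) tuples.
    This layout is an assumption for illustration, not the TIME schema.
    """
    per_task = defaultdict(list)
    for task_id, _window_index, error in window_results:
        per_task[task_id].append(error)
    # One mean error per task; a real benchmark may use a different aggregate.
    return {task_id: mean(errors) for task_id, errors in per_task.items()}


# Made-up illustrative values, not real benchmark results.
example = [("task_a", 0, 1.0), ("task_a", 1, 3.0), ("task_b", 0, 0.5)]
scores = aggregate_by_task(example)
```

Keeping the window-level records alongside the aggregate makes it possible to inspect which windows drive a task's score instead of relying only on the summary number.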

Why It Matters

TIME belongs in Time-Series Benchmark Hygiene because it directly addresses the concern that public forecasting benchmarks can become contaminated by pretraining corpora, benchmark-specific tuning, or repeated leaderboard exposure.
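
The first of these concerns, overlap with pretraining corpora, can be illustrated with a simple name-matching check. This is a minimal sketch under stated assumptions: the function contamination_overlap and the dataset names are hypothetical, and it is not the procedure TIME itself uses. Name matching only catches direct inclusion of a dataset; it says nothing about benchmark-specific tuning or repeated leaderboard exposure.

```python
def contamination_overlap(pretraining_sources, benchmark_datasets):
    """Return benchmark dataset names that also appear in a pretraining corpus list.

    Both arguments are iterables of dataset names. Matching is done on
    whitespace-stripped, lowercased names, which is a deliberately crude
    assumption: renamed or re-hosted copies would not be detected.
    """
    def normalize(name):
        return name.strip().lower()

    pretrain = {normalize(name) for name in pretraining_sources}
    return sorted(name for name in benchmark_datasets if normalize(name) in pretrain)


# Hypothetical names for illustration only.
overlap = contamination_overlap(
    ["ETTh1", "Weather "],
    ["weather", "fresh_grid_2025"],
)
```

An empty result from a check like this is weak evidence at best, which is why building a benchmark from fresh datasets is a stronger guarantee than auditing an existing one by name.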

Limitations

  • TIME is not primarily an observability benchmark.
  • It is not primarily an HDTSF benchmark.
  • It remains passive forecasting data, not an action-conditioned world-model dataset.

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Benchmarks: what level of modeling is tested? | partially closes | Provides 50 fresh datasets and 98 zero-shot forecasting tasks to reduce benchmark contamination. | Still tests passive forecasting rather than state prediction, control, generation fidelity, or explanation. |
| Data diversity, curriculum, and long tail | adjacent | Fresh task-centric benchmark broadens evaluation beyond repeatedly used public forecasting datasets. | Metadata does not establish rare-regime, intervention, or long-tail coverage. |