TIME Benchmark
Source
- Dataset metadata snapshot: source.md
- Metadata JSON: metadata.json
- Official Hugging Face: https://huggingface.co/datasets/Real-TSF/TIME
- Official leaderboard: https://huggingface.co/spaces/Real-TSF/TIME-leaderboard
- Paper: https://arxiv.org/abs/2602.12147
Core Claim
TIME is a task-centric, zero-shot forecasting benchmark built from fresh datasets to reduce benchmark contamination. Toto 2.0 uses it as part of its scaling-era benchmark evidence.
Dataset Notes
- The Hugging Face card describes 50 fresh datasets and 98 forecasting tasks.
- The artifact includes task-level data and window-level prediction results for benchmark visualization.
- The benchmark positions itself around two claims: strict zero-shot evaluation and resistance to contamination.
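
To make the relationship between window-level results and task-level results concrete, the following minimal sketch averages per-window errors into one score per task. The record layout (`task`, `y_true`, `y_pred`), the task names, and the choice of mean absolute error are all assumptions made for illustration; this note does not describe the actual schema or metrics of the TIME artifact.

```python
from collections import defaultdict
from statistics import fmean


def window_mae(y_true, y_pred):
    """Mean absolute error for a single forecast window."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("windows must be non-empty and of equal length")
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def task_scores(windows):
    """Average the window-level MAE within each task."""
    per_task = defaultdict(list)
    for w in windows:
        per_task[w["task"]].append(window_mae(w["y_true"], w["y_pred"]))
    return {task: fmean(errors) for task, errors in per_task.items()}


# Hypothetical window-level records; not taken from the TIME artifact.
windows = [
    {"task": "task_a", "y_true": [1.0, 2.0], "y_pred": [1.5, 2.5]},
    {"task": "task_a", "y_true": [3.0, 4.0], "y_pred": [3.0, 5.0]},
    {"task": "task_b", "y_true": [10.0], "y_pred": [8.0]},
]
scores = task_scores(windows)
```

Averaging within a task before comparing across tasks keeps a task with many windows from dominating the summary, which is one plausible reason to store results at window level.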
Why It Matters
TIME belongs in Time-Series Benchmark Hygiene because it directly addresses the concern that public forecasting benchmarks can become contaminated by pretraining corpora, benchmark-specific tuning, or repeated leaderboard exposure.
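
One of the contamination routes named above, overlap with pretraining corpora, can be illustrated with a simple date check. The sketch below assumes that "fresh" means the earliest observation in a dataset postdates a model's pretraining cutoff. That definition, the `BenchmarkDataset` type, and the example names and dates are hypothetical and are not taken from the TIME paper or dataset card.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BenchmarkDataset:
    name: str
    first_observation: date  # earliest timestamp in the evaluation data


def is_fresh(dataset, pretraining_cutoff):
    """Assumed rule: fresh if all data postdates the pretraining cutoff."""
    return dataset.first_observation > pretraining_cutoff


def split_by_freshness(datasets, pretraining_cutoff):
    """Return (fresh_names, stale_names) under the assumed rule."""
    fresh = [d.name for d in datasets if is_fresh(d, pretraining_cutoff)]
    stale = [d.name for d in datasets if not is_fresh(d, pretraining_cutoff)]
    return fresh, stale


# Hypothetical datasets and cutoff, for illustration only.
datasets = [
    BenchmarkDataset("example_fresh", date(2025, 6, 1)),
    BenchmarkDataset("example_stale", date(2019, 1, 1)),
]
cutoff = date(2024, 12, 31)
fresh, stale = split_by_freshness(datasets, cutoff)
```

A date check of this kind covers only pretraining overlap. It says nothing about benchmark-specific tuning or repeated leaderboard exposure, which require process controls rather than a data filter.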
Limitations
- TIME is not primarily an observability benchmark.
- It is not primarily an HDTSF benchmark.
- It remains passive forecasting data, not an action-conditioned world-model dataset.
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Benchmarks: what level of modeling is tested? | partially closes | Provides 50 fresh datasets and 98 zero-shot forecasting tasks to reduce benchmark contamination. | Still tests passive forecasting rather than state prediction, control, generation fidelity, or explanation. |
| Data diversity, curriculum, and long tail | adjacent | Fresh task-centric benchmark broadens evaluation beyond repeatedly used public forecasting datasets. | Metadata does not establish rare-regime, intervention, or long-tail coverage. |