TIME Benchmark

Source

Dataset metadata snapshot: source.md
Metadata JSON: metadata.json
Official Hugging Face: https://huggingface.co/datasets/Real-TSF/TIME
Official leaderboard: https://huggingface.co/spaces/Real-TSF/TIME-leaderboard
Paper: https://arxiv.org/abs/2602.12147

Core Claim

TIME is a task-centric zero-shot forecasting benchmark built from fresh datasets to reduce benchmark contamination. Toto 2.0 uses it as part of its scaling-era benchmark evidence.

Dataset Notes

The Hugging Face card describes 50 fresh datasets and 98 forecasting tasks.
The artifact includes task-level data and window-level prediction results for benchmark visualization.
The benchmark is framed around strict zero-shot evaluation and claims of resistance to contamination.

Why It Matters

TIME belongs in Time-Series Benchmark Hygiene because it directly addresses the concern that public forecasting benchmarks can become contaminated by pretraining corpora, benchmark-specific tuning, or repeated leaderboard exposure.

Limitations

TIME is not primarily an observability benchmark.
It is not primarily an HDTSF benchmark.
It remains passive forecasting data, not an action-conditioned world-model dataset.

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Benchmarks: what level of modeling is tested?	partially closes	Provides 50 fresh datasets and 98 zero-shot forecasting tasks to reduce benchmark contamination.	Still tests passive forecasting rather than state prediction, control, generation fidelity, or explanation.
Data diversity, curriculum, and long tail	adjacent	A fresh, task-centric benchmark broadens evaluation beyond the repeatedly used public forecasting datasets.	The metadata does not establish coverage of rare regimes, interventions, or the long tail.
