Context-Aided Forecasting

Summary

Context-aided forecasting predicts future observations from both time-series history and relevant context. In this wiki’s terminology, the common case is a text-conditioned time series: the numeric history is still central, but the context carries information that the numeric window cannot reveal by itself.

Context is Key (CiK) is the landmark source for this topic. Its benchmark makes the failure mode concrete: a model can fit the visible numeric pattern and still be wrong because the decisive information is in the text.

The source What’s Wrong With The Current Time-Series Deep Learning? makes the latent-state version of this argument explicit: context tells the model the physical meaning of the data and the state of the environment in which the observations occur.

What The Wiki Currently Believes

Context Is Part Of The Forecasting Interface

The right abstraction is not just P(future | history). For context-aided forecasting, the interface is closer to P(future | history, context), where context can name the process, provide constraints, summarize hidden history, describe expected events, or specify causal relationships.
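A minimal Python sketch of the interface change follows. It assumes a toy setup: the `Context` fields, the naive last-value base forecaster, and the pre-parsed constraints are hypothetical illustrations. A real context-aided model would read the text itself and return a predictive distribution rather than a single path. The point is only that the second function takes an extra argument that can legitimately change the answer.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Context:
    """Hypothetical context record: free text plus constraints already extracted from it."""
    text: str = ""
    upper_bound: Optional[float] = None     # e.g. "capacity is capped at 10 units"
    closed_steps: frozenset = frozenset()   # e.g. "the site is closed on step 1" (0-based)


def history_only_forecast(history: Sequence[float], horizon: int) -> list:
    """Stand-in for P(future | history): repeat the last observed value."""
    return [float(history[-1])] * horizon


def context_aided_forecast(history: Sequence[float], horizon: int, context: Context) -> list:
    """Stand-in for P(future | history, context): same base path, then apply the context."""
    path = history_only_forecast(history, horizon)
    if context.upper_bound is not None:
        # Intemporal information: a value constraint the numeric window cannot reveal.
        path = [min(v, context.upper_bound) for v in path]
    # Future information: a known event that overrides the numeric pattern.
    return [0.0 if step in context.closed_steps else v for step, v in enumerate(path)]


history = [8.0, 9.0, 12.0]
ctx = Context(text="Capacity is capped at 10 units; closed on step 1.",
              upper_bound=10.0, closed_steps=frozenset({1}))
print(history_only_forecast(history, 3))        # [12.0, 12.0, 12.0]
print(context_aided_forecast(history, 3, ctx))  # [10.0, 0.0, 10.0]
```

The history-only path is confidently wrong at every step, and no amount of extra fitting on `history` would fix it, because the cap and the closure are only in the text.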

This matters for time-series foundation models because many strong forecasters still assume the numeric history is the whole problem. CiK shows why that assumption breaks when the historical window is short, misleading, or missing domain knowledge that a human forecaster would naturally use.

Context Types Must Stay Distinct

CiK’s five context sources are useful wiki categories:

Intemporal information: stable facts about the process, units, value constraints, or long-period seasonality.
Future information: known or hypothesized future events and constraints.
Historical information: facts about earlier behavior that are not visible in the provided numeric history.
Covariate information: additional variables statistically associated with the target.
Causal information: causal relationships between covariates, events, or interventions and the target.

These categories map back to the terminology page. Future information is often an event or exogenous variable, not automatically an action. Causal information may discuss interventions, but only an explicit controllable channel makes the task action-conditioned.
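One way to keep the categories distinct in code is to tag every context item with its source type and record separately whether it exposes a controllable channel. The `controllable_channel` flag and the example texts are hypothetical. The sketch only encodes the rule stated above: causal or future text alone does not make a task action-conditioned.

```python
from dataclasses import dataclass
from enum import Enum


class ContextType(Enum):
    """CiK's five context sources, used as wiki categories."""
    INTEMPORAL = "intemporal"
    FUTURE = "future"
    HISTORICAL = "historical"
    COVARIATE = "covariate"
    CAUSAL = "causal"


@dataclass(frozen=True)
class ContextItem:
    kind: ContextType
    text: str
    # Hypothetical flag: True only if the item names an input the user can set
    # (a price, a setpoint), not an event that simply happens.
    controllable_channel: bool = False


def is_action_conditioned(items: list) -> bool:
    """Action-conditioned only when some item exposes an explicit controllable channel."""
    return any(item.controllable_channel for item in items)


items = [
    ContextItem(ContextType.FUTURE, "A public holiday falls on day 3."),
    ContextItem(ContextType.CAUSAL, "Discounts usually raise demand."),
]
print(is_action_conditioned(items))  # False: events and causal claims only

items.append(ContextItem(ContextType.CAUSAL, "We will set the discount to 20% on day 2.",
                         controllable_channel=True))
print(is_action_conditioned(items))  # True: an explicit lever is now part of the input
```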

Evaluation Needs Context-Sensitive Metrics

Ordinary aggregate forecasting metrics can underweight the exact windows where context matters. CiK’s RCRPS (region-of-interest CRPS) is important because it upweights regions of interest and penalizes constraint violations. For future benchmarks, context should be evaluated with ablations that remove or corrupt the context, plus metrics that isolate the context-sensitive part of the forecast.
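Below is a simplified, sample-based sketch in the spirit of RCRPS. It makes three assumptions: CRPS is estimated from forecast samples, region-of-interest steps and the remaining steps receive equal total weight, and a linear penalty applies to how far samples leave a `[lower, upper]` constraint band, scaled by a hypothetical `beta`. This is not CiK's exact definition; consult the paper for that.

```python
from typing import Sequence


def crps_samples(samples: Sequence[float], y: float) -> float:
    """Sample estimate of CRPS: E|X - y| - 0.5 * E|X - X'|."""
    m = len(samples)
    spread_to_target = sum(abs(x - y) for x in samples) / m
    self_spread = sum(abs(a - b) for a in samples for b in samples) / (2 * m * m)
    return spread_to_target - self_spread


def roi_crps_with_penalty(
    sample_paths: Sequence[Sequence[float]],  # [num_samples][horizon]
    target: Sequence[float],                  # [horizon]
    roi: set,                                 # horizon steps where the context matters
    lower: float = float("-inf"),
    upper: float = float("inf"),
    beta: float = 10.0,                       # hypothetical penalty weight
) -> float:
    horizon = len(target)
    per_step = [crps_samples([p[t] for p in sample_paths], target[t]) for t in range(horizon)]
    inside = [c for t, c in enumerate(per_step) if t in roi]
    outside = [c for t, c in enumerate(per_step) if t not in roi]
    if inside and outside:
        # Half the weight goes to the region of interest, however short it is.
        score = 0.5 * sum(inside) / len(inside) + 0.5 * sum(outside) / len(outside)
    else:
        score = sum(per_step) / horizon
    # Average over sample paths of the total distance each path sits outside [lower, upper].
    violation = sum(
        sum(max(0.0, lower - v) + max(0.0, v - upper) for v in path) for path in sample_paths
    ) / len(sample_paths)
    return score + beta * violation


# One bad step out of four: a plain mean of per-step CRPS gives 0.25,
# while the ROI weighting gives 0.5 because step 0 is the region of interest.
print(roi_crps_with_penalty([[0.0, 0.0, 0.0, 0.0]], [1.0, 0.0, 0.0, 0.0], roi={0}))  # 0.5
```

The equal split is the design choice that matters: a forecast that ignores a short context-driven event cannot hide its error behind many easy steps.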

LLMs Are Strong But Not Yet The End State

Prompted LLMs are the first obvious baseline because they can read text and emit structured forecasts. CiK shows that this can work, especially with large instruction-tuned models and constrained output formats. The gotcha is cost and brittleness: a few context misinterpretations can dominate aggregate error, and many LLM approaches are too slow for high-volume forecasting.
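One common mitigation for brittleness is a strict output contract, paired with a parser that rejects anything that breaks it. Malformed replies are then resampled or dropped instead of silently corrupting the forecast. The `<forecast>` tags and the `step: value` line format below are a hypothetical contract, not the prompt format used by CiK or any specific LLM API.

```python
import re
from typing import Optional

# Hypothetical output contract: one "step: value" line per horizon step inside <forecast> tags.
_BLOCK = re.compile(r"<forecast>(.*?)</forecast>", re.DOTALL)
_LINE = re.compile(r"^\s*(\d+)\s*:\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_forecast(reply: str, horizon: int) -> Optional[list]:
    """Return the forecast values, or None if the reply breaks the contract."""
    block = _BLOCK.search(reply)
    if block is None:
        return None
    lines = [line for line in block.group(1).splitlines() if line.strip()]
    if len(lines) != horizon:
        return None
    values = []
    for expected_step, line in enumerate(lines, start=1):
        match = _LINE.match(line)
        if match is None or int(match.group(1)) != expected_step:
            return None
        values.append(float(match.group(2)))
    return values


def collect_samples(replies: list, horizon: int) -> tuple:
    """Keep well-formed replies as forecast samples and report the parse rate."""
    parsed = [parse_forecast(reply, horizon) for reply in replies]
    samples = [p for p in parsed if p is not None]
    rate = len(samples) / len(replies) if replies else 0.0
    return samples, rate


replies = [
    "Given the holiday, demand drops.\n<forecast>\n1: 12.0\n2: 11.5\n</forecast>",
    "<forecast>\n1: 12.0\n</forecast>",   # too few steps
    "I think it will be around 12.",      # no block at all
]
samples, rate = collect_samples(replies, horizon=2)
print(samples)  # [[12.0, 11.5]]
print(rate)     # 0.3333333333333333
```

Tracking the parse rate per model is cheap and directly surfaces one form of brittleness. It does not catch replies that are well formed but misread the context; those still need context-sensitive metrics and ablations.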

The research target is therefore not merely “use an LLM.” It is an efficient context-conditioned forecaster that can preserve numerical calibration, understand text, respect constraints, and expose uncertainty.

Position: What Can LLMs Tell Us about Time Series Analysis is the broad roadmap source for this direction. UniTime is narrower and more concrete: it uses domain instructions as a prefix before time-series tokens so a GPT-style causal backbone can condition numeric forecasting on text.
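The prefix idea can be sketched without any model weights, because what matters is the token order and the causal mask. Whitespace tokenization, the patch length, and dropping the oldest leftover points are assumptions for illustration. They are not UniTime's actual tokenizer or patching scheme.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Token:
    kind: str        # "text" for instruction tokens, "patch" for time-series patches
    payload: object  # a word or a tuple of values


def make_patches(series: Sequence[float], patch_len: int) -> list:
    """Non-overlapping patches that keep the most recent points; oldest leftovers are dropped."""
    start = len(series) % patch_len
    return [tuple(series[i:i + patch_len]) for i in range(start, len(series), patch_len)]


def build_sequence(instruction: str, series: Sequence[float], patch_len: int) -> list:
    """Instruction tokens first (the prefix), then time-series patch tokens."""
    text_tokens = [Token("text", word) for word in instruction.split()]
    patch_tokens = [Token("patch", patch) for patch in make_patches(series, patch_len)]
    return text_tokens + patch_tokens


def causal_mask(n: int) -> list:
    """mask[i][j] is True when position i may attend to position j, i.e. j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def visible_instruction(seq: list, position: int) -> list:
    """Instruction words that the token at `position` can attend to under the causal mask."""
    row = causal_mask(len(seq))[position]
    return [tok.payload for j, tok in enumerate(seq) if row[j] and tok.kind == "text"]


seq = build_sequence("Domain: electricity load, hourly", [1, 2, 3, 4, 5, 6, 7], patch_len=2)
print([tok.kind for tok in seq])    # ['text', 'text', 'text', 'text', 'patch', 'patch', 'patch']
print(seq[4].payload)               # (2, 3)
print(visible_instruction(seq, 4))  # ['Domain:', 'electricity', 'load,', 'hourly']
```

Because the instruction sits before every patch, each patch token can attend to the full instruction, while no instruction token can attend to the numbers. That asymmetry is what lets a causal backbone condition numeric forecasting on text.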

Dataset Design Is The Hard Part

Text attached to a time series is not enough. The text must change the correct forecast distribution in a verifiable way. CiK does this through manual task construction and validation, which makes it a high-quality benchmark but not a scalable dataset engine by itself.

For time-series research, the next dataset question is how to bootstrap large context-aided corpora where the context is actually necessary, not decorative metadata or a loosely related caption.
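A dataset engine could enforce necessity with an ablation filter. Each candidate task is scored with its real context, with no context, and with a distractor context, and it is kept only if the real context produces a clear gain. The sketch below uses mean absolute error, a 20% relative-gain threshold, and a keyword toy forecaster. All three are placeholder assumptions; a serious pipeline would use a probabilistic score and strong forecasters.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A forecaster maps (history, horizon, context text) to a point forecast.
Forecaster = Callable[[Sequence[float], int, str], list]


@dataclass(frozen=True)
class Candidate:
    history: tuple
    context: str
    future: tuple


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def context_is_necessary(task: Candidate, forecaster: Forecaster,
                         distractor: str, min_gain: float = 0.2) -> bool:
    """Keep a task only if its real context beats both no context and a distractor context."""
    h = len(task.future)
    err_real = mae(forecaster(task.history, h, task.context), task.future)
    err_none = mae(forecaster(task.history, h, ""), task.future)
    err_distractor = mae(forecaster(task.history, h, distractor), task.future)
    gain = (err_none - err_real) / err_none if err_none > 0 else 0.0
    return gain >= min_gain and err_distractor > err_real


def keyword_forecaster(history, horizon, context):
    """Hypothetical toy model: predicts zeros if the context mentions a closure."""
    if "closed" in context.lower():
        return [0.0] * horizon
    return [float(history[-1])] * horizon


needed = Candidate((5.0, 6.0, 7.0), "The shop is closed for the next two days.", (0.0, 0.0))
decorative = Candidate((5.0, 6.0, 7.0), "Sales are reported daily in units.", (7.0, 7.0))
print(context_is_necessary(needed, keyword_forecaster, "Prices are in euros."))      # True
print(context_is_necessary(decorative, keyword_forecaster, "Prices are in euros."))  # False
```

The filter inherits the forecaster's blind spots. A weak model can fail to show a gain for context a human would call essential, so an automated filter like this would complement, not replace, the kind of manual validation CiK relies on.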

T2S is adjacent rather than identical to context-aided forecasting. It uses text as the primary conditioning input for time-series generation, not as extra context for forecasting from observed history. Its TSFragment-600K pipeline is still important because it shows a scalable way to attach fragment-level natural-language descriptions to local time-series morphology. The caveat is that generated captions can become decorative unless evaluation proves that the text changes the generated or forecasted distribution in the intended way.

T2S’s no-text ablation shows captions matter for its generation metrics, but it still does not prove that generated fragment captions are operational context for forecasting or intervention choice.
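To make "fragment-level description" concrete, here is a deliberately crude rule-based captioner. It is not T2S's TSFragment-600K annotation pipeline. The fixed fragment length, the net-change rule, and the tolerance are illustrative assumptions.

```python
from typing import Sequence


def describe_fragment(fragment: Sequence[float], tol: float = 0.1) -> str:
    """Caption a fragment from its net change; `tol` is a hypothetical flatness threshold."""
    change = fragment[-1] - fragment[0]
    if change > tol:
        return f"rises by {change:.1f}"
    if change < -tol:
        return f"falls by {-change:.1f}"
    return "stays roughly flat"


def caption_fragments(series: Sequence[float], frag_len: int, tol: float = 0.1) -> list:
    """Attach one caption to each non-overlapping fragment as (start, end, text)."""
    return [
        (start, start + frag_len, describe_fragment(series[start:start + frag_len], tol))
        for start in range(0, len(series) - frag_len + 1, frag_len)
    ]


series = [0.0, 1.0, 2.0, 3.0, 3.0, 3.1, 2.9, 3.0, 3.0, 2.0, 1.0, 0.0]
for start, end, text in caption_fragments(series, frag_len=4):
    print(start, end, text)
# 0 4 rises by 3.0
# 4 8 stays roughly flat
# 8 12 falls by 3.0
```

Captions like these describe the shape of each fragment, which is exactly why they can be decorative: nothing in "rises by 3.0" tells a forecaster what happens next or which intervention to choose.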

BRIDGE adds a stronger synthetic text-control data-engine example. Its multi-agent pipeline creates and refines natural-language descriptions, then a diffusion model uses those descriptions plus semantic prototypes to generate time series. Like T2S, this is generation rather than forecasting from observed history; its relevance here is the data-engine pattern and the warning that generated descriptions can become a benchmark artifact.

TimeRAF adds a non-text context path: retrieved time-series examples are external context for zero-shot forecasting. This is context-aided forecasting in the broad sense, but it needs knowledge-base overlap audits before comparing it with base zero-shot forecasters.
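A retrieval path and the overlap audit it needs can share one similarity function. The z-normalized Euclidean distance, the brute-force nearest-neighbor search, the `(name, window)` knowledge-base layout, and the duplicate threshold below are assumptions for illustration. TimeRAF's actual retriever and knowledge base are not reproduced here.

```python
import math
from typing import Sequence


def znorm(x: Sequence[float]) -> list:
    """Z-normalize a window so retrieval compares shape, not level or scale."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [0.0] * len(x) if sd == 0 else [(v - mu) / sd for v in x]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("windows must have equal length")
    return math.dist(znorm(a), znorm(b))


def retrieve(query: Sequence[float], knowledge_base: list, k: int) -> list:
    """Brute-force k nearest (name, window) entries; a full system would also store what followed."""
    return sorted(knowledge_base, key=lambda item: distance(query, item[1]))[:k]


def overlap_audit(test_windows: list, knowledge_base: list, threshold: float = 1e-6) -> list:
    """Names of KB entries that near-duplicate any test window, up to shift and scale."""
    return [name for name, window in knowledge_base
            if any(distance(window, t) <= threshold for t in test_windows)]


kb = [("sine", [0.0, 1.0, 0.0, -1.0]), ("ramp", [5.0, 6.0, 7.0, 8.0])]
test_window = [0.0, 1.0, 2.0, 3.0]
print([name for name, _ in retrieve(test_window, kb, k=1)])  # ['ramp']
print(overlap_audit([test_window], kb))                      # ['ramp']
```

Here the best retrieved neighbor is also flagged by the audit, because "ramp" is a shifted copy of the test window. That is exactly the case where a retrieval gain should not be counted as zero-shot generalization.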

Evidence

CiK reports 71 manually designed tasks across seven domains and shows that strong LLM-based forecasters improve substantially when given context. It also reports that no method is best across all context types, meaning the benchmark remains unsolved. The failure analysis is as important as the leaderboard: context-capable models sometimes make catastrophic mistakes when they misread or mishandle the text.

Adjacent sources point to pieces of the same problem:

CHARM uses channel descriptions for multivariate representation learning.
ChatTS trains a time-series MLLM on synthetic time-series/text pairs.
UniTime uses domain instructions for cross-domain forecasting.
T2S and BRIDGE use text for controllable time-series generation.
TimeRAF uses retrieved time series as context.
TelecomTS pairs observability KPI windows with descriptions, troubleshooting tickets, labels, and Q&A.
TimeOmni-1 frames scenario understanding and event-aware forecasting as reasoning tasks.
Natural language guidance of high-fidelity TTS (text-to-speech) is outside forecasting, but it gives a useful synthetic-annotation pattern for turning temporal signals into controllable language metadata.

CiK is the clean benchmark for the narrower question: does textual context actually improve probabilistic forecasting?

Relation To Foundation TSFM Agenda

This page is the main local anchor for the context-interface slot in the Foundation Time-Series Model Research Agenda.

Agenda slot	Verdict	Evidence	Missing pieces
Context interface	partially closes	CiK shows that numeric history alone can under-specify the forecast when essential text is missing; CHARM adds channel-description conditioning.	Needs high-dimensional streaming context, topology, event streams, and action history.
Native multivariate encoding	adjacent	CHARM links channel semantics to multivariate representation learning.	The page is broader than native multivariate state and does not settle high-channel scaling.
Control and counterfactuals	insufficient evidence	Context can describe causal relationships or future events, but the page does not provide controllable action rollout evidence.	Needs explicit action/control channels and intervention comparisons.

Open Questions

How should context-aided forecasting scale from univariate textual context to multivariate time series with multiple context modalities?
Can a small or medium time-series model use retrieved or compressed context as reliably as a large prompted LLM?
What automatic dataset-generation loop can guarantee that context is essential rather than merely correlated with the answer?
Which context fields should become explicit exogenous variables, events, control inputs, or interventions in an action-conditioned world-model interface?
Can fragment-level captions be made operational enough to guide forecasting or intervention choice, rather than only generating plausible morphology?
When does retrieval-augmented forecasting improve true zero-shot generalization rather than importing benchmark-near neighbors from a knowledge base?

Alex Open Research Wiki