Awesome Agentic Time Series

Source

Status And Credibility

This is a June 2026 public GitHub repository and survey snapshot. The inspected main commit is 1dc5e3c366be82f930619ce0801c810fbcfe7060, dated 2026-06-13, and the included survey PDF's metadata records a creation date of 2026-06-12. The README is MIT-licensed and lists 239 dated paper entries across surveys, benchmarks, time-series foundation models, LLM4TS, agentic systems, and reliability.

The source is credible as a current field map for two reasons. It is maintained by a broad author group affiliated with Tsinghua University, UIC, CUHK, UTS, CMU, Ohio State, USC, NUS, Dartmouth, Peking University, Shenzhen University, Tongji, Northwestern, and Griffith, and the repository itself preserves the paper list and survey artifact. It is not, however, peer-reviewed evidence for every listed method, benchmark, or claim. Treat it as a survey and bibliography source: useful for taxonomy, gap finding, and candidate discovery, but not a substitute for ingesting primary papers before making strong technical claims.

Core Claim

The repository and survey frame agentic time series as a shift from model-centric prediction toward closed-loop systems that observe temporal evidence, reason over evolving state, choose tools or actions, receive feedback, update memory, and eventually simulate future temporal environments.

For this wiki, the important distinction is interface-level: a time-series agent is not just an LLM wrapper around forecasts. It is a temporal decision system with observations, context, tools or actions, feedback, state updates, and reliability constraints.

Repository Scope

The README organizes the field into six high-level source groups:

| Group | README entries in snapshot | Local interpretation |
| --- | --- | --- |
| Surveys and position papers | 8 | Useful for terminology and field boundaries, but primary sources still need individual ingestion. |
| Benchmarks and datasets | 50 | Shows the shift from forecasting leaderboards toward reasoning, QA, engineering, decision, and future-prediction benchmarks. |
| Time-series foundation models | 31 | Mostly passive forecasting, representation, or universal-model sources; not automatically agentic. |
| LLM4TS | 64 | Translation, alignment, temporal reasoning, and LLM-mediated analysis sources. |
| Agentic time-series systems | 79 | Perception, reasoning, planning/action, memory, knowledge, world-model, and data-agent systems. |
| Reliability, safety, and trustworthiness | 7 | Early explicit reliability layer for forecasting agents and temporal decision systems. |
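As a quick consistency check, the per-group counts in the group listing above can be tallied against the 239 entries reported for the README. The snippet below is only an illustrative sketch; the names `group_counts`, `total`, and `shares` are assumptions of this note, not anything the repository defines.

```python
# Entry counts per README group, copied from the group listing above.
# Variable names are illustrative and not part of the repository.
group_counts = {
    "Surveys and position papers": 8,
    "Benchmarks and datasets": 50,
    "Time-series foundation models": 31,
    "LLM4TS": 64,
    "Agentic time-series systems": 79,
    "Reliability, safety, and trustworthiness": 7,
}

# Sum of all groups; this should match the README total of 239.
total = sum(group_counts.values())

# Fraction of the snapshot that each group accounts for.
shares = {name: count / total for name, count in group_counts.items()}

print(total)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.1%}")
```

The counts sum to 239, and the agentic-systems group is the largest at roughly a third of the snapshot, while the explicit reliability group is the smallest.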

The repository’s paper-list taxonomy is useful because it separates the field by system role rather than by a single benchmark score. It also exposes a practical ingest queue: many relevant 2025-2026 papers do not yet have local source pages.

Survey Notes

The included survey defines a time-series agent as a closed-loop system operating in a temporal environment. The central loop is:

temporal evidence + context + current state
  -> perception / reasoning / planning
  -> tool call, query, or action
  -> feedback
  -> updated state or memory
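The loop above can be made concrete with a toy example. The following is a minimal sketch under stated assumptions, not the survey's implementation: `AgentState`, `perceive`, `decide`, `run_loop`, and `toy_environment` are hypothetical names, the "reasoning" step is a bare deviation threshold, and the environment is a stand-in function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentState:
    """Hypothetical agent state: a rolling window plus an episodic memory log."""
    window: List[float] = field(default_factory=list)
    memory: List[Dict[str, object]] = field(default_factory=list)


def perceive(state: AgentState, observation: float, max_len: int = 5) -> float:
    """Perception: return the deviation from the prior rolling mean, then update the window."""
    baseline = sum(state.window) / len(state.window) if state.window else observation
    state.window = (state.window + [observation])[-max_len:]
    return observation - baseline


def decide(deviation: float, threshold: float) -> str:
    """Reasoning/planning: choose an action from the size of the deviation."""
    return "alert" if abs(deviation) > threshold else "wait"


def toy_environment(action: str, observation: float) -> str:
    """Stand-in environment: confirms an alert only when the value is truly large."""
    if action == "alert":
        return "confirmed" if observation > 15.0 else "false_alarm"
    return "no_feedback"


def run_loop(observations: List[float],
             environment: Callable[[str, float], str],
             threshold: float = 3.0) -> AgentState:
    """Run observe -> decide -> act -> feedback -> memory update over a sequence."""
    state = AgentState()
    for obs in observations:
        deviation = perceive(state, obs)          # evidence + current state
        action = decide(deviation, threshold)     # reasoning / planning
        feedback = environment(action, obs)       # action and feedback
        state.memory.append(                      # updated memory
            {"obs": obs, "action": action, "feedback": feedback}
        )
    return state


state = run_loop([10.0, 10.5, 9.8, 20.0, 10.2], toy_environment)
print([m["action"] for m in state.memory])
# -> ['wait', 'wait', 'wait', 'alert', 'wait']
```

Even in this toy, the spike at 20.0 stays in the rolling window and shifts the baseline for the next step, which is a small instance of why state updates need to be inspected alongside final outputs.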

The survey’s five capability layers are:

  1. Time-series perception: turn raw numeric observations, diagnostic tool outputs, symbolic summaries, structure, or multimodal context into evidence.
  2. Time-series reasoning: infer patterns, causal hypotheses, anomalies, uncertainty, and future dynamics.
  3. Planning and action: route tools, acquire evidence, orchestrate model/data/code workflows, coordinate agents, or take external decisions.
  4. Memory and knowledge: store temporal cases, regimes, procedures, failures, confidence signals, and domain knowledge across sessions.
  5. Temporal world models: simulate plausible futures, interventions, and counterfactual alternatives.

Reliability and trustworthiness sit across the layers rather than at the end. The survey names forecasting quality, reasoning faithfulness, tool-use reliability, hallucination and grounding, robustness, decision safety, human alignment, auditability, and reproducibility as system-level checks.
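The system-level checks named above can be tracked as an explicit checklist rather than an informal list. The sketch below assumes a hypothetical evaluation report keyed by check; `ReliabilityCheck` and `missing_checks` are illustrative names introduced for this note, and the survey does not prescribe any such structure.

```python
from enum import Enum
from typing import Dict, List


class ReliabilityCheck(Enum):
    """The system-level checks named in the survey, as enum members."""
    FORECASTING_QUALITY = "forecasting quality"
    REASONING_FAITHFULNESS = "reasoning faithfulness"
    TOOL_USE_RELIABILITY = "tool-use reliability"
    HALLUCINATION_AND_GROUNDING = "hallucination and grounding"
    ROBUSTNESS = "robustness"
    DECISION_SAFETY = "decision safety"
    HUMAN_ALIGNMENT = "human alignment"
    AUDITABILITY = "auditability"
    REPRODUCIBILITY = "reproducibility"


def missing_checks(report: Dict[ReliabilityCheck, str]) -> List[ReliabilityCheck]:
    """Return every check with no recorded evidence in a (hypothetical) report."""
    return [check for check in ReliabilityCheck if not report.get(check)]


# Hypothetical report: only forecast accuracy has been evaluated so far.
report = {ReliabilityCheck.FORECASTING_QUALITY: "error metric on a held-out split"}
for check in missing_checks(report):
    print(f"missing evidence: {check.value}")
```

A report that covers only forecasting quality leaves eight of the nine checks open, which is the gap the survey's cross-layer framing is meant to expose.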

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
| --- | --- | --- | --- |
| Benchmarks and evaluation protocol | adjacent | The source maps forecasting, reasoning, QA, engineering, tool-use, decision, and future-prediction benchmarks into an agentic evaluation landscape. | Needs primary benchmark ingests and normalized protocols before the wiki can compare results. |
| Context interface | adjacent | The survey treats temporal agents as systems that combine numeric observations with textual, structural, tool, memory, and environmental context. | Does not define a concrete reusable schema for channels, topology, exogenous variables, action history, or deployment context. |
| Control and counterfactuals | adjacent | The planning/action and temporal-world-model layers explicitly discuss tools, actions, feedback, interventions, and counterfactual simulation. | Survey-level taxonomy only; no telemetry-native action-conditioned benchmark with typed operator actions and outcomes. |
| Streaming state and memory | adjacent | Memory and knowledge are treated as persistent temporal experience rather than a passive chat transcript. | No standardized benchmark for state-update cost, memory auditability, stale-memory failures, or long-horizon regime retention. |
| Reliability and benchmark hygiene | warning | The source warns that deployed agentic systems fail through interactions among perception, reasoning, tools, memory, actions, and feedback rather than through final forecast error alone. | Needs primary-source evidence and reproducible evaluation bundles for each failure mode. |

Candidate Follow-Up Ingests

The source is mainly valuable as a candidate queue. The strongest first follow-up candidates are the entries that most directly touch Alex’s agenda:

  • "Position: Beyond Model-Centric Prediction - Agentic Time Series Forecasting", for the explicit agentic-forecasting position.
  • TemporalBench and TimeSage-MT for agentic/time-series reasoning benchmarks.
  • TFRBench, ARFBench, and TimeSeriesGym for reasoning, incident-response, and engineering-agent evaluation.
  • KairosAgent, TimeART, Cast-R1, Nexus, and MoiraiAgent for tool-augmented or planning-oriented time-series agents.
  • MemCast, TS-Memory, and MEMTS for memory interfaces.
  • Chronicle, AgriWorld, and Sonar-TS for world-model, tool, or query interfaces over temporal environments.

These should not be cited as local evidence until each primary paper or artifact is checked and ingested.

Open Questions

  • Which listed sources are primary evidence for action-conditioned temporal world models rather than LLM-mediated analysis pipelines?
  • Which benchmark entries actually test closed-loop decisions, feedback, and action consequences instead of static QA or forecasting?
  • What minimum reliability protocol (calibration, grounding, tool-use logs, memory audit, action safety, replay, and cost) should apply before calling a time-series agent deployable?
  • Which memory papers distinguish durable temporal state from context-window summarization?
  • Can the survey’s five-layer architecture be translated into a concrete data contract for observability or industrial control: observations, context, event streams, actions, outcomes, and safety constraints?
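To make the last question easier to discuss, here is one hypothetical shape such a data contract might take. It is purely illustrative and does not answer the question: every class and field name (`Observation`, `Action`, `SafetyConstraint`, `StepRecord`) is an assumption of this note, and neither the survey nor the repository defines a schema like this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    timestamp: float   # seconds since epoch (assumed unit)
    channel: str       # e.g. a sensor or metric name
    value: float


@dataclass
class Action:
    timestamp: float
    name: str
    parameters: Dict[str, float] = field(default_factory=dict)


@dataclass
class SafetyConstraint:
    channel: str
    lower: float
    upper: float

    def allows(self, value: float) -> bool:
        """True when the value lies inside the closed interval [lower, upper]."""
        return self.lower <= value <= self.upper


@dataclass
class StepRecord:
    """One step of a hypothetical contract: what was seen, done, and what followed."""
    observations: List[Observation]
    context: Dict[str, str]
    events: List[str]
    action: Optional[Action] = None
    outcome: Optional[str] = None
    constraints: List[SafetyConstraint] = field(default_factory=list)

    def violations(self) -> List[Observation]:
        """Observations that fall outside a constraint defined for their channel."""
        return [
            obs
            for obs in self.observations
            for c in self.constraints
            if c.channel == obs.channel and not c.allows(obs.value)
        ]


record = StepRecord(
    observations=[Observation(0.0, "temp", 70.0), Observation(1.0, "temp", 95.0)],
    context={"site": "example-plant"},
    events=["maintenance window opened"],
    constraints=[SafetyConstraint("temp", lower=0.0, upper=90.0)],
)
print([obs.value for obs in record.violations()])
# -> [95.0]
```

Keeping actions, outcomes, and constraints in the same record as the observations is one design choice among several; a real contract would also need to settle units, time alignment, and how missing outcomes are represented.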