Streaming Latent-State Updates

Summary

Streaming latent-state updates are the serving contract for models that operate on never-ending streams. The model receives new observations or events, updates a retained state, decides whether to emit an output or stay silent, and keeps enough information for future decisions without replaying the full history.

For this wiki, the target is a foundation time-series model that can update at least as fast as wall-clock data arrives while preserving rare regimes, context, event streams, and action history. A long context window alone does not prove that a model is real-time; the model also needs explicit update cost, latency, memory, state-refresh behavior, and abstention or trigger behavior.

Streaming Contract

A minimal contract should name the retained state, the update rule, the output decision, and the serving budget:

$$s_{t} = U_{\theta}(s_{t-1}, o_{t}, c_{t}, e_{t}, a_{t-1})$$

$$y_{t}, d_{t} = R_{\theta}(s_{t}, q_{t})$$

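Here $U_{\theta}$ is the update rule and $R_{\theta}$ is the readout that makes the output decision, both with parameters $\theta$. $o_{t}$ is the current observation, $c_{t}$ is context, $e_{t}$ is an event-stream item, $a_{t - 1}$ is prior action or control-input history when it exists, $s_{t}$ is latent state, $q_{t}$ is the query or readout request when one is present, $d_{t}$ is a decision such as abstain, alert, respond, ask, forecast, or propose action, and $y_{t}$ is the emitted output.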

flowchart LR
  O[New observation or event] --> U[State update]
  C[Context and schema] --> U
  A[Action or control-input history] --> U
  U --> S[Retained latent state]
  S --> D{Decision}
  D -->|silent or abstain| O
  D -->|answer or forecast| Y[Output]
  D -->|candidate intervention| P[Planner or policy]
  Y --> O
  P --> A
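
The contract above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a proposed model: the names `State`, `update`, `readout`, and `run` are hypothetical, the state is a single exponentially smoothed level plus a surprise term, and the thresholds `alpha` and `alert_gap` are arbitrary. The only point is the shape of the loop: one constant-cost update per observation, followed by a readout that may stay silent.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ABSTAIN = "abstain"
    ALERT = "alert"
    FORECAST = "forecast"


@dataclass(frozen=True)
class State:
    """Toy stand-in for the retained latent state s_t (hypothetical fields)."""
    level: float = 0.0        # smoothed observation level
    surprise: float = 0.0     # |o_t - previous level|
    last_action: float = 0.0  # a_{t-1}, kept separate from observations
    steps: int = 0


def update(state, obs, context=None, event=None, action_prev=0.0, alpha=0.2):
    """Stand-in for U_theta: constant-cost update, no replay of history.

    `context` and `event` are accepted to mirror the contract but are
    unused in this toy.
    """
    if state.steps == 0:
        level, surprise = obs, 0.0
    else:
        surprise = abs(obs - state.level)
        level = (1 - alpha) * state.level + alpha * obs
    return State(level, surprise, action_prev, state.steps + 1)


def readout(state, query=None, alert_gap=3.0):
    """Stand-in for R_theta: returns (y_t, d_t); y_t is None when silent."""
    if state.surprise > alert_gap:
        return state.level, Decision.ALERT
    if query is not None:
        return state.level, Decision.FORECAST
    return None, Decision.ABSTAIN


def run(stream, alert_gap=3.0):
    """Apply update then readout for each observation in the stream."""
    state, decisions = State(), []
    for obs in stream:
        state = update(state, obs)
        _, decision = readout(state, alert_gap=alert_gap)
        decisions.append(decision)
    return state, decisions
```

Calling `run([1.0, 1.1, 0.9, 9.0, 1.0])` abstains on every step except the jump to 9.0, which raises an alert. A benchmark built on this contract would score the abstentions as well as the alert.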

What The Wiki Currently Believes

Moshi is the stronger audio analogue for streaming generation and artifact diagnostics in this corpus. Audio Interaction Model is now demoted to a context-level warning: it makes silence/response serving explicit, but depends on heavy curated-data construction and does not solve retained-state growth.

Moshi is the earlier full-duplex example: 80 ms Mimi codec frames, 160 ms theoretical latency, about 200 ms practical latency, two separate audio streams for user and system, and an Inner Monologue text stream for Moshi’s own speech. Its lesson for metrics work is that streaming generation needs temporal artifact diagnostics, not only aggregate quality scores: the paper uses token-entropy windows to flag repetitive text, background noise during intended silence, gibberish, and noisy audio.
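
A windowed entropy diagnostic of this kind might look like the sketch below. It does not reproduce the paper's procedure: the function names, the window size, and the `low` and `high` thresholds are assumptions, and mapping low entropy to repetition and high entropy to gibberish is an illustrative reading rather than a reported rule.

```python
import math
from collections import Counter


def window_entropy(tokens, window, stride=None):
    """Shannon entropy in bits of the token histogram in each window."""
    stride = stride or window
    results = []
    for start in range(0, len(tokens) - window + 1, stride):
        counts = Counter(tokens[start:start + window])
        entropy = -sum((c / window) * math.log2(c / window)
                       for c in counts.values())
        results.append((start, entropy))
    return results


def flag_windows(tokens, window, low, high, stride=None):
    """Flag windows whose entropy falls outside the [low, high] band."""
    flags = []
    for start, entropy in window_entropy(tokens, window, stride):
        if entropy < low:
            flags.append((start, "low_entropy"))    # e.g. repetition
        elif entropy > high:
            flags.append((start, "high_entropy"))   # e.g. gibberish
    return flags
```

For a time-series service, the same check could run over discretized forecast tokens or alert codes, so that a stream stuck on one value is caught even when aggregate error looks acceptable.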

Audio Interaction Model is still a useful explicit trigger-token example: 400 ms chunks, silence/response control tokens, FIFO asynchronous inference, first-chunk latency, stall rate, and proactive trigger evaluation. Its lesson is narrow: real-time benchmarks should score the decision not to emit an output, but its TFJP preprocessing, synthetic silence supervision, history-review prompts, and FIFO queue are mostly workarounds rather than a reusable streaming-state architecture.
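
To make the serving metrics concrete, the sketch below simulates a single-worker FIFO queue fed by fixed-duration chunks. The definitions are assumptions for illustration, not the paper's: a chunk becomes available when its 400 ms of audio has fully arrived, latency is the time from that moment until its inference finishes, and a chunk counts as a stall when that latency exceeds one chunk duration. `simulate_fifo` is a hypothetical helper.

```python
def simulate_fifo(service_ms, chunk_ms=400.0):
    """Simulate FIFO inference over chunks that arrive every `chunk_ms`.

    service_ms: per-chunk inference time in milliseconds.
    Returns first-chunk latency, stall rate, and per-chunk latencies.
    """
    finish_prev = 0.0
    latencies = []
    for i, service in enumerate(service_ms):
        arrival = (i + 1) * chunk_ms          # chunk i is complete here
        start = max(arrival, finish_prev)     # wait for the worker if busy
        finish_prev = start + service
        latencies.append(finish_prev - arrival)
    stalls = sum(1 for lat in latencies if lat > chunk_ms)
    return {
        "first_chunk_latency_ms": latencies[0],
        "stall_rate": stalls / len(latencies),
        "latencies_ms": latencies,
    }
```

With `service_ms=[100, 900, 100, 100]`, one slow chunk delays the chunk queued behind it, so two of four chunks stall. The queue hides ingestion jitter, but it does nothing to bound or compress the retained state.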

Language Models Need Sleep is the strongest current analogy for finite-window eviction. The model spends extra consolidation compute before old context leaves the attention window, then resumes cheaper wake-time prediction. The time-series analogue is a learned state-refresh step over recent numeric observations, event streams, context, and action history before raw samples are dropped.
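
The ordering matters more than the mechanism: consolidate first, evict second. The sketch below shows that ordering with a deliberately simple stand-in. `ConsolidatingWindow` is a hypothetical class, and the running mean and maximum absolute value replace what would be a learned state-refresh step in a real system.

```python
from collections import deque
from itertools import islice


class ConsolidatingWindow:
    """Finite raw window that refreshes a summary before evicting samples."""

    def __init__(self, capacity, evict_batch):
        assert 0 < evict_batch <= capacity
        self.capacity = capacity
        self.evict_batch = evict_batch
        self.window = deque()
        # Stand-in for retained latent state (not a learned summary).
        self.summary = {"count": 0, "mean": 0.0, "max_abs": 0.0}

    def _consolidate(self, samples):
        """Fold samples into the summary; the 'sleep' step of this toy."""
        for x in samples:
            self.summary["count"] += 1
            n = self.summary["count"]
            self.summary["mean"] += (x - self.summary["mean"]) / n
            self.summary["max_abs"] = max(self.summary["max_abs"], abs(x))

    def push(self, x):
        self.window.append(x)
        if len(self.window) > self.capacity:
            leaving = list(islice(self.window, self.evict_batch))
            self._consolidate(leaving)      # refresh state first
            for _ in leaving:
                self.window.popleft()       # then drop the raw samples
```

An eviction audit would compare downstream answers computed from `summary` plus the remaining window against answers computed from the full history.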

FADE adds the continual-learning warning: online systems need selective forgetting, not a single fixed retention horizon. The time-series version should forget stale mappings while retaining stable dynamics and rare safety-relevant state.

TurboQuant adds the serving-memory warning: compressed retained state must improve actual latency, throughput, memory pressure, and quality after dequantization or retrieval cost is counted.
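
A report in this spirit would put byte savings, round-trip error, and dequantization time side by side. The sketch below uses plain symmetric int8 quantization as a stand-in; it is not TurboQuant's method. The helper names are hypothetical, and the byte counts assume an fp32 baseline plus one fp32 scale per vector.

```python
import array
import time


def quantize_int8(state):
    """Symmetric per-vector int8 quantization (illustrative stand-in)."""
    peak = max((abs(x) for x in state), default=0.0)
    scale = peak / 127.0 if peak > 0 else 1.0
    codes = array.array(
        "b", (max(-127, min(127, round(x / scale))) for x in state)
    )
    return codes, scale


def dequantize(codes, scale):
    """Restore approximate float values from int8 codes."""
    return [c * scale for c in codes]


def report(state):
    """Compare memory, error, and dequantization time for one state vector."""
    codes, scale = quantize_int8(state)
    t0 = time.perf_counter()
    restored = dequantize(codes, scale)
    dequant_seconds = time.perf_counter() - t0
    return {
        "bytes_fp32": 4 * len(state),                    # assumed baseline
        "bytes_int8": len(codes) * codes.itemsize + 4,   # codes + fp32 scale
        "max_abs_error": max(abs(a - b) for a, b in zip(state, restored)),
        "dequant_seconds": dequant_seconds,
        "scale": scale,
    }
```

The error column is only a proxy. The claim in the paragraph above is about quality on the downstream task, so a full comparison would rerun the readout on `restored` rather than stop at reconstruction error.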

Latent Context Language Models add adjacent evidence for prefill-time learned context compression: a smaller encoder compresses static prompt spans into soft latent tokens before decoder prefill, with optional exact expansion for selected chunks. For streaming TSFMs, this sharpens whether compression can be updated online as observations and tool results arrive, or whether it belongs at eviction and consolidation boundaries.

Mamba-3, Gated DeltaNet, Gated DeltaNet-2, RWKV-TS, and RATE are architecture background for compact recurrent state, active memory editing, numeric time-series recurrence, and action-trajectory memory. They are not sufficient by themselves because the streaming-state target also needs context schemas, event streams, interventions, state-refresh probes, and serving measurements.

Gated DeltaNet-2 adds a useful selective-update warning for this page: retaining a fixed-size state is not enough if the update has one scalar knob for forgetting and committing. A streaming TSFM may need to erase stale channel relationships while refusing to overwrite rare but decision-relevant state, so erase/write separation should be tested as a preservation mechanism rather than assumed from language retrieval gains.
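
The difference between one knob and two can be shown on a small associative state. The sketch below is a generic delta-rule-style update with separate `erase` and `write` strengths; it is not the Gated DeltaNet-2 formulation, and the function names and the assumption of unit-norm keys are illustrative. Setting `erase == write` recovers the usual single-strength delta rule.

```python
def read(S, k):
    """Return the value currently associated with key k: S @ k."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]


def update(S, k, v, erase, write):
    """Decoupled update: S - erase * (S k) k^T + write * v k^T.

    Assumes k has unit norm. `erase` removes the old association along k;
    `write` commits the new value v along k.
    """
    old = read(S, k)
    return [
        [s_ij - erase * old_i * k_j + write * v_i * k_j
         for s_ij, k_j in zip(row, k)]
        for row, old_i, v_i in zip(S, old, v)
    ]
```

With two orthogonal keys, `erase=1.0, write=0.0` on one key clears a stale association while leaving the other key's value untouched, and `erase=0.0, write=0.0` refuses an overwrite entirely. Whether a learned gate makes those choices correctly on rare regimes is the open test.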

Oryx, Hybrid Associative Memories, and HOLA add the streaming allocation variant. Oryx switches between exact attention and recurrent state across spans. HAM routes hard-to-predict tokens into an explicit KV scratchpad. HOLA keeps a fixed exact cache of the tokens with the largest committed GDN update magnitude. For streaming time series, these mechanisms translate into a testable policy question: which observations, events, action windows, or topology changes must remain exactly readable, and which can be compressed into retained state? HOLA adds a regime-shift caveat because old global top-$w$ events can occupy the cache unless age, value, or refresh semantics are introduced.
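
The regime-shift caveat is easy to reproduce. The sketch below selects which events stay exactly readable, first by raw update magnitude and then with an exponential age discount. `select_exact_cache`, the `(time, magnitude)` event format, and the half-life discount are assumptions for illustration, not HOLA's mechanism.

```python
def select_exact_cache(events, w, now=None, half_life=None):
    """Return the times of the w events kept in the exact cache.

    events: list of (time, update_magnitude) pairs.
    With half_life=None, ranking uses raw magnitude (global top-w).
    Otherwise magnitude is discounted by 0.5 ** (age / half_life).
    """
    def score(event):
        t, magnitude = event
        if half_life is None:
            return magnitude
        return magnitude * 0.5 ** ((now - t) / half_life)

    kept = sorted(events, key=score, reverse=True)[:w]
    return sorted(t for t, _ in kept)
```

Given events `[(0, 10.0), (1, 9.0), (50, 3.0), (98, 4.0), (99, 2.0)]` and `w=2`, the raw ranking keeps times 0 and 1 forever, while `now=100, half_life=10` keeps times 98 and 99. Neither policy is obviously right: an age discount can also evict a rare, old, safety-relevant event.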

Comparing Transformers and Hybrid Models at the Token Level adds a filtered-evaluation warning for streaming state. The text-side split says recurrent state helps more on state-conditioned, meaning-bearing predictions, while attention helps on exact repetition and structural closure. A streaming TSFM analogue should separately score post-eviction latent-state readout, rare regime updates, exact recent-value recall, repeated normal spans, and known structural constraints instead of averaging them into one forecast loss.
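
A small example shows why the average hides the split. The sketch below computes mean absolute error per named slice alongside the pooled value; the slice labels and the `sliced_mae` helper are hypothetical, and labeling each record with its slice is assumed to happen upstream.

```python
from collections import defaultdict


def sliced_mae(records):
    """Compute per-slice and pooled mean absolute error.

    records: iterable of (slice_name, y_true, y_pred) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])   # slice -> [error sum, count]
    for name, y_true, y_pred in records:
        totals[name][0] += abs(y_true - y_pred)
        totals[name][1] += 1
    per_slice = {name: s / n for name, (s, n) in totals.items()}
    pooled = (sum(s for s, _ in totals.values())
              / sum(n for _, n in totals.values()))
    return per_slice, pooled
```

With nine perfectly predicted `repeated_normal` records and one `rare_regime` record that misses by 5.0, the pooled error is 0.5 while the rare-regime slice reports 5.0.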

AdaJEPA is adjacent but useful for the action-conditioned streaming question. It updates a latent world model after each executed action and then replans, showing a small closed-loop version of “new observation → state/model update → next decision.” It should not be overread as a full always-on TSFM solution because the updates are episode-local, short-horizon, and evaluated on visual-control tasks rather than unbounded multivariate streams.

Design Requirements

State update cost MUST be reported per sample, per event, or per chunk.
Retained state size MUST be explicit: KV cache, recurrent hidden state, memory tokens, fast weights, external memory, or compressed summaries.
The model SHOULD expose an abstain/silence/trigger decision when real-time output is optional or costly.
Benchmarks SHOULD include benign no-op spans and rare critical spans, so false positives and false negatives are both measured.
Context eviction SHOULD be audited: what is lost when raw history leaves the retained window?
Compression SHOULD be evaluated against downstream state utility, not only byte reduction or reconstruction error.
Action or control-input history MUST be distinguished from passive events and exogenous variables.
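
The MUST items above can be turned into a checkable report. The sketch below is one possible shape, not a wiki standard: `ServingReport`, its field names, and the unit and state-kind vocabularies are hypothetical, and `keeps_up` encodes only the Summary's requirement that updates run at least as fast as data arrives.

```python
from dataclasses import dataclass
from typing import Optional

UPDATE_UNITS = {"sample", "event", "chunk"}
STATE_KINDS = {
    "kv_cache", "recurrent_state", "memory_tokens",
    "fast_weights", "external_memory", "compressed_summaries",
}


@dataclass
class ServingReport:
    """Hypothetical report covering the MUST-level requirements."""
    update_cost_ms: Optional[float] = None
    update_unit: Optional[str] = None
    arrival_interval_ms: Optional[float] = None
    state_kind: Optional[str] = None
    state_bytes: Optional[int] = None
    separates_actions_from_events: Optional[bool] = None

    def violations(self):
        """List the MUST requirements this report fails to document."""
        problems = []
        if self.update_cost_ms is None or self.update_unit not in UPDATE_UNITS:
            problems.append("update cost per sample, event, or chunk")
        if self.state_kind not in STATE_KINDS or self.state_bytes is None:
            problems.append("explicit retained state size")
        if not self.separates_actions_from_events:
            problems.append("action history separated from passive events")
        return problems

    def keeps_up(self):
        """True if one update fits within one arrival interval."""
        if self.update_cost_ms is None or self.arrival_interval_ms is None:
            return None
        return self.update_cost_ms <= self.arrival_interval_ms
```

The SHOULD items (abstention, no-op spans, eviction audits, downstream compression utility) need benchmark results rather than fields, so they are left out of this sketch.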

Relation To Foundation TSFM Agenda

This page fills the streaming-state gap in the Foundation Time-Series Model Research Agenda. It maps most directly to streaming state and long context, but also touches event streams, context interface, dynamic compute, and control/counterfactuals when action history is present.

The agenda-relevant test is:

Can the model keep useful state under continuous updates,
with bounded latency and memory, while preserving the variables
needed for future observations, alerts, and candidate interventions?

Audio Interaction Model does not answer this test. It gives a serving-contract and trigger-evaluation analogy outside numeric time series, but it does not provide bounded memory, eviction audits, learned state compression, multivariate observations, irregular event streams, graph time series, topology, exogenous variables, or typed actions and control inputs.

Open Questions

What is the minimal benchmark for always-on numeric time-series state updates?
Should a streaming TSFM emit an explicit abstain/alert/action token, or should that decision live in a separate policy head?
If no-op behavior is represented as explicit abstain/alert/action tokens, how can training avoid Audio Interaction Model-style reliance on hand-curated silence supervision?
How should state-refresh quality be measured after a retained window is evicted?
Can FIFO-style ingestion/decoding decoupling transfer from audio to telemetry serving without being mistaken for context compression?
Which retained-state interface is best under real serving constraints: KV cache, recurrent state, memory tokens, fast weights, learned summaries, or compressed retrieval memory?
Can decoupled erase/write recurrent updates reduce stale-association overwrites in continuous multivariate streams without erasing rare regimes?
Can HAM-style routing, HOLA-style update-magnitude retention, or Oryx-style mixer routing decide which streaming spans deserve exact memory without missing predictable but decision-critical action windows?
Which filtered validation slices best separate state-conditioned streaming updates from exact recent-value recall and repeated normal behavior?
How should false-positive alerting cost and false-negative safety cost be weighted for rare operational events?
Can Moshi-style entropy-over-token-stream diagnostics become useful health checks for generated forecasts, alert streams, or latent-state summaries under quantization and long-running service?
Can an AdaJEPA-style plan-observe-adapt loop become an always-on retained-state update without unsafe parameter drift or episode-reset assumptions?

Alex Open Research Wiki
