Audio-Interaction

Summary

Audio-Interaction is a 2026 streaming audio-language model that listens to audio chunks in real time and decides whether to stay silent or generate a text response.

Role In The Wiki

Audio-Interaction is a real-time interaction example that comes with caveats, not a numeric time-series foundation model. Its value to the wiki is mostly diagnostic: it exposes an always-on serving contract with chunked observations, silence/response decisions, FIFO scheduling, and measured latency and stall behavior. However, it relies heavily on curated audio construction and does not solve bounded long-history state.
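
The serving contract named above (chunked observations, a silence/response decision per chunk, FIFO scheduling, and latency/stall measurement) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the project's code: `Chunk`, `Decision`, `serve_fifo`, the keyword-based policy, and the cost values are all hypothetical, and a simulated clock stands in for real inference time.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Chunk:
    index: int
    arrival_s: float  # simulated time at which the chunk becomes available
    payload: str      # stand-in for audio samples (hypothetical)


@dataclass
class Decision:
    index: int
    response: Optional[str]  # None means the model stays silent
    latency_s: float         # completion time minus arrival time
    stalled: bool            # True if the chunk waited behind earlier work


def serve_fifo(
    chunks: List[Chunk],
    policy: Callable[[Chunk], Optional[str]],
    cost_s: Callable[[Optional[str]], float],
) -> List[Decision]:
    """Process chunks strictly in arrival order on one simulated worker."""
    queue = deque(sorted(chunks, key=lambda c: c.arrival_s))
    clock = 0.0
    decisions: List[Decision] = []
    while queue:
        chunk = queue.popleft()
        # Work cannot start before the chunk arrives or before the
        # previous chunk has finished.
        start = max(clock, chunk.arrival_s)
        stalled = start > chunk.arrival_s
        response = policy(chunk)
        clock = start + cost_s(response)
        decisions.append(
            Decision(chunk.index, response, clock - chunk.arrival_s, stalled)
        )
    return decisions


# Hypothetical policy: respond only when the chunk contains a question.
def keyword_policy(chunk: Chunk) -> Optional[str]:
    return "answer" if "question" in chunk.payload else None


# Hypothetical costs: silence is cheap, generating text is slow.
def simulated_cost(response: Optional[str]) -> float:
    return 0.25 if response is None else 1.0


stream = [
    Chunk(0, 0.0, "noise"),
    Chunk(1, 0.5, "question"),
    Chunk(2, 1.0, "noise"),
    Chunk(3, 1.5, "noise"),
]
results = serve_fifo(stream, keyword_policy, simulated_cost)
```

In this toy stream, the one response (chunk 1) takes longer than the chunk interval, so chunks 2 and 3 are marked as stalled even though the model stays silent on them. That is the kind of behavior a FIFO always-on contract makes measurable.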

Official Artifacts

Official project page: Audio Interaction Model
Official code: xzf-thu/Audio-Interaction
Official Hugging Face: zhifeixie/AudioInteraction
Official dataset: zhifeixie/StreamAudio-2M
Preprint: arXiv 2606.05121

Evidence

Audio Interaction Model

Relation To Foundation TSFM Agenda

Use the source-level agenda mapping in audio-interaction-model-2026 rather than duplicating verdict rows here.

At the entity level, Audio-Interaction is a context-level audio event-stream analogue for the streaming-state slot. It does not yet provide numeric observations, multivariate telemetry, graph time series, typed control inputs, intervention-outcome rollouts, or a general retained-state compression/eviction mechanism.

Alex Open Research Wiki
