Audio-Interaction
Summary
Audio-Interaction is a 2026 streaming audio-language model that listens to audio chunks in real time and decides whether to stay silent or generate a text response.
Role In The Wiki
Audio-Interaction is a real-time interaction example, included with caveats, rather than a numeric time-series foundation model. Its wiki value is mostly diagnostic: it exposes an always-on serving contract with chunked observations, silence/response decisions, FIFO scheduling, and measured latency/stall behavior. However, it relies heavily on curated audio construction and does not address how to keep long-history state bounded.
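The serving contract described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the project's actual code: `Chunk`, `Step`, `serve_fifo`, `energy_policy`, and `stalled` are hypothetical names, the `energy` field stands in for real audio content, and a fixed per-chunk service time replaces real model inference.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Chunk:
    index: int        # position of the chunk in the stream
    arrival_s: float  # simulated time at which the chunk arrives
    energy: float     # toy feature standing in for audio content (assumption)


@dataclass
class Step:
    index: int
    response: Optional[str]  # None means the model stayed silent
    latency_s: float         # finish time minus arrival time


def serve_fifo(
    chunks: Iterable[Chunk],
    policy: Callable[[Chunk], Optional[str]],
    service_s: float,
) -> List[Step]:
    """Process chunks strictly in arrival order with one simulated worker."""
    queue = deque(chunks)
    clock = 0.0
    steps: List[Step] = []
    while queue:
        chunk = queue.popleft()              # FIFO: oldest chunk first
        clock = max(clock, chunk.arrival_s)  # idle until the chunk has arrived
        response = policy(chunk)             # silence/response decision
        clock += service_s                   # simulated processing cost
        steps.append(Step(chunk.index, response, clock - chunk.arrival_s))
    return steps


def energy_policy(chunk: Chunk) -> Optional[str]:
    """Hypothetical stand-in policy: respond only to high-energy chunks."""
    return f"event at chunk {chunk.index}" if chunk.energy > 0.5 else None


def stalled(steps: List[Step], chunk_s: float) -> List[int]:
    """Indices of chunks whose latency exceeded the chunk duration."""
    return [s.index for s in steps if s.latency_s > chunk_s]
```

With one-second chunks, a service time below the chunk duration keeps every step on schedule, while a service time above it makes latency grow with each queued chunk. That growth is the stall behavior a latency measurement would surface.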
Official Artifacts
- Official project page: Audio Interaction Model
- Official code: xzf-thu/Audio-Interaction
- Official Hugging Face: zhifeixie/AudioInteraction
- Official dataset: zhifeixie/StreamAudio-2M
- Preprint: arXiv 2606.05121
Evidence
Relation To Foundation TSFM Agenda
Use the source-level agenda mapping in audio-interaction-model-2026 rather than duplicating verdict rows here.
At the entity level, Audio-Interaction is a context-level audio event-stream analogue for the streaming-state slot. It does not yet provide numeric observations, multivariate telemetry, graph time series, typed control inputs, intervention-outcome rollouts, or a general retained-state compression/eviction mechanism.