Aionoscope: Debugging Latent-State Accessibility in Time-Series Representations

Source

Raw Markdown: paper_aionoscope-2026.md
PDF: paper_aionoscope-2026.pdf
Preprint: https://arxiv.org/abs/2607.00956
DOI: https://doi.org/10.48550/arXiv.2607.00956
Interactive results: https://aionoscope.langotime.ai/
Official generator/library code: https://github.com/langotime/aionoscope/
Official benchmark code: https://github.com/langotime/aionoscope-benchmarks/

Status And Credibility

This is an arXiv v1 preprint submitted on 2026-07-01 by Alexander Chemeris, Ming Jin, and Randall Balestriero. The arXiv record states that the paper was accepted to the 12th Mining and Learning from Time Series (MiLeTS) workshop at KDD 2026. Public code repositories are available for both the generator library and the benchmark harness, along with a public interactive dashboard for the current result snapshot.

Core Claim

Aionoscope is a generator-based diagnostic tool for testing whether frozen time-series representations expose latent process state, not only coarse signal identity or downstream task labels. It separates process generation from observation rendering so the benchmark can emit exact categorical and dense labels from the same latent state that produced the observed time series.

The headline diagnostic finding is a coarse-vs-dense mismatch: many evaluated systems make component presence easy to recover, but dense state variables such as timing, phase, amplitude, frequency, and regime parameters are much less reliably accessible.

Benchmark Contract

For benchmark configuration $c$, seed $s$, and mixture complexity $k$, Aionoscope samples latent state and renders observations as:

$$
z_{i} \sim P_{c}(\cdot \mid k, s), \quad x_{i} = V_{c}(z_{i}, \epsilon_{i}), \quad y_{i}^{\mathrm{cat}} = C(z_{i}), \quad y_{i}^{\mathrm{dense}} = D(z_{i}).
$$
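The contract above can be illustrated with a minimal sketch. It is not the Aionoscope library API: the `SineState` schema, the three-slot sine mixture, the parameter ranges, and all function names are hypothetical, chosen only to show that the observed stream and both label types are derived from one sampled latent state. The 500 Hz rate matches the sampling rate noted under Limitations.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class SineState:
    """Hypothetical latent state of one component (not the paper's schema)."""
    active: bool
    amplitude: float
    frequency_hz: float
    phase: float


def sample_state(k: int, seed: int, n_slots: int = 3) -> list[SineState]:
    """Process step, z ~ P_c(. | k, s): k of n_slots components are active."""
    rng = random.Random(seed)
    active_slots = set(rng.sample(range(n_slots), k))
    return [
        SineState(
            active=(slot in active_slots),
            amplitude=rng.uniform(0.5, 2.0),
            frequency_hz=rng.uniform(1.0, 20.0),
            phase=rng.uniform(0.0, 2.0 * math.pi),
        )
        for slot in range(n_slots)
    ]


def render_view(z, n_samples=500, rate_hz=500.0, noise_std=0.05, noise_seed=0):
    """View step, x = V_c(z, eps): sum the active components and add noise."""
    rng = random.Random(noise_seed)
    stream = []
    for step in range(n_samples):
        t = step / rate_hz
        clean = sum(
            c.amplitude * math.sin(2.0 * math.pi * c.frequency_hz * t + c.phase)
            for c in z
            if c.active
        )
        stream.append(clean + rng.gauss(0.0, noise_std))
    return stream


def categorical_labels(z):
    """y_cat = C(z): one presence flag per component slot."""
    return [int(c.active) for c in z]


def dense_labels(z):
    """y_dense = D(z): parameters per slot, None where the slot is inactive."""
    return [
        (c.amplitude, c.frequency_hz, c.phase) if c.active else None for c in z
    ]


z = sample_state(k=2, seed=7)
x = render_view(z)
y_cat, y_dense = categorical_labels(z), dense_labels(z)
```

Because `x`, `y_cat`, and `y_dense` are all functions of the same `z`, the labels are exact by construction rather than estimated from the rendered signal.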

Frozen model-plus-adapter systems are then probed layer-wise with a common pooled linear readout. Dense metrics are masked so a parameter is scored only when the owning component is active.
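The masking rule can be made concrete with a short sketch. The function name and the handling of degenerate cases are assumptions; the paper's exact aggregation across targets is not reproduced here. The point is only that entries whose owning component is inactive are excluded before $R^{2}$ is computed.

```python
def masked_r2(y_true, y_pred, mask):
    """R^2 computed only over entries where the owning component is active."""
    pairs = [(t, p) for t, p, m in zip(y_true, y_pred, mask) if m]
    if len(pairs) < 2:
        return float("nan")  # too few active entries to score
    mean_true = sum(t for t, _ in pairs) / len(pairs)
    ss_res = sum((t - p) ** 2 for t, p in pairs)
    ss_tot = sum((t - mean_true) ** 2 for t, _ in pairs)
    if ss_tot == 0.0:
        return float("nan")  # constant target: R^2 is undefined
    return 1.0 - ss_res / ss_tot


# The last two samples have an inactive component, so their dense target is
# a placeholder and the probe output there is meaningless.
y_true = [1.0, 2.0, 3.0, 0.0, 0.0]
y_pred = [1.0, 2.0, 3.0, 9.0, 9.0]
active = [True, True, True, False, False]

score_masked = masked_r2(y_true, y_pred, active)
score_unmasked = masked_r2(y_true, y_pred, [True] * 5)
```

Without the mask, placeholder targets for inactive components would dominate the error and penalize a probe that is exact wherever the parameter is actually defined.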

flowchart LR
  Process[Process: latent components, events, regimes] --> State[exact latent state]
  State --> View[View: render observed time series]
  State --> Labels[categorical + dense labels]
  View --> Encoder[frozen model + adapter]
  Encoder --> Pool[per-layer pooled representation]
  Pool --> Probe[linear categorical and dense probes]
  Labels --> Metrics[AUROC / AUPRC / masked R2 / Pearson]
  Probe --> Metrics
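The Encoder, Pool, and Probe stages of the flowchart can be sketched as follows, assuming NumPy is available. The hidden states here are synthetic stand-ins for one layer of a frozen model, the closed-form ridge fit is one common way to train a linear readout, and the function names and ridge strength are assumptions rather than the benchmark harness's settings.

```python
import numpy as np


def mean_pool(hidden):
    """Collapse (n_series, n_steps, n_dims) hidden states to (n_series, n_dims)."""
    return hidden.mean(axis=1)


def add_bias(features):
    """Append a constant column so the probe can learn an intercept."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_linear_probe(features, targets, ridge=1e-3):
    """Closed-form ridge regression from pooled features to one dense target."""
    x = add_bias(features)
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ targets)


rng = np.random.default_rng(0)
n_series, n_steps, n_dims = 400, 16, 8

# Synthetic "layer output": channel 0 carries the target, the rest is noise.
target = rng.uniform(-1.0, 1.0, size=n_series)
hidden = 0.1 * rng.normal(size=(n_series, n_steps, n_dims))
hidden[:, :, 0] += target[:, None]

pooled = mean_pool(hidden)
weights = fit_linear_probe(pooled[:300], target[:300])
pred = add_bias(pooled[300:]) @ weights

held_out = target[300:]
r2 = 1.0 - np.sum((held_out - pred) ** 2) / np.sum((held_out - held_out.mean()) ** 2)
```

In a layer-wise sweep the same pooling and fit would be repeated for each layer's hidden states, giving one score per layer and target.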

Key Contributions

Introduces a Process-to-View diagnostic generator for time-series representation analysis.
Provides exact categorical labels and dense latent-state labels from the same generation state that renders the observed stream.
Instantiates the first benchmark as Primitive Process Mixtures, a single-channel synthetic stream family with 14 component labels and 34 dense generative parameters.
Evaluates 37 model-plus-adapter systems under a common native-length, layer-wise, pooled linear-probe protocol.
Separates generator/library artifacts from the benchmark harness and exposes an interactive dashboard for inspecting model, target, and layer behavior.

Evidence And Results

Coarse accessibility is easier: component presence is often recoverable by many systems under categorical AUROC/AUPRC probes.
Dense accessibility is harder: the highest observed dense-probe row reaches 0.689 mean masked $R^{2}$, while the dense-feature oracle reaches 0.999.
Target difficulty is uneven: frequency, amplitude, and duty-cycle variables are often more recoverable than offsets, phase, Gaussian-pulse width, spike amplitude, and some event-timing variables.
Layer and adapter matter: rows are model-plus-adapter fingerprints at native input length, so the benchmark audits released systems as exposed rather than claiming a pure architecture ranking.
The paper explicitly treats the snapshot as diagnostic: it is a public validation-seed pilot, not a stable leaderboard, hidden test set, or real-task transfer claim.
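For readers less familiar with the categorical metric named above, AUROC can be read as the probability that a randomly chosen positive example receives a higher probe score than a randomly chosen negative one. The sketch below computes it directly from that definition; it is a generic illustration with made-up scores, not the benchmark harness's implementation.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        return float("nan")  # undefined when one class is missing
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical probe scores for one component-presence label.
presence = [0, 0, 1, 1]
probe_scores = [0.1, 0.4, 0.35, 0.8]
value = auroc(presence, probe_scores)  # 3 of 4 pairs are ranked correctly
```

The pairwise form is quadratic in the number of examples; it is written this way for clarity, not speed.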

Interpretation For The Wiki

Aionoscope directly supports the wiki’s latent-state time-series framing: standard forecasting, reconstruction, or classification scores can hide whether a representation exposes the state variables a user may want to inspect. The benchmark does not prove that a model will transfer to operations, medicine, finance, robotics, or observability, but it creates a controlled unit test for whether known latent variables remain linearly accessible from frozen representations.

The most important wiki use is as a benchmark-hygiene reminder: a representation can encode what kind of signal is present while still hiding where, with what magnitude, with what phase, under which regime, or with which dense parameter values.
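A toy example shows how this split can arise. In the sketch below, a magnitude spectrum stands in for a phase-invariant "representation"; it is not any of the evaluated systems, and the signal family, noise level, and ridge probe are all assumptions. The same linear probe recovers whether a 10 Hz component is present but cannot recover its phase. Raw phase is a circular quantity, so regressing on it directly is a simplification that does not affect the conclusion here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, length, rate_hz, freq_hz = 2000, 250, 500.0, 10.0
t = np.arange(length) / rate_hz

# Latent state: presence flag and phase of one sinusoidal component.
present = rng.integers(0, 2, size=n)
phase = rng.uniform(-np.pi, np.pi, size=n)
clean = present[:, None] * np.sin(2 * np.pi * freq_hz * t[None, :] + phase[:, None])
x = clean + rng.normal(scale=0.3, size=(n, length))

# Toy "representation": magnitudes of the first 20 frequency bins.
features = np.abs(np.fft.rfft(x, axis=1))[:, :20]


def probe_predict(train_x, train_y, test_x, ridge=1e-3):
    """Fit a ridge linear probe on the training split and predict the test split."""
    def with_bias(a):
        return np.hstack([a, np.ones((a.shape[0], 1))])
    a = with_bias(train_x)
    w = np.linalg.solve(a.T @ a + ridge * np.eye(a.shape[1]), a.T @ train_y)
    return with_bias(test_x) @ w


split = 1500
presence_pred = probe_predict(features[:split], present[:split].astype(float), features[split:])
presence_acc = np.mean((presence_pred > 0.5) == (present[split:] == 1))

# Phase is scored only where the component is active (masked dense target).
train_on = present[:split] == 1
test_on = present[split:] == 1
phase_pred = probe_predict(features[:split][train_on], phase[:split][train_on], features[split:][test_on])
phase_true = phase[split:][test_on]
phase_r2 = 1.0 - np.sum((phase_true - phase_pred) ** 2) / np.sum((phase_true - phase_true.mean()) ** 2)
```

Here `presence_acc` comes out close to 1 while `phase_r2` stays near zero: the representation encodes what kind of signal is present while discarding one of its dense parameters.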

Limitations

Primitive Process Mixtures is controlled, synthetic, single-channel, and sampled at 500 Hz; it does not cover the full diversity of real multivariate, irregular, event-stream, or operational domains.
Current results use public validation streams and should not be treated as a locked leaderboard after publication.
The primary readout is mean-pooled and linear. A failure means only that state was not recoverable under this readout, not that the information is absent.
Native input length is part of each model-plus-adapter fingerprint; the sweep is not a length-controlled architecture comparison.
The benchmark is diagnostic and hypothesis-generating. It does not establish deployment readiness, action-conditioned control utility, runtime, latency, or real-task transfer.
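The readout limitation can be demonstrated on synthetic data. In the sketch below, a hypothetical state variable is the product of two representation coordinates, so the information is fully present, yet a linear probe scores near zero. Giving the probe the product as an extra feature, used here as a stand-in for a nonlinear readout, recovers it almost exactly. All names and settings are illustrative assumptions.

```python
import numpy as np


def heldout_r2(features, target, split, ridge=1e-6):
    """Fit a ridge linear probe on the first `split` rows and score the rest."""
    a = np.hstack([features, np.ones((features.shape[0], 1))])
    gram = a[:split].T @ a[:split] + ridge * np.eye(a.shape[1])
    w = np.linalg.solve(gram, a[:split].T @ target[:split])
    resid = target[split:] - a[split:] @ w
    centered = target[split:] - target[split:].mean()
    return 1.0 - np.sum(resid ** 2) / np.sum(centered ** 2)


rng = np.random.default_rng(1)
rep = rng.uniform(-1.0, 1.0, size=(2000, 2))   # two representation coordinates
state = rep[:, 0] * rep[:, 1]                  # state is encoded, but not linearly

linear_r2 = heldout_r2(rep, state, split=1500)

# A readout that can form the product sees the state directly.
product = (rep[:, 0] * rep[:, 1])[:, None]
nonlinear_r2 = heldout_r2(np.hstack([rep, product]), state, split=1500)
```

A low linear score is therefore evidence about accessibility under this readout, which is why the follow-up probes listed under Open Questions matter for separating readout limits from absent state.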

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
| --- | --- | --- | --- |
| Benchmark level | partially closes | Defines a reproducible diagnostic benchmark for latent-state accessibility, with exact categorical and dense labels and public artifacts. | Needs hidden or locked streams, more generator families, multivariate and irregular settings, and real-domain transfer checks. |
| Representation quality | partially closes | Separates coarse component identity from dense process-state accessibility and reports target-level/layer-level probes. | Need token-level, nonlinear, learned-pooling, and downstream-task follow-up to distinguish readout limits from absent state. |
| Latent-state prediction | warning | Shows why downstream labels or coarse signal presence do not prove accessible process state. | The benchmark tests representations; it is not itself a state-prediction model or action-conditioned world model. |
| Data diversity and long tail | partially closes | Seeded synthetic generation can sample mixture complexity and rare component combinations under controlled labels. | Need checks that synthetic diagnostic success predicts real rare-regime or operational utility. |
| Control and counterfactuals | insufficient evidence | No actions, control inputs, interventions, or counterfactual rollout are included in the current Primitive Process Mixtures snapshot. | Add typed synthetic actions/interventions or couple the generator to an interactive environment. |

Links Into The Wiki

Open Questions

Which additional process/view families are needed before Aionoscope can test multivariate, irregular, graph, event-stream, or observability settings?
How should future hidden streams be governed so public validation results do not become an overfit leaderboard?
Which negative results disappear under token-level, nonlinear, or learned-pooling probes, and which reflect genuinely inaccessible state?
Can Aionoscope diagnostics predict real downstream transfer, rare-regime detection, anomaly interpretation, or action-conditioned decision quality?
What is the right manifold or geometry-level extension beyond linear readout recovery?
