Aionoscope: Debugging Latent-State Accessibility in Time-Series Representations
Source
- Raw Markdown: paper_aionoscope-2026.md
- PDF: paper_aionoscope-2026.pdf
- Preprint: https://arxiv.org/abs/2607.00956
- DOI: https://doi.org/10.48550/arXiv.2607.00956
- Interactive results: https://aionoscope.langotime.ai/
- Official generator/library code: https://github.com/langotime/aionoscope/
- Official benchmark code: https://github.com/langotime/aionoscope-benchmarks/
Status And Credibility
This is an arXiv v1 preprint submitted on 2026-07-01 by Alexander Chemeris, Ming Jin, and Randall Balestriero. The arXiv record states that the paper was accepted to the 12th Mining and Learning from Time Series (MiLeTS) workshop at KDD 2026. Public code repositories exist for both the generator library and the benchmark harness, along with a public interactive dashboard for the current result snapshot.
Core Claim
Aionoscope is a generator-based diagnostic tool for testing whether frozen time-series representations expose latent process state, not only coarse signal identity or downstream task labels. It separates process generation from observation rendering so the benchmark can emit exact categorical and dense labels from the same latent state that produced the observed time series.
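The Process-to-View separation can be illustrated with a minimal Python sketch. Everything named here is a hypothetical illustration and not the Aionoscope library API: `LatentState`, `sample_process`, `render_view`, `emit_labels`, the two toy components, and all parameter ranges are assumptions. The sketch only shows that the observed series and the labels are both derived from one sampled state, which is what makes the labels exact by construction.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class LatentState:
    """Hypothetical latent state: an optional sinusoid plus an optional spike."""
    sine_active: bool
    frequency_hz: float
    amplitude: float
    spike_active: bool
    spike_index: int


def sample_process(seed: int, n_samples: int = 256) -> LatentState:
    """Process step: draw the latent state from a seeded RNG, without rendering."""
    rng = random.Random(seed)
    return LatentState(
        sine_active=rng.random() < 0.5,
        frequency_hz=rng.uniform(1.0, 20.0),
        amplitude=rng.uniform(0.5, 2.0),
        spike_active=rng.random() < 0.5,
        spike_index=rng.randrange(n_samples),
    )


def render_view(state: LatentState, n_samples: int = 256,
                sample_rate_hz: float = 500.0) -> list:
    """View step: render a single-channel series from the latent state."""
    series = [0.0] * n_samples
    if state.sine_active:
        for t in range(n_samples):
            angle = 2.0 * math.pi * state.frequency_hz * t / sample_rate_hz
            series[t] += state.amplitude * math.sin(angle)
    if state.spike_active:
        series[state.spike_index] += 5.0  # assumed fixed spike height
    return series


def emit_labels(state: LatentState):
    """Labels come from the same state object that was rendered."""
    categorical = {"sine": int(state.sine_active), "spike": int(state.spike_active)}
    # Dense labels are None when the owning component is inactive,
    # so a downstream metric can mask them out.
    dense = {
        "sine.frequency_hz": state.frequency_hz if state.sine_active else None,
        "sine.amplitude": state.amplitude if state.sine_active else None,
        "spike.index": state.spike_index if state.spike_active else None,
    }
    return categorical, dense
```

Calling `sample_process(seed)` twice with the same seed yields the same state, so a stream and its labels can be regenerated from the seed alone.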
The headline diagnostic finding is a coarse-vs-dense mismatch: many evaluated systems make component presence easy to recover, but dense state variables such as timing, phase, amplitude, frequency, and regime parameters are much less reliably accessible.
Benchmark Contract
For a given benchmark configuration, seed, and mixture complexity, Aionoscope samples a latent state from the Process and renders the observed time series from that state through the View.
Frozen model-plus-adapter systems are then probed layer-wise with a common pooled linear readout. Dense metrics are masked so a parameter is scored only when the owning component is active.
    flowchart LR
        Process[Process: latent components, events, regimes] --> State[exact latent state]
        State --> View[View: render observed time series]
        State --> Labels[categorical + dense labels]
        View --> Encoder[frozen model + adapter]
        Encoder --> Pool[per-layer pooled representation]
        Pool --> Probe[linear categorical and dense probes]
        Labels --> Metrics[AUROC / AUPRC / masked R2 / Pearson]
        Probe --> Metrics
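The masked dense readout described in this section can be sketched as follows. This is an illustration under stated assumptions, not the benchmark harness: the helper names `masked_r2`, `fit_linear_probe`, and `apply_linear_probe`, the small ridge term, the random stand-in features, and the train/test split are all hypothetical. It shows a linear probe fitted on mean-pooled features and scored only on rows where the owning component is active.

```python
import numpy as np


def masked_r2(y_true, y_pred, mask):
    """R^2 computed only over rows where the owning component is active."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if mask.sum() < 2:
        return float("nan")  # too few active rows to score
    residual = np.sum((y_true[mask] - y_pred[mask]) ** 2)
    total = np.sum((y_true[mask] - y_true[mask].mean()) ** 2)
    return float("nan") if total == 0 else float(1.0 - residual / total)


def fit_linear_probe(features, targets, mask, ridge=1e-6):
    """Ridge-stabilised least-squares readout fitted on active rows only."""
    mask = np.asarray(mask, dtype=bool)
    x = np.asarray(features, dtype=float)[mask]
    y = np.asarray(targets, dtype=float)[mask]
    x1 = np.hstack([x, np.ones((len(x), 1))])  # append an intercept column
    gram = x1.T @ x1 + ridge * np.eye(x1.shape[1])
    return np.linalg.solve(gram, x1.T @ y)


def apply_linear_probe(features, weights):
    x = np.asarray(features, dtype=float)
    return np.hstack([x, np.ones((len(x), 1))]) @ weights


# Stand-in for one layer of a frozen encoder: (examples, time steps, width).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(200, 16, 8))
pooled = tokens.mean(axis=1)  # mean pooling over time

# Assumed dense target that is linear in the pooled features when active;
# inactive rows carry a placeholder value of 0.0.
active = rng.random(200) < 0.6
target = np.where(active, pooled @ rng.normal(size=8) + 0.3, 0.0)

train, test = slice(0, 100), slice(100, 200)
weights = fit_linear_probe(pooled[train], target[train], active[train])
pred = apply_linear_probe(pooled[test], weights)

masked_score = masked_r2(target[test], pred, active[test])
unmasked_score = masked_r2(target[test], pred, np.ones(100, dtype=bool))
```

In this toy setup the masked score is close to 1 because the target is linear in the pooled features by construction, while the unmasked score is lower because placeholder values on inactive rows are counted as errors. That gap is the reason for masking.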
Key Contributions
- Introduces a Process-to-View diagnostic generator for time-series representation analysis.
- Provides exact categorical labels and dense latent-state labels from the same generation state that renders the observed stream.
- Instantiates the first benchmark as Primitive Process Mixtures, a single-channel synthetic stream family with 14 component labels and 34 dense generative parameters.
- Evaluates 37 model-plus-adapter systems under a common native-length, layer-wise, pooled linear-probe protocol.
- Separates generator/library artifacts from the benchmark harness and exposes an interactive dashboard for inspecting model, target, and layer behavior.
Evidence And Results
- Coarse accessibility is easier: component presence is often recoverable by many systems under categorical AUROC/AUPRC probes.
- Dense accessibility is harder: the highest observed dense-probe row reaches 0.689 mean masked $R^2$, while the dense-feature oracle reaches 0.999.
- Target difficulty is uneven: frequency, amplitude, and duty-cycle variables are often more recoverable than offsets, phase, Gaussian-pulse width, spike amplitude, and some event-timing variables.
- Layer and adapter matter: rows are model-plus-adapter fingerprints at native input length, so the benchmark audits released systems as exposed rather than claiming a pure architecture ranking.
- The paper explicitly treats the snapshot as diagnostic: it is a public validation-seed pilot, not a stable leaderboard, hidden test set, or real-task transfer claim.
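For readers unfamiliar with the categorical metric named above, AUROC for a single component-presence label can be computed directly from probe scores. The sketch below is a plain pairwise implementation for illustration; the function name `auroc` and the example scores are assumptions, and the benchmark harness may compute the metric differently.

```python
def auroc(labels, scores):
    """Probability that a random positive scores above a random negative.

    Ties count as half a win. Pairwise and O(n_pos * n_neg), which is
    fine for small illustrative inputs.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical probe scores for "component present" on four streams.
presence_labels = [0, 0, 1, 1]
probe_scores = [0.1, 0.4, 0.35, 0.8]
presence_auroc = auroc(presence_labels, probe_scores)  # 3 of 4 pairs ranked correctly
```

A high value here says only that presence is rankable from the representation; it says nothing about whether the component's dense parameters are recoverable.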
Interpretation For The Wiki
Aionoscope directly supports the wiki’s latent-state time-series framing: standard forecasting, reconstruction, or classification scores can hide whether a representation exposes the state variables a user may want to inspect. The benchmark does not prove that a model will transfer to operations, medicine, finance, robotics, or observability, but it creates a controlled unit test for whether known latent variables remain linearly accessible from frozen representations.
The most important wiki use is as a benchmark-hygiene reminder: a representation can encode what kind of signal is present while still hiding where, with what magnitude, with what phase, under which regime, or with which dense parameter values.
Limitations
- Primitive Process Mixtures is controlled, synthetic, single-channel, and sampled at 500 Hz; it does not cover the full diversity of real multivariate, irregular, event-stream, or operational domains.
- Current results use public validation streams and should not be treated as a locked leaderboard after publication.
- The primary readout is mean-pooled and linear. A failure means only that state was not recoverable under this readout, not that the information is absent.
- Native input length is part of each model-plus-adapter fingerprint; the sweep is not a length-controlled architecture comparison.
- The benchmark is diagnostic and hypothesis-generating. It does not establish deployment readiness, action-conditioned control utility, runtime, latency, or real-task transfer.
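The readout caveat above can be made concrete with a toy case in which event timing is fully present at the token level but invisible after mean pooling. The construction is hypothetical (a one-hot marker at the event time, in-sample least-squares fits, and the helper names are all assumptions) and is not drawn from the paper's experiments.

```python
import numpy as np


def r2(y_true, y_pred):
    residual = np.sum((y_true - y_pred) ** 2)
    total = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - residual / total)


def linear_probe_predictions(features, targets):
    """In-sample least-squares fit with an intercept (illustration only)."""
    x1 = np.hstack([features, np.ones((len(features), 1))])
    weights, *_ = np.linalg.lstsq(x1, targets, rcond=None)
    return x1 @ weights


rng = np.random.default_rng(0)
n_examples, n_steps = 200, 32
event_time = rng.integers(0, n_steps, size=n_examples)

# Token-level representation: a single one-hot marker at the event time.
tokens = np.zeros((n_examples, n_steps, 1))
tokens[np.arange(n_examples), event_time, 0] = 1.0

# Mean pooling maps every example to the same value, 1 / n_steps.
mean_pooled = tokens.mean(axis=1)

# A position-weighted pooling keeps the timing information.
time_index = np.arange(n_steps).reshape(1, n_steps, 1)
weighted_pooled = (tokens * time_index).sum(axis=1)

y = event_time.astype(float)
r2_mean = r2(y, linear_probe_predictions(mean_pooled, y))
r2_weighted = r2(y, linear_probe_predictions(weighted_pooled, y))
```

Here `r2_mean` is approximately 0 and `r2_weighted` is approximately 1, even though both readouts see the same tokens. A low score under a mean-pooled linear probe is therefore consistent with the state being encoded in a form that this particular readout cannot reach.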
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Benchmark level | partially closes | Defines a reproducible diagnostic benchmark for latent-state accessibility, with exact categorical and dense labels and public artifacts. | Needs hidden or locked streams, more generator families, multivariate and irregular settings, and real-domain transfer checks. |
| Representation quality | partially closes | Separates coarse component identity from dense process-state accessibility and reports target-level and layer-level probes. | Needs token-level, nonlinear, learned-pooling, and downstream-task follow-up to distinguish readout limits from absent state. |
| Latent-state prediction | warning | Shows why downstream labels or coarse signal presence do not prove accessible process state. | The benchmark tests representations; it is not itself a state-prediction model or action-conditioned world model. |
| Data diversity and long tail | partially closes | Seeded synthetic generation can sample mixture complexity and rare component combinations under controlled labels. | Needs checks that synthetic diagnostic success predicts real rare-regime or operational utility. |
| Control and counterfactuals | insufficient evidence | No actions, control inputs, interventions, or counterfactual rollout are included in the current Primitive Process Mixtures snapshot. | Add typed synthetic actions/interventions or couple the generator to an interactive environment. |
Links Into The Wiki
- Aionoscope
- Aionoscope Manifold Reconstruction Benchmark
- Latent-State Time-Series Modeling
- Time-Series Benchmark Hygiene
- Synthetic Data For Time Series
- Self-Supervised Representation Learning
- Time-Series Classification Foundation Models
- Foundation Time-Series Model Research Agenda
Open Questions
- Which additional process/view families are needed before Aionoscope can test multivariate, irregular, graph, event-stream, or observability settings?
- How should future hidden streams be governed so public validation results do not become an overfit leaderboard?
- Which negative results disappear under token-level, nonlinear, or learned-pooling probes, and which reflect genuinely inaccessible state?
- Can Aionoscope diagnostics predict real downstream transfer, rare-regime detection, anomaly interpretation, or action-conditioned decision quality?
- What is the right manifold or geometry-level extension beyond linear readout recovery?