Source Pages
Curation Fields
Source pages carry two curation fields, importance: landmark|important|normal|context and
read_status: read|skimmed|none (none is listed below as Not Read). Agents MUST prioritize
landmark and important sources during search and synthesis.
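As a rough illustration of the prioritization rule above, the following sketch sorts a few entries from this index by the two curation fields. The `SourcePage` class, the `prioritize` helper, the numeric ranks, and the use of read_status as a tie-breaker are all assumptions made for the example. Only the field names and their allowed values come from this page.

```python
from dataclasses import dataclass

# Field values come from the curation fields above. The numeric ranks and the
# tie-break on read_status are illustrative assumptions, not a defined policy.
IMPORTANCE_RANK = {"landmark": 0, "important": 1, "normal": 2, "context": 3}
READ_STATUS_RANK = {"read": 0, "skimmed": 1, "none": 2}


@dataclass(frozen=True)
class SourcePage:
    """Hypothetical record for one source page and its curation fields."""
    filename: str
    importance: str
    read_status: str


def prioritize(pages):
    """Return pages ordered by importance first, then by read_status."""
    return sorted(
        pages,
        key=lambda p: (IMPORTANCE_RANK[p.importance], READ_STATUS_RANK[p.read_status]),
    )


# A few entries taken from the lists on this page.
pages = [
    SourcePage("timesfm-2023.md", "normal", "skimmed"),
    SourcePage("world-models-2018.md", "landmark", "skimmed"),
    SourcePage("mamba-2023.md", "important", "read"),
    SourcePage("pararnn-2025.md", "landmark", "read"),
]
ordered = [p.filename for p in prioritize(pages)]
print(ordered)
```

An agent following the rule would consult the front of such an ordering first. Whether read sources should outrank skimmed ones within the same importance tier is a design choice this page does not specify.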
Landmark Sources
Read
- chemeris-latent-state-time-series-2026.md - Alex’s landmark position source for why observation forecasting is too narrow and why time-series foundation models should optimize for useful internal state.
- context-is-key-2024.md - ServiceNow Context is Key benchmark showing that essential natural-language context can be required for accurate time-series forecasts.
- lecun-autonomous-machine-intelligence-2022.md - LeCun autonomous machine intelligence proposal centered on world models, intrinsic objectives, and hierarchical JEPA.
- pararnn-2025.md - Apple ParaRNN framework for parallel training of nonlinear RNNs at billion-parameter language-model scale.
Skimmed
- world-models-2018.md - Ha and Schmidhuber landmark source for VAE + MDN-RNN latent world models, controller training in learned dreams, and simulator-exploitation caveats.
Important Sources
Read
- beyond-language-modeling-2026.md - Controlled multimodal pretraining study using Transfusion, visual data, world modeling, and MoE scaling.
- bolmo-2025.md - Byteification method for converting subword LMs into competitive byte-level language models.
- cauker-2025.md - Synthetic causally coherent time-series generator for TSFM pretraining.
- chatts-2024.md - Synthetic-data-trained time-series MLLM for understanding and reasoning over multivariate series.
- conceptmoe-2026.md - MoE architecture that merges semantically similar tokens into concept representations.
- dinov3-2025.md - Scaled self-supervised vision foundation model with improved dense features.
- dragon-hatchling-2025.md - Pathway BDH / Dragon Hatchling source for sparse positive recurrent fast state, synapse-level probes, language/translation scaling, and a cautionary architecture narrative around brain-model and Sudoku claims.
- dynamic-fine-tuning-2025.md - Reward-rectified SFT method that links SFT and RL through implicit rewards and token-level gradient scaling.
- evolution-strategies-at-scale-2025.md - Full-parameter ES fine-tuning of billion-parameter LLMs as an RL alternative.
- florence-2-2023.md - Microsoft Florence-2 paper using FLD-5B and an iterative visual data engine to train a compact prompt-based generalist vision model.
- gemma-4-12b-2026.md - Google DeepMind production/open-weight release for an encoder-free multimodal 12B model with text, image, and audio inputs.
- h-net-2025.md - End-to-end hierarchical byte model with learned dynamic chunking.
- guillotine-regularization-2022.md - Layer-cutting analysis showing why SSL projectors can improve training while hiding worse downstream representations at the output.
- iclr-time-series-meta-analysis-2026.md - Local ICLR 2026 field-map source for time-series forecasting, representation learning, and physiology-heavy representation clusters.
- latent-variable-energy-based-models-2023.md - Lecture-note introduction to latent-variable energy-based models and H-JEPA.
- lejepa-2025.md - JEPA theory and SIGReg objective for Gaussian predictive representations.
- leworldmodel-2026.md - Stable end-to-end JEPA world model from pixels using next-embedding prediction and Gaussian regularization.
- mamba-2023.md - Selective state space model architecture for linear-time sequence modeling.
- mamba-2-2024.md - Structured state space duality framework and Mamba-2 architecture.
- mamba-3-2026.md - Mamba-family architecture adding exponential-trapezoidal discretization, complex state, and MIMO updates.
- moda-2026.md - Mixture-of-Depths Attention source for content-based retrieval over prior layer key/value memories and hardware-aware depth attention.
- natural-language-guidance-tts-2024.md - Scalable synthetic annotation method for natural-language-controlled high-fidelity text-to-speech.
- nepa-2025.md - Next-embedding predictive autoregression for visual self-supervised learning.
- synergy-2025.md - Tokenizer-free byte-level language model with learned abstraction routing.
- prism-hypothesis-2025.md - Spectral hypothesis unifying semantic and pixel encoders through frequency structure.
- armt-2024.md - Associative Recurrent Memory Transformer source for layerwise associative memory over RMT-style segments.
- rate-2023.md - ICLR 2026 RATE source for recurrent memory in offline RL trajectories.
- rmt-2022.md - NeurIPS 2022 Recurrent Memory Transformer source for segment-level memory tokens.
- timeomni-1-2026.md - Time-series reasoning suite and TimeOmni-1 model for complex temporal reasoning.
- timeomni-vl-2026.md - Vision-centric unified model for time-series understanding and generation.
- tuna-2-2026.md - Pixel-space unified multimodal model that removes pretrained vision encoders.
- u-cast-2025.md - HDTSF formulation, Time-HD benchmark, and U-Cast baseline for high-dimensional multivariate forecasting.
- flow-of-ranks-2025.md - Rank-structure analysis and compression recipe for time-series Transformers.
- vl-jepa-2025.md - Vision-language JEPA that predicts text embeddings instead of autoregressive tokens.
Skimmed
- act-2023.md - Action Chunking with Transformers source for continuous robot action chunks and temporal ensembling.
- agentic-world-modeling-2026.md - Survey and taxonomy source for L1 predictors, L2 simulators, L3 evolvers, and physical/digital/social/scientific law regimes.
- atlas-2025.md - ATLAS test-time memory module and DeepTransformers family for optimized long-context memorization.
- bittokens-2025.md - IEEE 754 bit-level single-token number encoding for language-model numeracy.
- boom-2025.md - Datadog BOOM observability metrics forecasting benchmark.
- charm-2025.md - Channel-description-conditioned JEPA embedding model for multivariate time series.
- convergent-evolution-number-representations-2026.md - Number-representation study separating universal Fourier-spectrum spikes from functionally usable modular geometry.
- compute-optimal-tokenization-2026.md - Meta FAIR / University of Washington scaling-law study arguing that tokenization changes should be compared in bytes per parameter rather than tokens per parameter.
- cookbook-self-supervised-learning-2023.md - Beginner-friendly survey and practical taxonomy of SSL methods, recipes, evaluation protocols, and implementation gotchas as of early 2023.
- cwm-2025.md - Meta FAIR Code World Model technical report for execution-trace and agentic-code action-observation training.
- diffusionblocks-2026.md - ICLR 2026 block-wise training framework from Sakana AI that turns residual networks into independently trainable diffusion-style denoising blocks.
- diffusion-policy-2023.md - Robotics source for denoising future continuous action trajectories in a receding-horizon visuomotor policy.
- ebt-2025.md - Energy-Based Transformer paper using learned compatibility scores and gradient-based candidate refinement for scalable learning and inference-time thinking.
- embedded-language-flows-2026.md - MIT ELF preprint showing continuous embedding-space flow matching for language generation, useful as text-side evidence for multimodal diffusion/flow substrates.
- eidos-2026.md - Time-series foundation model family trained through latent-space predictive learning and SiGLU point-wise scalar tokenization.
- elt-2026.md - Elastic Looped Transformer source for parameter-efficient visual generation, ILSD loop-boundary supervision, and any-time loop-count inference.
- fast-2025.md - Frequency-space action tokenization method for making continuous robot action chunks compatible with autoregressive VLAs.
- exploring-large-models-time-series-2024.md - Tsinghua/THUML historical overview of early large time-series models, Timer, AutoTimes, Timer-XL, and OpenLTM.
- flowstate-2025.md - SSM-based time-series foundation model with a functional basis decoder for sampling-rate-invariant forecasting.
- fone-2025.md - Fourier Number Embedding method for precise single-token number representations.
- gemini-robotics-1-5-2025.md - Google DeepMind robotics source for embodied reasoning, Motion Transfer, and hierarchical VLA action execution.
- genie-2024.md - Google DeepMind ICML 2024 source for learning action-controllable visual world models from unlabeled videos via latent actions.
- gr00t-n1-2025.md - NVIDIA humanoid VLA source with a VLM System 2 and DiT/flow-matching System 1 action module.
- gqt-2025.md - Graph Quantized Tokenizer source for learned discrete graph vocabularies before Transformer processing.
- graph-tokenization-2026.md - ICLR 2026 graph tokenizer using reversible graph serialization plus BPE for standard Transformers.
- graphgpt-2025.md - ICML 2025 Graph Eulerian Transformer source for reversible graph-to-sequence pretraining.
- graphormer-2021.md - Classic graph Transformer baseline using centrality, shortest-path, and edge attention biases.
- helix-2025.md - Figure AI technical writeup on a fast/slow humanoid VLA for continuous upper-body control.
- helix-02-2026.md - Figure AI follow-on writeup extending Helix to full-body humanoid loco-manipulation with S2/S1/S0 hierarchy.
- hierarchical-reasoning-model-2025.md - HRM recurrent fast/slow reasoning architecture for small-model puzzle and ARC-style tasks.
- hyperloop-transformers-2026.md - Looped Transformer with loop-level hyper-connections for parameter-efficient language modeling.
- language-models-need-sleep-2026.md - Sleep-time memory-consolidation method for SSM-attention hybrids that loops before KV-cache eviction.
- hidden-uniform-cluster-prior-2022.md - SSL analysis showing that volume-maximization and prototype methods can impose hidden uniform cluster priors that hurt long-tailed data.
- jepa-slow-features-2022.md - JEPA failure-mode analysis showing latent predictive objectives can focus on fixed slow distractors instead of action-relevant state.
- lejepa-identifiability-2026.md - LeJEPA identifiability theory proving Gaussian-latent state recovery up to rotation under OU-style assumptions, with author X narrative, project page, code, and Lean proof artifacts.
- learning-is-forgetting-2026.md - ICLR 2026 Information Bottleneck analysis of LLM training as lossy compression.
- llms-time-series-analysis-2024.md - Position paper on using LLM interfaces, modality switching, and question answering for time-series analysis.
- llms-use-fourier-features-addition-2024.md - Mechanistic analysis of Fourier features in pretrained LLM addition.
- mhc-2025.md - DeepSeek-AI constrained Hyper-Connections method for stable matrix-valued residual streams.
- octo-2024.md - Open-source generalist robot policy with Transformer backbone and diffusion action head.
- openvla-2024.md - Open action-token VLA model for image/language-conditioned robot control.
- pi0-2024.md - Physical Intelligence VLA flow model with a semantic VLM backbone and continuous action expert.
- pi0-7-2026.md - Steerable generalist VLA model using rich context, metadata, subgoal images, and a flow-matching action expert.
- raev2-2026.md - RAEv2 paper and X discussion on multi-layer representation autoencoders, REPA self-guidance, and action-conditioned navigation world-model rollouts.
- rdt-1b-2024.md - Robotics Diffusion Transformer source for scaled bimanual continuous action chunk generation.
- reconstruction-or-semantics-2026.md - Evaluation of reconstruction and semantic latent spaces for robotic diffusion world models.
- rt-2-2023.md - VLA action-as-language source showing web-scale VLM transfer to robot action tokens.
- scaling-law-time-series-forecasting-2024.md - Theory and experiments for scaling laws in time-series forecasting with look-back horizon as a scaling variable.
- scaling-laws-large-time-series-models-2024.md - Empirical power-law scaling evidence for decoder-only time-series foundation models.
- scaling-test-time-compute-agentic-coding-2026.md - Agentic-coding test-time scaling source showing structured rollout summaries outperform raw traces for selection and reuse.
- self-teaching-autoencoder-2026.md - Blog/code/demo source for transformed latent-consistency autoencoder training without direct image-space reconstruction loss.
- stable-worldmodel-2026.md - Platform source for reproducible JEPA/world-model research, standardized trajectory data handling, MPC solvers, and factor-of-variation evaluation.
- tabm-2024.md - MLP-based tabular deep-learning model with parameter-efficient ensembling and numerical feature embeddings.
- tiny-recursive-model-2025.md - TRM minimalist recursive reasoning model that simplifies HRM with a single tiny network.
- turboquant-2025.md - ICLR 2026 online vector quantization method for KV-cache and vector-search state; vLLM critique narrows production value to memory-pressure cases versus FP8.
- world-model-robot-learning-survey-2026.md - 2026 robot-learning world-model survey separating policy coupling, simulator/evaluator roles, robotic video generation, evaluation, datasets, and open challenges.
- training-in-imagination-2026.md - Theory source for training policies in learned world models with separate dynamics/reward errors, sample-budget allocation, and reward-noise versus reward-bias hygiene.
- time-hd-2025.md - Time-HD high-dimensional time-series forecasting benchmark introduced with U-Cast.
- titans-2025.md - Titans neural long-term memory architecture for learning to memorize context at test time.
- tokengt-2022.md - TokenGT source for treating graph nodes and edges as ordinary Transformer tokens.
- toto-2-tsalm-2026.md - TSALM @ ICLR 2026 presentation transcript and slides for Toto 2.0 scaling, training recipe, data mix, ARFBench, Toto-1.0-QA-Experimental, and observability world-model roadmap.
- universal-transformers-2018.md - Universal Transformer root source for recurrent-depth self-attention and adaptive per-position halting.
- universal-transformers-need-memory-2026.md - Study of memory tokens and ACT depth-state tradeoffs in Universal Transformer recursive reasoning.
Normal Sources
Read
- perception-encoder-2025.md - Meta Perception Encoder paper showing strong visual embeddings can be hidden in intermediate layers and exposed through alignment tuning.
- reinpatch-2026.md - Reinforcement-trained adaptive patcher for time-series forecasting and zero-shot patch-policy transfer.
Skimmed
- atst-2023.md - Audio Teacher-Student Transformer for clip-level and frame-level self-supervised audio representations.
- chronos-2-2025.md - Universal forecasting extension of Chronos with grouped time series, covariates, and cross-series in-context learning.
- fade-2026.md - FADE adaptive per-parameter weight-decay method for controlled forgetting in continual learning.
- fast-slow-training-2026.md - Fast-Slow Training method for LLM continual adaptation using prompt/context fast weights and parameter slow weights.
- kairos-2025.md - Adaptive time-series forecasting model family with benchmarked 10M, 23M, and 50M variants.
- mantis-2025.md - Lightweight calibrated foundation model for user-friendly time-series classification.
- mantisv2-2026.md - Synthetic-data and test-time-strategy extension of Mantis for zero-shot time-series classification.
- moirai-2-2025.md - Smaller Moirai 2.0 forecasting model emphasizing efficiency and calibration.
- moirai-2024.md - Universal time-series forecasting Transformer family trained across heterogeneous series.
- moirai-moe-2024.md - Sparse mixture-of-experts extension of Moirai for time-series forecasting.
- molmo-pixmo-2024.md - Open-weight and open-data VLM family and data engine from Allen AI.
- moment-2024.md - Open time-series foundation-model family for forecasting, classification, and representation learning.
- nutime-2023.md - Numerically multi-scaled embedding method for large-scale time-series pretraining.
- pretrained-transformers-universal-computation-engines-2021.md - Frozen language-pretrained Transformers transferred to non-language sequence tasks.
- reverso-2026.md - Efficient zero-shot forecasting model centered on compact recurrent-style sequence modeling.
- rwkv-ts-2024.md - RWKV-style recurrent backbone adapted to time-series forecasting and related passive tasks.
- simmtm-2023.md - Multi-neighbor masked time-series modeling framework for forecasting and classification pretraining.
- stochastic-sharpness-gap-2026.md - SGD edge-of-stability theory source explaining batch-size-dependent sharpness gaps through projected gradient-noise variance.
- sundial-2025.md - THUML time-series foundation-model family for forecasting across heterogeneous tasks.
- t2s-2025.md - Text-to-time-series generation model using LA-VAE and flow-matching Diffusion Transformer.
- t-loss-2019.md - Scalable unsupervised representation learning baseline for multivariate time series.
- tabicl-2025.md - Tabular in-context learning model that scales row-wise context beyond small-data TabPFN settings.
- tabpfn-3-2026.md - Prior Labs technical report for TabPFN-3, its Thinking/API variants, and TabPFN-TS-3.
- tabpfn-v2-2025.md - Tabular prior-data fitted network for fast small-data classification and regression.
- telecomts-2025.md - Multimodal 5G observability dataset with scale-preserving KPI time series, anomaly/root-cause labels, and language Q&A fields.
- tempopfn-2025.md - Synthetic-pretrained linear RNN prior-data fitted network for zero-shot forecasting.
- time-moe-2024.md - Billion-scale mixture-of-experts time-series foundation-model family.
- timer-2024.md - Generative pretrained Transformer line framing time-series forecasting as large sequence modeling.
- timesfm-2023.md - Decoder-only forecasting foundation model from Google Research.
- tiny-time-mixers-2024.md - Compact pretrained MLP-mixer forecasting models for zero-shot and few-shot use.
- tirex-2025.md - Zero-shot forecasting model using enhanced in-context learning across short and long horizons.
- tivit-2025.md - Time-series classification via frozen vision-model hidden representations.
- ts2vec-2021.md - Hierarchical contrastive time-series representation learning with timestamp-level embeddings.
- tsmixer-2023.md - All-MLP time-series forecasting architecture that mixes over time and feature dimensions.
- toto-2-2026.md - Datadog article announcing the Toto 2.0 open-weights forecasting model family and scaling results.
- toto-2025.md - Observability-oriented time-series foundation model from Datadog.
- unitime-2023.md - Early language-instruction-conditioned cross-domain time-series forecasting model.
- units-2024.md - Unified multi-task time-series model using task tokenization and shared weights.
- unishape-2026.md - Shape-aware foundation model for time-series classification.
- utica-2026.md - Multi-objective self-distillation pretraining method for time-series classification.
- wavspa-2022.md - Wavelet-space attention method for long-sequence Transformers.
Not Read
- anomod-2026.md - Multimodal microservice anomaly-detection and root-cause-analysis dataset with logs, metrics, traces, API responses, and code coverage.
- chronograph-2025.md - Graph-structured multivariate microservice time-series dataset with temporal node/edge features and incident labels.
- huginn-2025.md - Recurrent-depth language model scaling test-time compute through latent reasoning loops.
- latent-thoughts-2025.md - Looped Transformer reasoning source connecting repeated depth to latent thoughts.
- loopformer-2026.md - Elastic-depth looped Transformer trained for budget-conditioned latent reasoning.
- mesanet-2025.md - Mesa layer sequence model using locally optimal test-time training with conjugate-gradient updates.
- miras-2025.md - Associative-memory framework for test-time memorization, attentional bias, retention, and online optimization.
- parallel-samplers-recurrent-depth-2025.md - Parallel sampler connecting recurrent-depth models to diffusion language models.
- parcae-2026.md - Stable looped language-model architecture with scaling-law analysis.
- recurrent-transformer-2026.md - Transformer variant with layerwise recurrent memory for greater effective depth and efficient decoding.
- sparse-layers-looped-language-models-2026.md - Looped-MoE scaling and early-exit source for looped language models.
- titans-revisited-2025.md - Lightweight Titans reimplementation and critical analysis across language, time-series, and recommendation tasks.
- universal-reasoning-model-2025.md - UT-derived recursive reasoning model for ARC-AGI and Sudoku-style tasks.
- gaia-micross-2021.md - GAIA AIOps dataset collection with MicroSS metrics, traces, logs, and anomaly-injection records.
- gift-eval-2024.md - Salesforce GIFT-Eval general time-series forecasting benchmark and leaderboard.
- lemma-rca-2024.md - Large multi-modal multi-domain root-cause-analysis dataset collection spanning IT and OT operations.
- openrca-2025.md - LLM-agent root-cause-analysis benchmark over natural-language queries, KPI time series, trace graphs, and logs.
- ops-lite-2026.md - Compact RCA evaluation set with per-case causal-graph ground truth for microservice systems.
- rcaeval-2025.md - Microservice RCA benchmark and evaluation framework with RE1/RE2/RE3 datasets and reproducible baselines.
- time-2026.md - TIME contamination-resistant zero-shot forecasting benchmark.
- learning-from-leading-indicators-2024.md - LIFT plugin for local lead-lag channel dependence in multivariate forecasting.
- t-rep-2023.md - Self-supervised timestep-level time-series representation learning with learned time-embeddings.
- time-series-forecasting-manifold-learning-2021.md - Embed-predict-lift manifold-learning approach for high-dimensional time-series forecasting.
- evolution-strategies-at-the-hyperscale-2025.md - EGGROLL low-rank perturbation method for hyperscale ES.
- evolution-strategies-scalable-alternative-2017.md - OpenAI ES baseline showing scalable black-box policy optimization.
- evolutionary-strategies-catastrophic-forgetting-2026.md - Catastrophic-forgetting stress test for ES-based LLM fine-tuning.
Context Sources
Not Read
- yahoo-contextual-bandit-2010.md - Yahoo! news recommendation contextual-bandit logs and evaluation method.
- amsterdamumcdb-2021.md - European ICU database with longitudinal observations, medications, fluids, and procedures.
- assistments-2009.md - ASSISTments student interaction data with hints, attempts, and tutoring-event sequences.
- causalworld-2020.md - Robotic manipulation benchmark for causal structure and transfer learning.
- criteo-uplift-2018.md - Marketing treatment/control dataset for uplift and treatment-effect modeling.
- d4rl-2020.md - Offline RL benchmark suite of state-action-reward trajectories.
- ednet-2019.md - Large-scale hierarchical student activity sequence dataset.
- eicu-crd-2018.md - Multi-center ICU database with longitudinal treatments and observations.
- heartsteps-2019.md - Mobile-health micro-randomized intervention data for activity suggestions.
- hirid-2020.md - High-resolution ICU time-series dataset with treatment/event records.
- kdd-cup-2010.md - Student-performance prediction dataset from intelligent tutoring logs.
- kuairand-2022.md - Sequential recommendation dataset with randomly exposed videos.
- mimic-iv-2023.md - Clinical EHR/ICU database with longitudinal measurements, orders, procedures, and treatments.
- ohio-t1dm-2018.md - Type-1 diabetes longitudinal glucose, insulin, meal, and activity dataset.
- open-bandit-dataset-2020.md - Logged bandit feedback dataset and pipeline for off-policy evaluation.
- pslc-datashop-2010.md - Learning-science repository with student/tutor event logs.
- rl-unplugged-2020.md - Offline RL benchmark suite built from logged transitions.
- causal-chambers-2024.md - Real physical systems with known causal structure and interventional data.
- bridge-data-v2-2023.md - Real-robot manipulation dataset used for language-conditioned policies and robotic world-model evaluation.
- droid-2024.md - In-the-wild robot manipulation dataset with synchronized visual observations and language annotations.
- open-x-embodiment-2023.md - Multi-embodiment robot-learning dataset and RT-X model source.
- roboturk-2018.md - Crowdsourced 6-DoF teleoperation platform and manipulation demonstration dataset.
- time-series-library-2024.md - THUML Time-Series-Library benchmark collection used as the LSF/LTSF handle.