Source Pages

Curation Fields

Each source page carries two curation fields: importance (landmark|important|normal|context) and read_status (read|skimmed|none). Agents MUST prioritize landmark and important sources during search and synthesis.
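As a rough illustration of the prioritization rule, the sketch below orders a handful of pages by importance. The field values come from the paragraph above and the sample entries from this index. The SourcePage record, the prioritize helper, the numeric ranks, and the tie-break on read_status are assumptions made for the example, not part of any existing tooling.

```python
from dataclasses import dataclass

# Field values come from the curation rule; the numeric ranks are an assumption.
IMPORTANCE_RANK = {"landmark": 0, "important": 1, "normal": 2, "context": 3}
# Tie-breaking on read_status is an assumption; the rule only names importance.
READ_STATUS_RANK = {"read": 0, "skimmed": 1, "none": 2}


@dataclass(frozen=True)
class SourcePage:
    """Hypothetical record for one source page."""
    filename: str
    importance: str
    read_status: str


def prioritize(pages):
    """Return pages ordered by importance, then read_status, then filename."""
    return sorted(
        pages,
        key=lambda p: (
            IMPORTANCE_RANK[p.importance],
            READ_STATUS_RANK[p.read_status],
            p.filename,
        ),
    )


# Sample entries, with fields matching where each page sits in this index.
pages = [
    SourcePage("timesfm-2023.md", "normal", "skimmed"),
    SourcePage("mamba-2023.md", "important", "read"),
    SourcePage("world-models-2018.md", "landmark", "skimmed"),
    SourcePage("d4rl-2020.md", "context", "none"),
    SourcePage("pararnn-2025.md", "landmark", "read"),
]

ordered = [p.filename for p in prioritize(pages)]
print(ordered)
```

An agent following the rule could consume the landmark and important entries at the front of the ordering first and fall back to normal and context pages only if more coverage is needed.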

Landmark Sources

Read

  • chemeris-latent-state-time-series-2026.md - Alex’s landmark position source for why observation forecasting is too narrow and why time-series foundation models should optimize for useful internal state.
  • context-is-key-2024.md - ServiceNow Context is Key benchmark showing that essential natural-language context can be required for accurate time-series forecasts.
  • lecun-autonomous-machine-intelligence-2022.md - LeCun autonomous machine intelligence proposal centered on world models, intrinsic objectives, and hierarchical JEPA.
  • pararnn-2025.md - Apple ParaRNN framework for parallel training of nonlinear RNNs at billion-parameter language-model scale.

Skimmed

  • world-models-2018.md - Ha and Schmidhuber landmark source for VAE + MDN-RNN latent world models, controller training in learned dreams, and simulator-exploitation caveats.

Important Sources

Read

  • beyond-language-modeling-2026.md - Controlled multimodal pretraining study using Transfusion, visual data, world modeling, and MoE scaling.
  • bolmo-2025.md - Byteification method for converting subword LMs into competitive byte-level language models.
  • cauker-2025.md - Synthetic causally coherent time-series generator for TSFM pretraining.
  • chatts-2024.md - Synthetic-data-trained time-series MLLM for understanding and reasoning over multivariate series.
  • conceptmoe-2026.md - MoE architecture that merges semantically similar tokens into concept representations.
  • dinov3-2025.md - Scaled self-supervised vision foundation model with improved dense features.
  • dragon-hatchling-2025.md - Pathway BDH / Dragon Hatchling source for sparse positive recurrent fast state, synapse-level probes, language/translation scaling, and a cautionary architecture narrative around brain-model and Sudoku claims.
  • dynamic-fine-tuning-2025.md - Reward-rectified SFT method that links SFT and RL through implicit rewards and token-level gradient scaling.
  • evolution-strategies-at-scale-2025.md - Full-parameter ES fine-tuning of billion-parameter LLMs as an RL alternative.
  • florence-2-2023.md - Microsoft Florence-2 paper using FLD-5B and an iterative visual data engine to train a compact prompt-based generalist vision model.
  • gemma-4-12b-2026.md - Google DeepMind production/open-weight release for an encoder-free multimodal 12B model with text, image, and audio inputs.
  • h-net-2025.md - End-to-end hierarchical byte model with learned dynamic chunking.
  • guillotine-regularization-2022.md - Layer-cutting analysis showing why SSL projectors can improve training while hiding worse downstream representations at the output.
  • iclr-time-series-meta-analysis-2026.md - Local ICLR 2026 field-map source for time-series forecasting, representation learning, and physiology-heavy representation clusters.
  • latent-variable-energy-based-models-2023.md - Lecture-note introduction to latent-variable energy-based models and H-JEPA.
  • lejepa-2025.md - JEPA theory and SIGReg objective for Gaussian predictive representations.
  • leworldmodel-2026.md - Stable end-to-end JEPA world model from pixels using next-embedding prediction and Gaussian regularization.
  • mamba-2023.md - Selective state space model architecture for linear-time sequence modeling.
  • mamba-2-2024.md - Structured state space duality framework and Mamba-2 architecture.
  • mamba-3-2026.md - Mamba-family architecture adding exponential-trapezoidal discretization, complex state, and MIMO updates.
  • moda-2026.md - Mixture-of-Depths Attention source for content-based retrieval over prior layer key/value memories and hardware-aware depth attention.
  • natural-language-guidance-tts-2024.md - Scalable synthetic annotation method for natural-language-controlled high-fidelity text-to-speech.
  • nepa-2025.md - Next-embedding predictive autoregression for visual self-supervised learning.
  • synergy-2025.md - Tokenizer-free byte-level language model with learned abstraction routing.
  • prism-hypothesis-2025.md - Spectral hypothesis unifying semantic and pixel encoders through frequency structure.
  • armt-2024.md - Associative Recurrent Memory Transformer source for layerwise associative memory over RMT-style segments.
  • rate-2023.md - ICLR 2026 RATE source for recurrent memory in offline RL trajectories.
  • rmt-2022.md - NeurIPS 2022 Recurrent Memory Transformer source for segment-level memory tokens.
  • timeomni-1-2026.md - Time-series reasoning suite and TimeOmni-1 model for complex temporal reasoning.
  • timeomni-vl-2026.md - Vision-centric unified model for time-series understanding and generation.
  • tuna-2-2026.md - Pixel-space unified multimodal model that removes pretrained vision encoders.
  • u-cast-2025.md - HDTSF formulation, Time-HD benchmark, and U-Cast baseline for high-dimensional multivariate forecasting.
  • flow-of-ranks-2025.md - Rank-structure analysis and compression recipe for time-series Transformers.
  • vl-jepa-2025.md - Vision-language JEPA that predicts text embeddings instead of autoregressive tokens.

Skimmed

  • act-2023.md - Action Chunking with Transformers source for continuous robot action chunks and temporal ensembling.
  • agentic-world-modeling-2026.md - Survey and taxonomy source for L1 predictors, L2 simulators, L3 evolvers, and physical/digital/social/scientific law regimes.
  • atlas-2025.md - ATLAS test-time memory module and DeepTransformers family for optimized long-context memorization.
  • bittokens-2025.md - IEEE 754 bit-level single-token number encoding for language-model numeracy.
  • boom-2025.md - Datadog BOOM observability metrics forecasting benchmark.
  • charm-2025.md - Channel-description-conditioned JEPA embedding model for multivariate time series.
  • convergent-evolution-number-representations-2026.md - Number-representation study separating universal Fourier-spectrum spikes from functionally usable modular geometry.
  • compute-optimal-tokenization-2026.md - Meta FAIR / University of Washington scaling-law study arguing that tokenization changes should be compared in bytes per parameter rather than tokens per parameter.
  • cookbook-self-supervised-learning-2023.md - Beginner-friendly survey and practical taxonomy of SSL methods, recipes, evaluation protocols, and implementation gotchas as of early 2023.
  • cwm-2025.md - Meta FAIR Code World Model technical report for execution-trace and agentic-code action-observation training.
  • diffusionblocks-2026.md - ICLR 2026 block-wise training framework from Sakana AI that turns residual networks into independently trainable diffusion-style denoising blocks.
  • diffusion-policy-2023.md - Robotics source for denoising future continuous action trajectories in a receding-horizon visuomotor policy.
  • ebt-2025.md - Energy-Based Transformer paper using learned compatibility scores and gradient-based candidate refinement for scalable learning and inference-time thinking.
  • embedded-language-flows-2026.md - MIT ELF preprint showing continuous embedding-space flow matching for language generation, useful as text-side evidence for multimodal diffusion/flow substrates.
  • eidos-2026.md - Time-series foundation model family trained through latent-space predictive learning and SiGLU point-wise scalar tokenization.
  • elt-2026.md - Elastic Looped Transformer source for parameter-efficient visual generation, ILSD loop-boundary supervision, and any-time loop-count inference.
  • fast-2025.md - Frequency-space action tokenization method for making continuous robot action chunks compatible with autoregressive VLAs.
  • exploring-large-models-time-series-2024.md - Tsinghua/THUML historical overview of early large time-series models, Timer, AutoTimes, Timer-XL, and OpenLTM.
  • flowstate-2025.md - SSM-based time-series foundation model with a functional basis decoder for sampling-rate-invariant forecasting.
  • fone-2025.md - Fourier Number Embedding method for precise single-token number representations.
  • gemini-robotics-1-5-2025.md - Google DeepMind robotics source for embodied reasoning, Motion Transfer, and hierarchical VLA action execution.
  • genie-2024.md - Google DeepMind ICML 2024 source for learning action-controllable visual world models from unlabeled videos via latent actions.
  • gr00t-n1-2025.md - NVIDIA humanoid VLA source with a VLM System 2 and DiT/flow-matching System 1 action module.
  • gqt-2025.md - Graph Quantized Tokenizer source for learned discrete graph vocabularies before Transformer processing.
  • graph-tokenization-2026.md - ICLR 2026 graph tokenizer using reversible graph serialization plus BPE for standard Transformers.
  • graphgpt-2025.md - ICML 2025 Graph Eulerian Transformer source for reversible graph-to-sequence pretraining.
  • graphormer-2021.md - Classic graph Transformer baseline using centrality, shortest-path, and edge attention biases.
  • helix-2025.md - Figure AI technical writeup on a fast/slow humanoid VLA for continuous upper-body control.
  • helix-02-2026.md - Figure AI follow-on writeup extending Helix to full-body humanoid loco-manipulation with S2/S1/S0 hierarchy.
  • hierarchical-reasoning-model-2025.md - HRM recurrent fast/slow reasoning architecture for small-model puzzle and ARC-style tasks.
  • hyperloop-transformers-2026.md - Looped Transformer with loop-level hyper-connections for parameter-efficient language modeling.
  • language-models-need-sleep-2026.md - Sleep-time memory-consolidation method for SSM-attention hybrids that loops before KV-cache eviction.
  • hidden-uniform-cluster-prior-2022.md - SSL analysis showing that volume-maximization and prototype methods can impose hidden uniform cluster priors that hurt long-tailed data.
  • jepa-slow-features-2022.md - JEPA failure-mode analysis showing latent predictive objectives can focus on fixed slow distractors instead of action-relevant state.
  • lejepa-identifiability-2026.md - LeJEPA identifiability theory proving Gaussian-latent state recovery up to rotation under OU-style assumptions, with author X narrative, project page, code, and Lean proof artifacts.
  • learning-is-forgetting-2026.md - ICLR 2026 Information Bottleneck analysis of LLM training as lossy compression.
  • llms-time-series-analysis-2024.md - Position paper on using LLM interfaces, modality switching, and question answering for time-series analysis.
  • llms-use-fourier-features-addition-2024.md - Mechanistic analysis of Fourier features in pretrained LLM addition.
  • mhc-2025.md - DeepSeek-AI constrained Hyper-Connections method for stable matrix-valued residual streams.
  • octo-2024.md - Open-source generalist robot policy with Transformer backbone and diffusion action head.
  • openvla-2024.md - Open action-token VLA model for image/language-conditioned robot control.
  • pi0-2024.md - Physical Intelligence VLA flow model with a semantic VLM backbone and continuous action expert.
  • pi0-7-2026.md - Steerable generalist VLA model using rich context, metadata, subgoal images, and a flow-matching action expert.
  • raev2-2026.md - RAEv2 paper and X discussion on multi-layer representation autoencoders, REPA self-guidance, and action-conditioned navigation world-model rollouts.
  • rdt-1b-2024.md - Robotics Diffusion Transformer source for scaled bimanual continuous action chunk generation.
  • reconstruction-or-semantics-2026.md - Evaluation of reconstruction and semantic latent spaces for robotic diffusion world models.
  • rt-2-2023.md - VLA action-as-language source showing web-scale VLM transfer to robot action tokens.
  • scaling-law-time-series-forecasting-2024.md - Theory and experiments for scaling laws in time-series forecasting with look-back horizon as a scaling variable.
  • scaling-laws-large-time-series-models-2024.md - Empirical power-law scaling evidence for decoder-only time-series foundation models.
  • scaling-test-time-compute-agentic-coding-2026.md - Agentic-coding test-time scaling source showing structured rollout summaries outperform raw traces for selection and reuse.
  • self-teaching-autoencoder-2026.md - Blog/code/demo source for transformed latent-consistency autoencoder training without direct image-space reconstruction loss.
  • stable-worldmodel-2026.md - Platform source for reproducible JEPA/world-model research, standardized trajectory data handling, MPC solvers, and factor-of-variation evaluation.
  • tabm-2024.md - MLP-based tabular deep-learning model with parameter-efficient ensembling and numerical feature embeddings.
  • tiny-recursive-model-2025.md - TRM minimalist recursive reasoning model that simplifies HRM with a single tiny network.
  • turboquant-2025.md - ICLR 2026 online vector quantization method for KV-cache and vector-search state; vLLM critique narrows production value to memory-pressure cases versus FP8.
  • world-model-robot-learning-survey-2026.md - 2026 robot-learning world-model survey separating policy coupling, simulator/evaluator roles, robotic video generation, evaluation, datasets, and open challenges.
  • training-in-imagination-2026.md - Theory source for training policies in learned world models with separate dynamics/reward errors, sample-budget allocation, and reward-noise versus reward-bias hygiene.
  • time-hd-2025.md - Time-HD high-dimensional time-series forecasting benchmark introduced with U-Cast.
  • titans-2025.md - Titans neural long-term memory architecture for learning to memorize context at test time.
  • tokengt-2022.md - TokenGT source for treating graph nodes and edges as ordinary Transformer tokens.
  • toto-2-tsalm-2026.md - TSALM @ ICLR 2026 presentation transcript and slides for Toto 2.0 scaling, training recipe, data mix, ARFBench, Toto-1.0-QA-Experimental, and observability world-model roadmap.
  • universal-transformers-2018.md - Universal Transformer root source for recurrent-depth self-attention and adaptive per-position halting.
  • universal-transformers-need-memory-2026.md - Study of memory tokens and ACT depth-state tradeoffs in Universal Transformer recursive reasoning.

Normal Sources

Read

  • perception-encoder-2025.md - Meta Perception Encoder paper showing strong visual embeddings can be hidden in intermediate layers and exposed through alignment tuning.
  • reinpatch-2026.md - Reinforcement-trained adaptive patcher for time-series forecasting and zero-shot patch-policy transfer.

Skimmed

  • atst-2023.md - Audio Teacher-Student Transformer for clip-level and frame-level self-supervised audio representations.
  • chronos-2-2025.md - Universal forecasting extension of Chronos with grouped time series, covariates, and cross-series in-context learning.
  • fade-2026.md - FADE adaptive per-parameter weight-decay method for controlled forgetting in continual learning.
  • fast-slow-training-2026.md - Fast-Slow Training method for LLM continual adaptation using prompt/context fast weights and parameter slow weights.
  • kairos-2025.md - Adaptive time-series forecasting model family with benchmarked 10M, 23M, and 50M variants.
  • mantis-2025.md - Lightweight calibrated foundation model for user-friendly time-series classification.
  • mantisv2-2026.md - Synthetic-data and test-time-strategy extension of Mantis for zero-shot time-series classification.
  • moirai-2-2025.md - Smaller Moirai 2.0 forecasting model emphasizing efficiency and calibration.
  • moirai-2024.md - Universal time-series forecasting Transformer family trained across heterogeneous series.
  • moirai-moe-2024.md - Sparse mixture-of-experts extension of Moirai for time-series forecasting.
  • molmo-pixmo-2024.md - Open-weight and open-data VLM family and data engine from Allen AI.
  • moment-2024.md - Open time-series foundation-model family for forecasting, classification, and representation learning.
  • nutime-2023.md - Numerically multi-scaled embedding method for large-scale time-series pretraining.
  • pretrained-transformers-universal-computation-engines-2021.md - Frozen language-pretrained Transformers transferred to non-language sequence tasks.
  • reverso-2026.md - Efficient zero-shot forecasting model centered on compact recurrent-style sequence modeling.
  • rwkv-ts-2024.md - RWKV-style recurrent backbone adapted to time-series forecasting and related passive tasks.
  • simmtm-2023.md - Multi-neighbor masked time-series modeling framework for forecasting and classification pretraining.
  • stochastic-sharpness-gap-2026.md - SGD edge-of-stability theory source explaining batch-size-dependent sharpness gaps through projected gradient-noise variance.
  • sundial-2025.md - THUML time-series foundation-model family for forecasting across heterogeneous tasks.
  • t2s-2025.md - Text-to-time-series generation model using LA-VAE and flow-matching Diffusion Transformer.
  • t-loss-2019.md - Scalable unsupervised representation learning baseline for multivariate time series.
  • tabicl-2025.md - Tabular in-context learning model that scales row-wise context beyond small-data TabPFN settings.
  • tabpfn-3-2026.md - Prior Labs technical report for TabPFN-3, its Thinking/API variants, and TabPFN-TS-3.
  • tabpfn-v2-2025.md - Tabular prior-data fitted network for fast small-data classification and regression.
  • telecomts-2025.md - Multimodal 5G observability dataset with scale-preserving KPI time series, anomaly/root-cause labels, and language Q&A fields.
  • tempopfn-2025.md - Synthetic-pretrained linear RNN prior-data fitted network for zero-shot forecasting.
  • time-moe-2024.md - Billion-scale mixture-of-experts time-series foundation-model family.
  • timer-2024.md - Generative pretrained Transformer line framing time-series forecasting as large sequence modeling.
  • timesfm-2023.md - Decoder-only forecasting foundation model from Google Research.
  • tiny-time-mixers-2024.md - Compact pretrained MLP-mixer forecasting models for zero-shot and few-shot use.
  • tirex-2025.md - Zero-shot forecasting model using enhanced in-context learning across short and long horizons.
  • tivit-2025.md - Time-series classification via frozen vision-model hidden representations.
  • ts2vec-2021.md - Hierarchical contrastive time-series representation learning with timestamp-level embeddings.
  • tsmixer-2023.md - All-MLP time-series forecasting architecture that mixes over time and feature dimensions.
  • toto-2-2026.md - Datadog article announcing the Toto 2.0 open-weights forecasting model family and scaling results.
  • toto-2025.md - Observability-oriented time-series foundation model from Datadog.
  • unitime-2023.md - Early language-instruction-conditioned cross-domain time-series forecasting model.
  • units-2024.md - Unified multi-task time-series model using task tokenization and shared weights.
  • unishape-2026.md - Shape-aware foundation model for time-series classification.
  • utica-2026.md - Multi-objective self-distillation pretraining method for time-series classification.
  • wavspa-2022.md - Wavelet-space attention method for long-sequence Transformers.

Not Read

Context Sources

Not Read

  • yahoo-contextual-bandit-2010.md - Yahoo! news recommendation contextual-bandit logs and evaluation method.
  • amsterdamumcdb-2021.md - European ICU database with longitudinal observations, medications, fluids, and procedures.
  • assistments-2009.md - ASSISTments student interaction data with hints, attempts, and tutoring-event sequences.
  • causalworld-2020.md - Robotic manipulation benchmark for causal structure and transfer learning.
  • criteo-uplift-2018.md - Marketing treatment/control dataset for uplift and treatment-effect modeling.
  • d4rl-2020.md - Offline RL benchmark suite of state-action-reward trajectories.
  • ednet-2019.md - Large-scale hierarchical student activity sequence dataset.
  • eicu-crd-2018.md - Multi-center ICU database with longitudinal treatments and observations.
  • heartsteps-2019.md - Mobile-health micro-randomized intervention data for activity suggestions.
  • hirid-2020.md - High-resolution ICU time-series dataset with treatment/event records.
  • kdd-cup-2010.md - Student-performance prediction dataset from intelligent tutoring logs.
  • kuairand-2022.md - Sequential recommendation dataset with randomly exposed videos.
  • mimic-iv-2023.md - Clinical EHR/ICU database with longitudinal measurements, orders, procedures, and treatments.
  • ohio-t1dm-2018.md - Type-1 diabetes longitudinal glucose, insulin, meal, and activity dataset.
  • open-bandit-dataset-2020.md - Logged bandit feedback dataset and pipeline for off-policy evaluation.
  • pslc-datashop-2010.md - Learning-science repository with student/tutor event logs.
  • rl-unplugged-2020.md - Offline RL benchmark suite built from logged transitions.
  • causal-chambers-2024.md - Real physical systems with known causal structure and interventional data.
  • bridge-data-v2-2023.md - Real-robot manipulation dataset used for language-conditioned policies and robotic world-model evaluation.
  • droid-2024.md - In-the-wild robot manipulation dataset with synchronized visual observations and language annotations.
  • open-x-embodiment-2023.md - Multi-embodiment robot-learning dataset and RT-X model source.
  • roboturk-2018.md - Crowdsourced 6-DoF teleoperation platform and manipulation demonstration dataset.
  • time-series-library-2024.md - THUML Time-Series-Library benchmark collection used as the LSF/LTSF handle.
