Source Pages

Curation Fields

Landmark Sources

Read

chemeris-latent-state-time-series-2026.md - Alex’s landmark position source for why observation forecasting is too narrow and why time-series foundation models should optimize for useful internal state.
context-is-key-2024.md - ServiceNow Context is Key benchmark showing that essential natural-language context can be required for accurate time-series forecasts.
lecun-autonomous-machine-intelligence-2022.md - LeCun autonomous machine intelligence proposal centered on world models, intrinsic objectives, and hierarchical JEPA.
pararnn-2025.md - Apple ParaRNN framework for parallel training of nonlinear RNNs at billion-parameter language-model scale.

Skimmed

world-models-2018.md - Ha and Schmidhuber landmark source for VAE + MDN-RNN latent world models, controller training in learned dreams, and simulator-exploitation caveats.

Important Sources

Read

aionoscope-2026.md - MILETS 2026 diagnostic benchmark for latent-state accessibility in frozen time-series representations, with exact categorical and dense labels from controlled Process-to-View generation.
beyond-language-modeling-2026.md - Controlled multimodal pretraining study using Transfusion, visual data, world modeling, and MoE scaling.
bolmo-2025.md - Byteification method for converting subword LMs into competitive byte-level language models.
cauker-2025.md - Synthetic causally coherent time-series generator for TSFM pretraining.
chatts-2024.md - Synthetic-data-trained time-series MLLM for understanding and reasoning over multivariate series.
conceptmoe-2026.md - MoE architecture that merges semantically similar tokens into concept representations.
dinov3-2025.md - Scaled self-supervised vision foundation model with improved dense features.
dragon-hatchling-2025.md - Pathway BDH / Dragon Hatchling source for sparse positive recurrent fast state, synapse-level probes, language/translation scaling, and a cautionary architecture narrative around brain-model and Sudoku claims.
dynamic-fine-tuning-2025.md - Reward-rectified SFT method that links SFT and RL through implicit rewards and token-level gradient scaling.
exprl-2026.md - Exploratory RL mid-training method that uses reference solutions as hidden dense-reward scaffolds instead of imitation targets.
evolution-strategies-at-scale-2025.md - Full-parameter ES fine-tuning of billion-parameter LLMs as an RL alternative.
florence-2-2023.md - Microsoft Florence-2 paper using FLD-5B and an iterative visual data engine to train a compact prompt-based generalist vision model.
gemma-4-12b-2026.md - Google DeepMind production/open-weight release for an encoder-free multimodal 12B model with text, image, and audio inputs.
h-net-2025.md - End-to-end hierarchical byte model with learned dynamic chunking.
guillotine-regularization-2022.md - Layer-cutting analysis showing why SSL projectors can improve training while hiding worse downstream representations at the output.
iclr-time-series-meta-analysis-2026.md - Local ICLR 2026 field-map source for time-series forecasting, representation learning, and physiology-heavy representation clusters.
latent-variable-energy-based-models-2023.md - Lecture-note introduction to latent-variable energy-based models and H-JEPA.
lejepa-2025.md - JEPA theory and SIGReg objective for Gaussian predictive representations.
lenepa-2026.md - MILETS 2026 no-augmentation next-latent prediction recipe for time-series representation learning, using temporal SIGReg instead of stop-gradient or EMA stabilization.
leworldmodel-2026.md - Stable end-to-end JEPA world model from pixels using next-embedding prediction and Gaussian regularization.
temporal-straightening-2026.md - ICML 2026 latent-planning method that regularizes consecutive latent velocities so representation geometry is easier to optimize through with GD and MPC.
mamba-2023.md - Selective state space model architecture for linear-time sequence modeling.
mamba-2-2024.md - Structured state space duality framework and Mamba-2 architecture.
mamba-3-2026.md - Mamba-family architecture adding exponential-trapezoidal discretization, complex state, and MIMO updates.
moda-2026.md - Mixture-of-Depths Attention source for content-based retrieval over prior layer key/value memories and hardware-aware depth attention.
motive-2026.md - ICML 2026 Oral and Outstanding Paper Honorable Mention for query-conditioned motion-gradient attribution and fine-tuning-data curation in video generation.
natural-language-guidance-tts-2024.md - Scalable synthetic annotation method for natural-language-controlled high-fidelity text-to-speech.
nepa-2025.md - Next-embedding predictive autoregression for visual self-supervised learning.
synergy-2025.md - Tokenizer-free byte-level language model with learned abstraction routing.
prism-hypothesis-2025.md - Spectral hypothesis unifying semantic and pixel encoders through frequency structure.
armt-2024.md - Associative Recurrent Memory Transformer source for layerwise associative memory over RMT-style segments.
rate-2023.md - ICLR 2026 RATE source for recurrent memory in offline RL trajectories.
rmt-2022.md - NeurIPS 2022 Recurrent Memory Transformer source for segment-level memory tokens.
timeomni-1-2026.md - Time-series reasoning suite and TimeOmni-1 model for complex temporal reasoning.
timeomni-vl-2026.md - Vision-centric unified model for time-series understanding and generation.
tuna-2-2026.md - Pixel-space unified multimodal model that removes pretrained vision encoders.
u-cast-2025.md - HDTSF formulation, Time-HD benchmark, and U-Cast baseline for high-dimensional multivariate forecasting.
flow-of-ranks-2025.md - Rank-structure analysis and compression recipe for time-series Transformers.
vjepa-2026.md - ICML 2026 probabilistic JEPA source for distributional latent-state prediction, Bayesian-filtering semantics, modular prior conditioning, and the demonstrated-unimodal-versus-multimodal-futures boundary.
vl-jepa-2025.md - Vision-language JEPA that predicts text embeddings instead of autoregressive tokens.

Skimmed

action100m-2026.md - CVPR 2026 Workshop hierarchical HowTo100M action-annotation dataset with a public 120k-video preview; it extends the VLWM pipeline but is not the unreleased VLWM goal/action/state-change corpus.
levljepa-2026.md - Non-contrastive end-to-end vision-language pretraining method using cross-modal prediction, stop-gradient targets, and per-modality SIGReg, with dense patch-token gains relevant to TSL-JEPA.
sensorfm-2026.md - Google wearable-health foundation model trained on more than one trillion minutes of private sensor data for representation transfer, imputation, and Personal Health Agent grounding.
adajepa-2026.md - Adaptive JEPA latent world model that updates inside the MPC loop using observed transitions before replanning under distribution shift.
agentic-automata-learning-2026.md - Controlled benchmark showing that tool-calling LLM agents still struggle to infer hidden DFA world models through interaction, especially as state complexity and accumulated evidence grow.
aristotelian-representation-hypothesis-2026.md - ICML 2026 calibration critique of raw representational-similarity metrics; global PRH-style convergence largely disappears after width/depth null calibration, while local-neighborhood alignment remains.
exploration-fine-tuning-parameter-decomposition-2026.md - Goodfire hackathon/source-code/X snapshot showing a VPD component-basis edit that removes a small LM’s German prediction ability by tuning one scalar, with LoRA and off-target-language caveats.
flexibility-trap-2026.md - ICML 2026 Outstanding Paper showing that left-to-right RL rollouts improve diffusion-LM solution coverage while preserving parallel decoding at inference.
llm-emu-2026.md - vLLM wall-clock emulator that preserves the online serving stack while replacing GPU forward execution with profile-sampled latency and synthetic output tokens.
minimax-sparse-attention-2026.md - MiniMax block-sparse GQA attention source with trained per-group block selection, 109B MoE evidence, released MSA kernels, and MiniMax-M3 open-weight model context.
llmservingsim-2024.md - IISWC 2024 HW/SW co-simulation infrastructure for LLM inference serving, with iteration-level modeling and computation reuse.
llmservingsim-2-0-2025.md - IEEE CAL 2025 LLMServingSim 2.0 bridge paper on trace-driven heterogeneous-hardware simulation and serving-technique interfaces.
llmservingsim-2-0-2026.md - Current LLMServingSim 2.0 source for heterogeneous and disaggregated LLM serving simulation.
pretraining-recurrent-networks-without-recurrence-2026.md - MIT SMT/DMT source for pretraining nonlinear RNNs from predictive memory labels without standard BPTT.
maya-2025.md - EuroSys 2026 transparent GPU runtime emulation method for deep-learning training workloads.
revati-2026.md - GPU-free time-warp emulator for LLM serving that runs real vLLM/SGLang control logic through CUDA virtualization.
sageserve-2025.md - ACM POMACS 2025 forecast-aware LLM serving autoscaling and routing system using Microsoft O365 production workloads.
vidur-2024.md - Microsoft large-scale LLM inference simulator and Vidur-Search baseline for deployment configuration search.
act-2023.md - Action Chunking with Transformers source for continuous robot action chunks and temporal ensembling.
agentic-world-modeling-2026.md - Survey and taxonomy source for L1 predictors, L2 simulators, L3 evolvers, and physical/digital/social/scientific law regimes.
atlas-2025.md - ATLAS test-time memory module and DeepTransformers family for optimized long-context memorization.
awesome-agentic-time-series-2026.md - June 2026 curated repository and survey map for agentic time-series systems, benchmarks, memory, temporal world models, and reliability.
bridge-2025.md - ICML 2025 text-controlled time-series generation source using multi-agent description refinement and diffusion/prototype conditioning.
catsg-2025.md - Microsoft TimeCraft causal diffusion source for observational, interventional, and counterfactual time-series generation.
diff-mn-2026.md - Continuous time-series generation source for irregular observations with diffusion-parameterized MoE-NCDE dynamics.
diga-2026.md - AAAI 2026 oral financial market generation source using a diffusion-guided meta-agent for controllable order flow.
invdiff-2025.md - KDD 2025 adjacent diffusion source for invariant guidance and unknown-bias mitigation.
mars-2025.md - ICLR 2025 financial market simulation engine powered by order-level generative foundation models.
mg-tsd-2024.md - ICLR 2024 multi-granularity diffusion source for probabilistic time-series forecasting.
mira-2025.md - NeurIPS 2025 Microsoft medical time-series foundation model for irregular clinical forecasting.
oats-2026.md - Online data augmentation method for TSFM pretraining using influence scores and guided diffusion.
otf-lam-2026.md - Factorized observed-transition primitive method for latent action learning under agent ambiguity.
tardiff-2025.md - KDD 2025 target-oriented diffusion source for synthetic EHR time-series generation.
timecraft-2025.md - Microsoft TimeCraft repository and blog source for time-series generation approaches.
timedp-2025.md - AAAI 2025 prototype/domain-prompt diffusion source for cross-domain time-series generation.
timeraf-2025.md - TKDE 2025 retrieval-augmented zero-shot time-series forecasting source.
audio-interaction-model-2026.md - Context-level Audio-Interaction source for caveated silence/response triggering, heavy SoundFlow data construction, FIFO scheduling, and proactive audio intervention.
bitter-lesson-data-filtering-2026.md - Stanford preprint showing that heuristic data filtering can help at low compute but lose to unfiltered Common Crawl at larger model/compute scales.
bittokens-2025.md - IEEE 754 bit-level single-token number encoding for language-model numeracy.
boom-2025.md - Datadog BOOM observability metrics forecasting benchmark.
charm-2025.md - Channel-description-conditioned JEPA embedding model for multivariate time series.
citylearn-2020.md - CityLearn Gymnasium environment and dataset schema for building energy coordination, load shaping, and demand response.
ai-challenge-safe-low-carbon-grid-2025.md - Energy and AI source on the L2RPN 2023 IDF safe low-carbon challenge, 118-node scenarios, winning multimodal action engineering, and alert modules.
active-power-correction-distributed-2021.md - Distributed multi-agent DRL source for active-power correction and adaptability in the L2RPN power-grid control lineage.
adversarial-training-power-systems-2020.md - Adversarial robustness source for power-grid control under line-disconnection contingencies.
conditional-autoencoders-electrical-consumption-2020.md - ECML PKDD source on conditional autoencoders for atypical conditions in electrical-consumption time series.
expert-system-remedial-action-discovery-2018.md - Expert-system source for discovering topological remedial actions in smart grids.
gibbs-priors-topology-control-2026.md - Physics-informed Grid2Op preprint using a GNN post-action overload-risk surrogate and Gibbs prior for topology-control action ranking.
graph-neural-solver-power-systems-2019.md - IJCNN source for graph neural solver methods in power systems.
graph-rl-power-grids-survey-2024.md - Energy and AI/arXiv survey mapping graph RL for power grids, Grid2Op evaluation pitfalls, and learned GNN surrogate planning.
gnn-transmission-grid-topology-2025.md - TenneT/Radboud preprint on heterogeneous graph representations for Grid2Op transmission-grid topology control.
grid-topology-reconfiguration-2020.md - Simple deep-RL baseline source for power-grid topology reconfiguration.
guided-dropout-power-system-2018.md - Guided-dropout source for fast power-system security analysis across topology variants.
guided-power-grid-segmentation-2017.md - Guided machine-learning source for topology-aware power-grid segmentation.
l2rpn-challenge-retrospective-2021.md - PMLR competition-track source describing the NeurIPS 2020 L2RPN challenge and Grid2Op framework.
l2rpn-topology-controllers-2020.md - PSCC/arXiv source framing L2RPN as a topology-controller training challenge.
leap-net-power-grid-perturbations-2019.md - LEAP-net source for learning under power-grid structural perturbations.
leap-nets-system-identification-2020.md - Neurocomputing/HAL metadata source for LEAP nets in system identification and power systems.
line-flow-control-time-series-atcs-2019.md - IEEE/arXiv source for autonomous line-flow control through topology adjustment and early warning.
machine-learning-power-system-operation-2017.md - Historical source introducing machine learning for power-system operation support.
marl2grid-tr-2026.md - ICLR 2026 benchmark source for multi-agent Grid2Op topology and redispatch control with partial observability and safety constraints.
neural-networks-power-flow-graph-neural-solver-2020.md - PSCC/EPSR source for graph neural power-flow solving.
power-grid-alphazero-2022.md - L2RPN WCCI 2022 winning AlphaZero-style topology-optimization source.
power-grid-contingencies-screening-2018.md - Fast neural-net screening source for power-grid contingencies.
power-system-risk-assessment-2018.md - Source on allocating simulator budget for power-system risk assessment.
rl2grid-2025.md - Grid2Op/RTE benchmark preprint for RL in power-grid operation; OpenReview status checked and marked withdrawn.
soft-label-topology-actions-2025.md - ECML PKDD 2025 ADS source on soft-label GNN imitation learning from simulated topology-action outcomes.
targeted-exploration-cascading-failures-2024.md - Climate Change AI/arXiv source using power-flow sensitivity factors for Grid2Op cascading-failure mitigation.
winning-l2rpn-afterstate-actor-critic-2021.md - ICLR 2021 Spotlight source for the WCCI 2020 winning semi-Markov afterstate actor-critic agent.
convergent-evolution-number-representations-2026.md - Number-representation study separating universal Fourier-spectrum spikes from functionally usable modular geometry.
compress-attend-transformer-2025.md - Chunk-compressive Transformer architecture with a test-time quality/efficiency knob and an ICLR 2026 rejection caveat.
comparing-transformers-hybrid-models-2026.md - Ai2 token-level comparison of Olmo 3 and Olmo Hybrid showing hybrid gains on state-conditioned meaning-bearing predictions and transformer strength on exact repetition and structural closure.
compute-optimal-tokenization-2026.md - Meta FAIR / University of Washington scaling-law study arguing that tokenization changes should be compared in bytes per parameter rather than tokens per parameter.
cookbook-self-supervised-learning-2023.md - Beginner-friendly survey and practical taxonomy of SSL methods, recipes, evaluation protocols, and implementation gotchas as of early 2023.
cwm-2025.md - Meta FAIR Code World Model technical report for execution-trace and agentic-code action-observation training.
deep-learning-not-mysterious-different-2025.md - ICML 2025 Spotlight position paper framing benign overfitting, overparametrization, and double descent through soft inductive biases, PAC-Bayes, and compression rather than parameter-count mystery.
diffusion-models-dont-memorize-2025.md - NeurIPS 2025 oral/Best Paper source on implicit dynamical regularization, diffusion generalization windows, and late memorization.
diffusionblocks-2026.md - ICLR 2026 block-wise training framework from Sakana AI that turns residual networks into independently trainable diffusion-style denoising blocks.
dmax-2026.md - DMax diffusion-language-model source for on-policy self-correction and soft parallel decoding, with released code, 16B checkpoints, and math/code trajectory datasets.
illada-2026.md - iLLaDA masked diffusion language-model source, with 8B from-scratch training, 12T pre-training tokens, public weights, and an instruct-gap caveat versus Qwen2.5 Instruct.
thinking-pixel-2026.md - Recursive Sparse Reasoning method for sparse latent-step refinement inside multimodal diffusion attention layers.
gated-deltanet-2025.md - ICLR 2025 NVIDIA linear recurrent attention paper combining scalar gating with the delta rule for bounded associative-memory updates.
gated-deltanet-2-2026.md - NVIDIA preprint on decoupling key-side erase and value-side write gates in linear recurrent attention, with code and matched 1.3B language-model experiments.
hola-2026.md - HOLA preprint adding a fixed top-$w$ exact KV cache to Gated DeltaNet using the delta-rule update magnitude, with matched recency/read ablations and bounded-memory long-context retrieval evidence.
diffusion-policy-2023.md - Robotics source for denoising future continuous action trajectories in a receding-horizon visuomotor policy.
ebt-2025.md - Energy-Based Transformer paper using learned compatibility scores and gradient-based candidate refinement for scalable learning and inference-time thinking.
embedded-language-flows-2026.md - MIT ELF preprint showing continuous embedding-space flow matching for language generation, useful as text-side evidence for multimodal diffusion/flow substrates.
eidos-2026.md - Time-series foundation model family trained through latent-space predictive learning and SiGLU point-wise scalar tokenization.
elt-2026.md - Elastic Looped Transformer source for parameter-efficient visual generation, ILSD loop-boundary supervision, and any-time loop-count inference.
fast-2025.md - Frequency-space action tokenization method for making continuous robot action chunks compatible with autoregressive VLAs.
exploring-large-models-time-series-2024.md - Tsinghua/THUML historical overview of early large time-series models, Timer, AutoTimes, Timer-XL, and OpenLTM.
flowstate-2025.md - SSM-based time-series foundation model with a functional basis decoder for sampling-rate-invariant forecasting.
fone-2025.md - Fourier Number Embedding method for precise single-token number representations.
fprm-2026.md - Fixed-Point Reasoning Model source for pre-norm/residual-scaled looped Transformers with fixed-point halting.
frm-2026.md - Flow Reasoning Model source for self-conditioned discrete-flow refinement, perturb-and-resolve verification, and localized FlowDPO.
generative-recursive-reasoning-2026.md - GRAM source for stochastic multi-trajectory recursive reasoning, width-based inference scaling, and puzzle-focused constraint-satisfaction evidence.
probabilistic-tiny-recursive-model-2026.md - PTRM source for training-free recurrent-noise rollouts, Q-head selection, and the proposal-coverage versus selector-calibration split.
gemini-robotics-1-5-2025.md - Google DeepMind robotics source for embodied reasoning, Motion Transfer, and hierarchical VLA action execution.
genie-2024.md - Google DeepMind ICML 2024 source for learning action-controllable visual world models from unlabeled videos via latent actions.
gr00t-n1-2025.md - NVIDIA humanoid VLA source with a VLM System 2 and DiT/flow-matching System 1 action module.
gqt-2025.md - Graph Quantized Tokenizer source for learned discrete graph vocabularies before Transformer processing.
graph-tokenization-2026.md - ICLR 2026 graph tokenizer using reversible graph serialization plus BPE for standard Transformers.
graphgpt-2025.md - ICML 2025 Graph Eulerian Transformer source for reversible graph-to-sequence pretraining.
graphormer-2021.md - Classic graph Transformer baseline using centrality, shortest-path, and edge attention biases.
io-aware-gnn-layers-2026.md - ICML 2026 Spotlight source and Turbo-GNN code for IO-aware GNN layer kernels, fused graph attention, degree-aware reductions, and cached cuSPARSE baselines.
helix-2025.md - Figure AI technical writeup on a fast/slow humanoid VLA for continuous upper-body control.
helix-02-2026.md - Figure AI follow-on writeup extending Helix to full-body humanoid loco-manipulation with S2/S1/S0 hierarchy.
hierarchical-reasoning-model-2025.md - HRM recurrent fast/slow reasoning architecture for small-model puzzle and ARC-style tasks.
hyperloop-transformers-2026.md - Looped Transformer with loop-level hyper-connections for parameter-efficient language modeling.
hybrid-associative-memories-2026.md - Zyphra preprint on selective KV-cache growth where a recurrent state compresses predictable context and attention stores hard-to-predict tokens.
illusion-of-superposition-2026.md - Latent-CoT interpretability source showing soft-token collapse, fine-tuned shortcutting, and limited from-scratch superposition.
implicit-curriculum-hypothesis-2026.md - Under-review arXiv source proposing that LLM pretraining follows a stable compositional skill-emergence order readable from function-vector geometry.
language-models-need-sleep-2026.md - Sleep-time memory-consolidation method for SSM-attention hybrids that loops before KV-cache eviction.
latent-context-language-models-2026.md - Encoder-decoder soft-token context-compression family for long-context language models and compressed agent memory.
latent-thought-flow-2026.md - LTF source for GFlowNet-trained variable-length continuous latent reasoning trajectories and efficient hidden CoT, with entropy/prior regularization caveats.
looped-world-models-2026.md - LoopWM technical report applying recurrent-depth Transformers to action-conditioned world modeling, with spectral state retention, adaptive early exit, deferred decoding, and reproducibility caveats.
lt2-2026.md - Linear-Time Looped Transformers source replacing full attention inside recurrent-depth loops with linear, sparse, or hybrid mixers, with released code and Ouro-hybrid-1.4B checkpoint.
hidden-uniform-cluster-prior-2022.md - SSL analysis showing that volume-maximization and prototype methods can impose hidden uniform cluster priors that hurt long-tailed data.
jepa-slow-features-2022.md - JEPA failure-mode analysis showing latent predictive objectives can focus on fixed slow distractors instead of action-relevant state.
lejepa-identifiability-2026.md - LeJEPA identifiability theory proving Gaussian-latent state recovery up to rotation under OU-style assumptions, with author X narrative, project page, code, and Lean proof artifacts.
learning-is-forgetting-2026.md - ICLR 2026 Information Bottleneck analysis of LLM training as lossy compression.
llms-noisy-channels-2026.md - ICML 2026 source proposing the Shannon Scaling Law for LLMs as noisy channels, with SNR-aware fits for overtraining, SFT, and quantization degradation.
llms-time-series-analysis-2024.md - Position paper on using LLM interfaces, modality switching, and question answering for time-series analysis.
llms-use-fourier-features-addition-2024.md - Mechanistic analysis of Fourier features in pretrained LLM addition.
mhc-2025.md - DeepSeek-AI constrained Hyper-Connections method for stable matrix-valued residual streams.
neorl2-2025.md - NeoRL-2 near-real-world offline RL benchmark with explicit action-conditioned transition tuples.
nextlat-2026.md - Microsoft Research preprint and code release adding a self-supervised next-hidden-state objective to autoregressive Transformers, with belief-state theory, compact world-model diagnostics, and self-speculative decoding evidence.
no-filter-cultural-socioeconomic-diversity-2024.md - NeurIPS 2024 VLM source showing English-only filtering improves familiar benchmarks while reducing cultural and socioeconomic coverage.
octo-2024.md - Open-source generalist robot policy with Transformer backbone and diffusion action head.
one-layer-enough-2026.md - ICML 2026 paper on layerwise inference dynamics, intermediate decoders, self-repair, and looped single-layer design in tabular foundation models.
openvla-2024.md - Open action-token VLA model for image/language-conditioned robot control.
own-latents-not-tokens-2026.md - Sample-complexity theory arguing that own-latent prediction can recover hidden hierarchy at the local clustering scale on the Random Hierarchy Model.
oryx-2026.md - Google/CMU Multi-Mixer/Oryx preprint on sequence-axis switching between attention and linear recurrent mixers with shared representations.
pi0-2024.md - Physical Intelligence VLA flow model with a semantic VLM backbone and continuous action expert.
pi0-7-2026.md - Steerable generalist VLA model using rich context, metadata, subgoal images, and a flow-matching action expert.
raev2-2026.md - RAEv2 paper and X discussion on multi-layer representation autoencoders, REPA self-guidance, and action-conditioned navigation world-model rollouts.
rdt-1b-2024.md - Robotics Diffusion Transformer source for scaled bimanual continuous action chunk generation.
reconstruction-or-semantics-2026.md - Evaluation of reconstruction and semantic latent spaces for robotic diffusion world models.
reinforcement-learning-small-subnetworks-2025.md - NeurIPS 2025 source showing RL post-training updates a sparse, full-rank, broadly distributed subnetwork while SFT updates more densely.
rlpt-2025.md - Tencent/CUHK work-in-progress preprint on applying RL to pre-training text through next-segment reasoning rewards.
rt-2-2023.md - VLA action-as-language source showing web-scale VLM transfer to robot action tokens.
s4l-2019.md - ICCV 2019 oral source combining self-supervised and semi-supervised image learning, with careful label-scarce ImageNet baselines and Alex-provided Lucas Beyer X context.
scaling-law-time-series-forecasting-2024.md - Theory and experiments for scaling laws in time-series forecasting with look-back horizon as a scaling variable.
scaling-laws-carefully-2026.md - Lilian Weng’s scaling-law synthesis and X announcement, useful as a method-hygiene anchor for compute-optimal allocation, data-limited scaling, and fixed-FLOPs training experiments.
scaling-laws-large-time-series-models-2024.md - Empirical power-law scaling evidence for decoder-only time-series foundation models.
scaling-test-time-compute-agentic-coding-2026.md - Agentic-coding test-time scaling source showing structured rollout summaries outperform raw traces for selection and reuse.
self-teaching-autoencoder-2026.md - Blog/code/demo source for transformed latent-consistency autoencoder training without direct image-space reconstruction loss.
sensorimotor-world-models-2026.md - Inverse-dynamics-regularized JEPA world model that preserves action-relevant controllable state and filters action-irrelevant distractors.
stable-worldmodel-2026.md - Platform source for reproducible JEPA/world-model research, standardized trajectory data handling, MPC solvers, and factor-of-variation evaluation.
skyjepa-2026.md - JEPA-style latent dynamics and physics-inspired prober for real-time zero-shot sim-to-real quadrotor control.
synthetic-data-any-differentiable-target-2026.md - Stanford DPG preprint showing metagradient-optimized synthetic text can steer downstream model weights and differentiable metrics through ordinary SFT, with clean-label poisoning implications.
superhuman-adaptable-intelligence-2026.md - Position paper arguing that AGI terminology should be replaced by specialization plus adaptation speed, with SSL, world models, latent prediction, and modularity as likely substrates.
tabm-2024.md - MLP-based tabular deep-learning model with parameter-efficient ensembling and numerical feature embeddings.
tennessee-eastman-process-2017.md - Rieth et al. Tennessee Eastman Process simulation data for industrial anomaly detection and fault diagnosis.
tiny-recursive-model-2025.md - TRM minimalist recursive reasoning model that simplifies HRM with a single tiny network.
turboquant-2025.md - ICLR 2026 online vector quantization method for KV-cache and vector-search state; vLLM critique narrows production value to memory-pressure cases versus FP8.
universal-weight-subspace-hypothesis-2025.md - Johns Hopkins preprint and code arguing for reusable low-rank weight subspaces across model families, with a tracked mean-adapter baseline caveat.
variable-width-transformers-2026.md - MIT / MIT-IBM preprint introducing the ><former static bowtie layer-width Transformer, with lower language-model loss, fitted FLOPs, and KV-cache width than uniform baselines.
vla-jepa-2026.md - VLA pretraining source using leakage-free JEPA-style latent state prediction, latent-action tokens, and a flow-matching robot action head.
vlwm-2025.md - Meta FAIR language-state world model with System-1 plan decoding, System-2 critic-ranked rollouts, and an unresolved public data/model release gap.
visreg-2026.md - SIGReg-family visual SSL regularizer that decouples variance scale and Sliced-Wasserstein shape matching for JEPA training.
world-model-robot-learning-survey-2026.md - 2026 robot-learning world-model survey separating policy coupling, simulator/evaluator roles, robotic video generation, evaluation, datasets, and open challenges.
training-in-imagination-2026.md - Theory source for training policies in learned world models with separate dynamics/reward errors, sample-budget allocation, and reward-noise versus reward-bias hygiene.
time-hd-2025.md - Time-HD high-dimensional time-series forecasting benchmark introduced with U-Cast.
titans-2025.md - Titans neural long-term memory architecture for learning to memorize context at test time.
tokengt-2022.md - TokenGT source for treating graph nodes and edges as ordinary Transformer tokens.
topological-neural-operators-2026.md - Topological Neural Operators preprint extending neural operators from point/edge functions to cochain-valued fields on cell complexes with DEC-based cross-rank coupling.
toto-2-tsalm-2026.md - TSALM @ ICLR 2026 presentation transcript and slides for Toto 2.0 scaling, training recipe, data mix, ARFBench, Toto-1.0-QA-Experimental, and observability world-model roadmap.
universal-transformers-2018.md - Universal Transformer root source for recurrent-depth self-attention and adaptive per-position halting.
universal-transformers-need-memory-2026.md - Study of memory tokens and ACT depth-state tradeoffs in Universal Transformer recursive reasoning.

Normal Sources

Read

perception-encoder-2025.md - Meta Perception Encoder paper showing strong visual embeddings can be hidden in intermediate layers and exposed through alignment tuning.
reinpatch-2026.md - Reinforcement-trained adaptive patcher for time-series forecasting and zero-shot patch-policy transfer.

Skimmed

atst-2023.md - Audio Teacher-Student Transformer for clip-level and frame-level self-supervised audio representations.
chronos-2-2025.md - Universal forecasting extension of Chronos with grouped time series, covariates, and cross-series in-context learning.
graph-distributed-rl-grid-control-2025.md - Preprint on graph-based distributed RL for Grid2Op using line-level agents, a high-level manager, GNN local observations, and imitation learning.
fade-2026.md - FADE adaptive per-parameter weight-decay method for controlled forgetting in continual learning.
fast-slow-training-2026.md - Fast-Slow Training method for LLM continual adaptation using prompt/context fast weights and parameter slow weights.
kairos-2025.md - Adaptive time-series forecasting model family with benchmarked 10M, 23M, and 50M variants.
mantis-2025.md - Lightweight calibrated foundation model for user-friendly time-series classification.
mantisv2-2026.md - Synthetic-data and test-time-strategy extension of Mantis for zero-shot time-series classification.
moirai-2-2025.md - Smaller Moirai 2.0 forecasting model emphasizing efficiency and calibration.
moirai-2024.md - Universal time-series forecasting Transformer family trained across heterogeneous series.
moirai-moe-2024.md - Sparse mixture-of-experts extension of Moirai for time-series forecasting.
molmo-pixmo-2024.md - Open-weight and open-data VLM family and data engine from Allen AI.
moment-2024.md - Open time-series foundation-model family for forecasting, classification, and representation learning.
moshi-2024.md - Kyutai full-duplex speech-to-speech model with Mimi streaming codec, Inner Monologue text stream, low-latency serving, and temporal artifact metrics.
nutime-2023.md - Numerically multi-scaled embedding method for large-scale time-series pretraining.
pretrained-transformers-universal-computation-engines-2021.md - Frozen language-pretrained Transformers transferred to non-language sequence tasks.
interpretable-policy-distillation-grid-control-2026.md - Preprint distilling a Grid2Op PPO topology-control teacher into auditable tree-based policies.
llm-guided-safe-rl-grid-topology-2026.md - Exploratory preprint on Safety-SAC plus LLM-guided transition refinement for Grid2Op-style topology reconfiguration.
reverso-2026.md - Efficient zero-shot forecasting model centered on compact recurrent-style sequence modeling.
runtime-safety-shielding-power-grid-2026.md - Preprint on hierarchical Grid2Op control with runtime forward-simulation safety shielding.
rwkv-ts-2024.md - RWKV-style recurrent backbone adapted to time-series forecasting and related passive tasks.
simmtm-2023.md - Multi-neighbor masked time-series modeling framework for forecasting and classification pretraining.
stochastic-sharpness-gap-2026.md - SGD edge-of-stability theory source explaining batch-size-dependent sharpness gaps through projected gradient-noise variance.
sundial-2025.md - THUML time-series foundation-model family for forecasting across heterogeneous tasks.
t2s-2025.md - Text-to-time-series generation model using LA-VAE and flow-matching Diffusion Transformer.
t-loss-2019.md - Scalable unsupervised representation learning baseline for multivariate time series.
tabicl-2025.md - Tabular in-context learning model that scales row-wise context beyond small-data TabPFN settings.
tabpfn-3-2026.md - Prior Labs technical report for TabPFN-3, its Thinking/API variants, and TabPFN-TS-3.
tabpfn-v2-2025.md - Tabular prior-data fitted network for fast small-data classification and regression.
telecomts-2025.md - Multimodal 5G observability dataset with scale-preserving KPI time series, anomaly/root-cause labels, and language Q&A fields.
tempopfn-2025.md - Synthetic-pretrained linear RNN prior-data fitted network for zero-shot forecasting.
time-moe-2024.md - Billion-scale mixture-of-experts time-series foundation-model family.
timer-2024.md - Generative pretrained Transformer line framing time-series forecasting as large sequence modeling.
timesfm-2023.md - Decoder-only forecasting foundation model from Google Research.
tiny-time-mixers-2024.md - Compact pretrained MLP-mixer forecasting models for zero-shot and few-shot use.
tirex-2025.md - Zero-shot forecasting model using enhanced in-context learning across short and long horizons.
tivit-2025.md - Time-series classification via frozen vision-model hidden representations.
ts2vec-2021.md - Hierarchical contrastive time-series representation learning with timestamp-level embeddings.
tsmixer-2023.md - All-MLP time-series forecasting architecture that mixes over time and feature dimensions.
toto-2-2026.md - Datadog article announcing the Toto 2.0 open-weights forecasting model family and scaling results.
toto-2025.md - Observability-oriented time-series foundation model from Datadog.
unitime-2023.md - Early language-instruction-conditioned cross-domain time-series forecasting model.
units-2024.md - Unified multi-task time-series model using task tokenization and shared weights.
unishape-2026.md - Shape-aware foundation model for time-series classification.
utica-2026.md - Multi-objective self-distillation pretraining method for time-series classification.
wavspa-2022.md - Wavelet-space attention method for long-sequence Transformers.
varying-grid-topology-2022.md - Metadata-only IEEE ENERGYCON source; early Grid2Op precedent for a learned GCN line-loading surrogate combined with MCTS planning.

Not Read

anomod-2026.md - Multimodal microservice anomaly-detection and root-cause-analysis dataset with logs, metrics, traces, API responses, and code coverage.
chronograph-2025.md - Graph-structured multivariate microservice time-series dataset with temporal node/edge features and incident labels.
huginn-2025.md - Recurrent-depth language model scaling test-time compute through latent reasoning loops.
latent-thoughts-2025.md - Looped Transformer reasoning source connecting repeated depth to latent thoughts.
loopformer-2026.md - Elastic-depth looped Transformer trained for budget-conditioned latent reasoning.
mesanet-2025.md - Mesa layer sequence model using locally optimal test-time training with conjugate-gradient updates.
miras-2025.md - Associative-memory framework for test-time memorization, attentional bias, retention, and online optimization.
parallel-samplers-recurrent-depth-2025.md - Parallel sampler connecting recurrent-depth models to diffusion language models.
parcae-2026.md - Stable looped language-model architecture with scaling-law analysis.
recurrent-transformer-2026.md - Transformer variant with layerwise recurrent memory for greater effective depth and efficient decoding.
sparse-layers-looped-language-models-2026.md - Looped-MoE scaling and early-exit source for looped language models.
titans-revisited-2025.md - Lightweight Titans reimplementation and critical analysis across language, time-series, and recommendation tasks.
universal-reasoning-model-2025.md - UT-derived recursive reasoning model for ARC-AGI and Sudoku-style tasks.
gaia-micross-2021.md - GAIA AIOps dataset collection with MicroSS metrics, traces, logs, and anomaly-injection records.
gift-eval-2024.md - Salesforce GIFT-Eval general time-series forecasting benchmark and leaderboard.
lemma-rca-2024.md - Large multi-modal multi-domain root-cause-analysis dataset collection spanning IT and OT operations.
openrca-2025.md - LLM-agent root-cause-analysis benchmark over natural-language queries, KPI time series, trace graphs, and logs.
ops-lite-2026.md - Compact RCA evaluation set with per-case causal-graph ground truth for microservice systems.
rcaeval-2025.md - Microservice RCA benchmark and evaluation framework with RE1/RE2/RE3 datasets and reproducible baselines.
time-2026.md - TIME contamination-resistant zero-shot forecasting benchmark.
learning-from-leading-indicators-2024.md - LIFT plugin for local lead-lag channel dependence in multivariate forecasting.
t-rep-2023.md - Self-supervised timestep-level time-series representation learning with learned time-embeddings.
time-series-forecasting-manifold-learning-2021.md - Embed-predict-lift manifold-learning approach for high-dimensional time-series forecasting.
evolution-strategies-at-the-hyperscale-2025.md - EGGROLL low-rank perturbation method for hyperscale ES.
evolution-strategies-scalable-alternative-2017.md - OpenAI ES baseline showing scalable black-box policy optimization.
evolutionary-strategies-catastrophic-forgetting-2026.md - Catastrophic-forgetting stress test for ES-based LLM fine-tuning.

Context Sources

Skimmed

world-model-autonomous-power-system-control-2021.md - IEEE SmartGridComm metadata-only context source for WMAP, a non-Grid2Op power-system world-model and safety-shield architecture.

Not Read

yahoo-contextual-bandit-2010.md - Yahoo! news recommendation contextual-bandit logs and evaluation method.
amsterdamumcdb-2021.md - European ICU database with longitudinal observations, medications, fluids, and procedures.
assistments-2009.md - ASSISTments student interaction data with hints, attempts, and tutoring-event sequences.
causalworld-2020.md - Robotic manipulation benchmark for causal structure and transfer learning.
criteo-uplift-2018.md - Marketing treatment/control dataset for uplift and treatment-effect modeling.
d4rl-2020.md - Offline RL benchmark suite of state-action-reward trajectories.
ednet-2019.md - Large-scale hierarchical student activity sequence dataset.
eicu-crd-2018.md - Multi-center ICU database with longitudinal treatments and observations.
heartsteps-2019.md - Mobile-health micro-randomized intervention data for activity suggestions.
hirid-2020.md - High-resolution ICU time-series dataset with treatment/event records.
kdd-cup-2010.md - Student-performance prediction dataset from intelligent tutoring logs.
kuairand-2022.md - Sequential recommendation dataset with randomly exposed videos.
mimic-iv-2023.md - Clinical EHR/ICU database with longitudinal measurements, orders, procedures, and treatments.
ohio-t1dm-2018.md - Type-1 diabetes longitudinal glucose, insulin, meal, and activity dataset.
open-bandit-dataset-2020.md - Logged bandit feedback dataset and pipeline for off-policy evaluation.
pslc-datashop-2010.md - Learning-science repository with student/tutor event logs.
rl-unplugged-2020.md - Offline RL benchmark suite built from logged transitions.
causal-chambers-2024.md - Real physical systems with known causal structure and interventional data.
bridge-data-v2-2023.md - Real-robot manipulation dataset used for language-conditioned policies and robotic world-model evaluation.
droid-2024.md - In-the-wild robot manipulation dataset with synchronized visual observations and language annotations.
open-x-embodiment-2023.md - Multi-embodiment robot-learning dataset and RT-X model source.
roboturk-2018.md - Crowdsourced 6-DoF teleoperation platform and manipulation demonstration dataset.
time-series-library-2024.md - THUML Time-Series-Library benchmark collection used as the LSF/LTSF handle.
