SkyJEPA

Source

Raw Markdown: paper_skyjepa-2026.md
PDF: paper_skyjepa-2026.pdf
Preprint: arXiv 2606.23444
Official project page: pratyaksh10.github.io/skyjepa-project-page
Official code: github.com/arplaboratory/SkyJEPA
Official X thread: Pratyaksh Rao announcement
Local X API response: papers/skyjepa-2026/x_post_pratyakshrao5_2069462393244266638.json
Local X snapshot: papers/skyjepa-2026/x_post_pratyakshrao5_2069462393244266638.md
Local official artifact notes: papers/skyjepa-2026/official_artifacts_snapshot.md
Local GitHub README snapshot: papers/skyjepa-2026/github_readme_snapshot.md

Status And Credibility

SkyJEPA is an arXiv preprint submitted on 2026-06-22 and revised on 2026-06-23; arXiv marks the paper as under review. It is credible enough to track as an important Alex-provided source for three reasons: it is current; its authors are Pratyaksh Rao, Wancong Zhang, Randall Balestriero, Yann LeCun, and Giuseppe Loianno (UC Berkeley, NYU, and Brown); and it is backed by an official project page, an official public GitHub repository, and an authenticated author announcement on X. It is not yet peer reviewed, and the official GitHub repository currently says that training, evaluation, and deployment code, pretrained models, dataset instructions, and reproduction scripts will be released soon.

Core Claim

SkyJEPA argues that a useful quadrotor world model should provide accurate long-horizon prediction, interpretable state rollouts, real-time inference for closed-loop control, and zero-shot task generalization. The paper’s concrete claim is that a JEPA-style latent dynamics model, decoded through a physics-inspired prober and used inside an MPPI controller, can reduce compounding rollout error and transfer from domain-randomized simulation to outdoor real-world quadrotor flights without real-world fine-tuning.

Model Interface

The source is unusually explicit about the control interface. The system state includes position, velocity, attitude as an $SO(3)$ rotation matrix, and angular velocity, while the control input is the four motor forces. The latent dynamics model predicts future representations from state and action histories instead of recursively predicting the next physical state directly.

flowchart LR
  Sim["domain-randomized simulator"]
  Data["state/action trajectories"]
  Enc["state + action encoders"]
  Pred["JEPA-style latent dynamics predictor"]
  Prober["physics-inspired prober"]
  MPPI["MPPI candidate control search"]
  Drone["outdoor quadrotor"]

  Sim --> Data --> Enc --> Pred --> Prober --> MPPI --> Drone
  MPPI --> Pred

The prober keeps the learned latent dynamics frozen and learns residual acceleration and angular-acceleration terms on top of a differentiable kinematic integrator. In wiki terminology, it is a bridge from latent state to physically meaningful metric state, not a generic decoder.
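
To make the "kinematic integrator plus acceleration terms" idea concrete, here is a minimal NumPy sketch of one integration step over the state described above (position, velocity, $SO(3)$ attitude, angular velocity). It is an illustration under stated assumptions, not the paper's implementation: the explicit-Euler scheme, the frame conventions, and the names `hat`, `exp_so3`, and `integrate_step` are all hypothetical, and the acceleration arguments stand in for whatever the prober would output.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric (cross-product) matrix."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy],
                     [wz, 0.0, -wx],
                     [-wy, wx, 0.0]])

def exp_so3(w):
    """Rodrigues formula: rotation matrix for the rotation vector w."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near zero rotation
    return (np.eye(3)
            + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))

def integrate_step(p, v, R, omega, accel, ang_accel, dt):
    """One explicit-Euler kinematic step (assumed scheme, not from the paper).

    accel:     world-frame linear acceleration, shape (3,)
    ang_accel: body-frame angular acceleration, shape (3,)
    Both are placeholders for the terms a learned prober would supply.
    """
    p_next = p + v * dt
    v_next = v + accel * dt
    R_next = R @ exp_so3(omega * dt)  # multiplicative update keeps R on SO(3)
    omega_next = omega + ang_accel * dt
    return p_next, v_next, R_next, omega_next

# Usage: constant velocity along x while yawing at pi/2 rad/s for one second.
p, v, R, omega = integrate_step(
    np.zeros(3), np.array([1.0, 0.0, 0.0]), np.eye(3),
    np.array([0.0, 0.0, np.pi / 2]),
    accel=np.zeros(3), ang_accel=np.zeros(3), dt=1.0,
)
```

Updating attitude through the exponential map, instead of adding to the matrix entries, keeps the rollout a valid rotation at every step. This is one reason a structured integrator can yield more interpretable rollouts than a generic decoder.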

Data And Training Notes

Training data is generated entirely in simulation from smooth Gaussian-process reference trajectories tracked by closed-loop controllers.
The paper reports 500 randomized quadrotor domains and 20,000 reference trajectories of 10 seconds each, resampled at 20 Hz and split 80/10/10 for train/validation/test.
Randomization covers mass, inertia, motor time constant, drag coefficients, thrust coefficient, torque coefficient, and dynamics related to arm geometry.
The latent model uses lightweight TCN encoders and a single-layer GRU predictor with a 20-step unroll objective; the control deployment uses TensorRT-optimized inference on an NVIDIA Jetson Orin NX.
The paper introduces a Trajectory Distribution Quality (TDQ) score as a harmonic mean of state-action coverage, transition richness, and simulator-parameter robustness; higher TDQ correlates with lower held-out state RMSE in its ablation.
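
The harmonic-mean structure of TDQ can be illustrated with a few lines of Python. This is a sketch only: it assumes an unweighted harmonic mean over three components already normalized to [0, 1], and the function and argument names are hypothetical. The source does not specify how each component is computed.

```python
def tdq_score(coverage, richness, robustness):
    """Unweighted harmonic mean of three non-negative component scores.

    Assumption: each component is pre-normalized to [0, 1]. How the paper
    computes the individual components is not covered here.
    """
    components = (coverage, richness, robustness)
    if any(c < 0 for c in components):
        raise ValueError("component scores must be non-negative")
    if any(c == 0 for c in components):
        return 0.0  # a harmonic mean collapses to zero if any term is zero
    return len(components) / sum(1.0 / c for c in components)

# Usage: one weak component pulls the score well below the arithmetic mean.
balanced = tdq_score(0.5, 0.5, 0.5)
lopsided = tdq_score(1.0, 1.0, 0.25)
```

A harmonic mean penalizes imbalance: a dataset with excellent coverage but poor parameter robustness scores close to its weakest component, which fits the use of TDQ as a single data-quality indicator.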

Evidence And Results

The main evidence is a mix of open-loop prediction, embedded real-time control, and outdoor closed-loop experiments:

Compounding error: by rollout step 60, the direct predictive baseline reaches a compounding ratio of roughly 2.4 while SkyJEPA reaches roughly 1.4; the reported error growth is also lower for SkyJEPA.
Open-loop metric-state recovery: the full method reports 1.43 m mean position RMSE and 4.71° mean attitude error, versus 8.80 m and 53.4° for the direct predictive baseline. The largest improvement comes from adding the physics-inspired prober to the latent dynamics model.
Real-time control: the optimized latent model is small enough for repeated MPPI rollout evaluation near the 10 ms / 100 Hz onboard control budget on Jetson Orin NX.
Zero-shot sim-to-real: in outdoor trajectory tracking, the method reports lower position and attitude errors than both a direct predictive MPPI baseline and a physics-regularized predictive MPPI baseline across circle, oval, figure-8, fish, and lemniscate trajectories.
Platform variation: without retraining, the method reports lower position and attitude errors under propeller switching and 300 g payload transportation scenarios.
Input corruption: the method reports lower pose RMSE under increasingly noisy state histories, with the largest margin at zero or moderate noise and a smaller but still positive margin at high noise.
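
For readers unfamiliar with MPPI, the sampling-and-reweighting update behind the control results above can be sketched generically. The block below is a textbook-style MPPI step, not SkyJEPA's controller: `mppi_step`, its hyperparameters, and the toy 1-D point-mass cost are assumptions made for illustration. In the paper's setting the rollout cost would come from the latent dynamics model and prober, not from the toy function used here.

```python
import numpy as np

def mppi_step(x0, u_nominal, rollout_cost, num_samples=256,
              sigma=0.5, temperature=1.0, rng=None):
    """One generic MPPI update of a nominal control sequence.

    u_nominal:    array of shape (horizon, u_dim)
    rollout_cost: callable (x0, controls) -> scalar cost of one rollout
    All hyperparameter defaults are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    horizon, u_dim = u_nominal.shape
    # Sample Gaussian perturbations around the nominal plan.
    noise = rng.normal(0.0, sigma, size=(num_samples, horizon, u_dim))
    candidates = u_nominal[None] + noise
    costs = np.array([rollout_cost(x0, u) for u in candidates])
    # Softmin weights; subtracting the minimum keeps the exponent stable.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # Cost-weighted average of the perturbations updates the plan.
    return u_nominal + np.einsum("k,khd->hd", weights, noise)

def toy_rollout_cost(x0, controls, dt=0.1, target=1.0):
    """Hypothetical stand-in model: 1-D point mass driven toward a target."""
    pos, vel = x0
    cost = 0.0
    for u in controls[:, 0]:
        vel += u * dt
        pos += vel * dt
        cost += (pos - target) ** 2
    return cost

# Usage: refine an all-zero plan over a 20-step horizon.
plan = mppi_step((0.0, 0.0), np.zeros((20, 1)), toy_rollout_cost,
                 rng=np.random.default_rng(0))
```

Every control cycle evaluates `num_samples` rollouts, so the per-rollout cost of the model bounds how many candidates fit in the roughly 10 ms budget noted above. This is why the small TensorRT-optimized latent model matters for onboard use.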

X Thread Notes

The author announcement frames SkyJEPA through the same four quadrotor-world-model requirements used by the paper: long-horizon prediction, interpretability, real-time inference, and zero-shot task generalization. It also highlights less compounding error, smoother latent trajectories, robustness to corrupted inputs, propeller/payload generalization, and zero-shot sim-to-real transfer. Treat this as official launch narrative; the paper and artifact state remain the evidence sources.

Limitations And Gotchas

The paper is under review and has no venue acceptance recorded locally.
The public GitHub repository currently hosts the overview and visual assets; training, evaluation, deployment code, pretrained models, dataset instructions, and reproduction scripts are not yet released.
The reported system uses low-dimensional quadrotor state and motor-force control inputs. The authors list RGB/RGB-D observations as future work, so this source should not be read as solving high-dimensional visual navigation.
The control evidence is a physical-robotics analogy for this wiki’s time-series agenda, not direct evidence for numeric telemetry, event streams, or digital-world interventions.
The model compares sampled motor-control sequences through MPPI for trajectory tracking, but it does not expose a general calibrated uncertainty or counterfactual-evaluation protocol for arbitrary interventions.
Domain-randomized simulation is central to the result. Transfer quality may depend on whether the simulator randomization covers the deployment shift; the paper’s payload and propeller tests are encouraging but still quadrotor-specific.

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Latent-state prediction	partially closes outside time series	Predicts future latent dynamics from state/action histories and uses a frozen-latent prober to recover physically meaningful state trajectories.	Needs numeric multivariate time-series evidence, latent identifiability tests under non-Gaussian policy-shaped data, and rare-event/state probes.
Control and counterfactuals	partially closes outside time series	Uses explicit control inputs and MPPI candidate-action rollouts for closed-loop real-world control.	Needs typed digital interventions, calibrated uncertainty, counterfactual validation, and operational telemetry benchmarks.
Data diversity and long tail	adjacent	Domain-randomized simulation plus TDQ scoring links state-action coverage, transition richness, parameter diversity, and prediction error.	Needs public release of data generation/reproduction scripts and tests on non-robotic action-conditioned time-series corpora.
Real-time serving	adjacent	TensorRT deployment on Jetson Orin NX keeps repeated latent rollout evaluation near a 100 Hz control budget.	Needs always-on streaming state updates, multi-system serving cost, and telemetry-scale inference evidence.
Benchmark hygiene	warning	Open-loop prediction, closed-loop tracking, platform variation, and noise corruption are reported separately instead of collapsing to one score.	Needs independent reproduction, released code/data, richer distribution shifts, and planning-success versus prediction-error decomposition.

Links Into The Wiki

Open Questions

Does the physics-inspired prober preserve enough hidden state under larger visual, wind, contact, or obstacle distributions, or is it tuned to the low-dimensional state-control regime?
Can the TDQ score predict downstream closed-loop success, not only open-loop state RMSE?
How much of the sim-to-real result comes from JEPA-style latent prediction versus domain-randomized data coverage and the structured prober?
Would adding calibrated uncertainty improve MPPI action ranking under larger platform shifts or partial sensor failures?
Can the same latent-dynamics-plus-structured-prober pattern transfer to non-vision operational time series where the state variables, interventions, and constraints are typed but not governed by rigid-body equations?

SkyJEPA: Learning Long-Horizon World Models for Zero-Shot Sim-to-Real Control of Quadrotors