Perception Encoder

Summary

Perception Encoder is a family of vision encoders from Meta, trained with scaled contrastive vision-language learning and then specialized through language alignment and spatial alignment.

Role In The Wiki

Perception Encoder is the clearest current source for the claim that a strong visual encoder’s most reusable features may sit in intermediate layers rather than in the final-layer output. It complements Guillotine Regularization: Guillotine names the layer-cutting effect in SSL/projector settings, while Perception Encoder shows the same pattern in a large contrastive vision-language system and uses alignment tuning to expose those hidden features.
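
As a rough illustration of what "internal rather than final-layer" means in practice, the sketch below keeps every intermediate activation of a toy layer stack and picks the layer that scores best under a probe. Everything here is hypothetical: `toy_layers`, `toy_probe`, `collect_layer_outputs`, and `pick_best_layer` are illustrative stand-ins, not Perception Encoder's code or API, and the probe is an arbitrary spread measure rather than a real downstream evaluation.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def collect_layer_outputs(layers: Sequence[Layer], x: Vector) -> List[Vector]:
    """Run x through the stack and keep every intermediate activation."""
    outputs = []
    h = x
    for layer in layers:
        h = layer(h)
        outputs.append(h)
    return outputs


def pick_best_layer(outputs: Sequence[Vector], score: Callable[[Vector], float]) -> int:
    """Return the index of the layer whose features score highest under the probe."""
    scores = [score(h) for h in outputs]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical three-layer toy encoder (stand-ins, not real model blocks).
toy_layers = [
    lambda h: [v + 1.0 for v in h],   # early layer: shift
    lambda h: [v * 2.0 for v in h],   # middle layer: scale
    lambda h: [sum(h) / len(h)],      # final layer: pools to a single value
]


def toy_probe(h: Vector) -> float:
    """Hypothetical probe: rewards features that keep per-dimension spread."""
    mean = sum(h) / len(h)
    return sum((v - mean) ** 2 for v in h)


outputs = collect_layer_outputs(toy_layers, [0.0, 1.0, 2.0])
best = pick_best_layer(outputs, toy_probe)
print(f"best layer index: {best} of {len(outputs)}")
```

In this toy setup the final layer pools away the information the probe cares about, so an earlier layer wins. With a real encoder the same loop would be driven by a held-out downstream metric per layer rather than a hand-written score.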

Evidence

Official Artifacts

Relation To Foundation TSFM Agenda

Use the source-level agenda mapping in perception-encoder-2025 rather than duplicating verdict rows here.

This page should stay as the object card; source pages carry slot-level verdicts, evidence, and missing pieces.