Perception Encoder
Summary
Perception Encoder is a Meta family of vision encoders trained with scaled-up contrastive vision-language pretraining and then specialized through language alignment and spatial alignment.
Role In The Wiki
Perception Encoder is the clearest current source for the claim that a strong visual encoder’s best reusable features may be internal rather than final-layer outputs. It complements Guillotine Regularization: Guillotine names the layer-cutting effect in SSL/projector settings, while Perception Encoder shows the same pattern in a large contrastive vision-language system and uses alignment tuning to expose the hidden features.
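The "best features may be internal" claim is usually tested by probing each layer's frozen features and comparing scores. Below is a minimal sketch of that layer-selection logic. It assumes per-layer features have already been extracted. It uses a nearest-centroid probe as a cheap stand-in for the linear probes such studies typically use. The function names and toy features are hypothetical and made up to illustrate the mechanics; they are not Perception Encoder outputs or its evaluation code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroid_probe_accuracy(
    train_x: List[Vector], train_y: List[int],
    test_x: List[Vector], test_y: List[int],
) -> float:
    """Accuracy of a nearest-centroid probe fit on frozen features (stand-in for a linear probe)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for x, y in zip(train_x, train_y):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x: Vector) -> int:
        # Assign x to the class whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    correct = sum(predict(x) == y for x, y in zip(test_x, test_y))
    return correct / len(test_y)


def pick_best_layer(
    layer_feats: List[List[Vector]], labels: List[int], n_train: int,
) -> Tuple[int, List[float]]:
    """Probe every layer's frozen features; return (best_layer_index, per_layer_scores)."""
    scores = [
        centroid_probe_accuracy(feats[:n_train], labels[:n_train], feats[n_train:], labels[n_train:])
        for feats in layer_feats
    ]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores


# Hypothetical toy features: the middle layer separates the two classes cleanly,
# while the final layer has partly collapsed them.
toy_labels = [0, 1, 0, 1, 0, 1, 0, 1]
toy_layers = [
    [[0.0], [0.4], [0.2], [0.6], [0.6], [0.45], [0.2], [0.45]],            # early layer
    [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [4.9, 5.0],
     [0.1, 0.2], [5.1, 4.8], [0.0, 0.0], [5.0, 5.0]],                      # intermediate layer
    [[1.0], [1.2], [1.1], [1.3], [1.3], [1.0], [1.0], [1.3]],              # final layer
]

best, scores = pick_best_layer(toy_layers, toy_labels, n_train=4)
print(best, scores)  # 1 [0.75, 1.0, 0.5]
```

In this toy setup the intermediate layer wins, which mirrors the layer-cutting idea: read features from wherever the probe scores highest rather than assuming the output layer is best.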
Evidence
Official Artifacts
Relation To Foundation TSFM Agenda
Use the source-level agenda mapping in perception-encoder-2025 rather than duplicating verdict rows here.
At the entity level, this page should stay as the object card; source pages carry slot-level verdicts, evidence, and missing pieces.