Compress & Attend Transformer

Summary

Compress & Attend Transformer (CAT) is a chunk-compressive Transformer architecture. It compresses prior chunks into compact representations and lets the decoder attend to those compressed chunks while autoregressively modeling the current chunk. Adaptive CATs expose chunk size as a test-time quality/efficiency knob.

Interface

  • Input: a token sequence split into fixed-size chunks.
  • Compression unit: one compressed representation per previous chunk.
  • Decoder contract: tokens in the current chunk attend to the preceding tokens of that chunk and to the compressed representations of prior chunks.
  • Budget knob: chunk size controls retained-history resolution, memory, and compute.
  • Current artifact status: arXiv preprint, rejected ICLR 2026 OpenReview submission, official code, and Hugging Face checkpoint collection.
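
The decoder contract above can be sketched as a visibility rule. This is an illustrative sketch, not the official CAT code: the function name `visible_context` is hypothetical, and it assumes each token also attends to itself, as in standard causal attention.

```python
def visible_context(position: int, chunk_size: int) -> tuple[list[int], list[int]]:
    """Return what the token at `position` may attend to.

    The result is a pair:
      - indices of prior chunks, each seen only as a compressed representation
      - raw token positions in the current chunk, up to and including `position`
        (self-attention is an assumption of this sketch)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if position < 0:
        raise ValueError("position must be non-negative")

    chunk_index = position // chunk_size      # which chunk the token belongs to
    chunk_start = chunk_index * chunk_size    # first raw position of that chunk

    compressed_chunks = list(range(chunk_index))          # all earlier chunks
    raw_tokens = list(range(chunk_start, position + 1))   # causal prefix of current chunk
    return compressed_chunks, raw_tokens


# Token 9 with chunk size 4 sits in chunk 2: it sees compressed chunks 0 and 1,
# plus raw tokens 8 and 9.
print(visible_context(9, 4))  # ([0, 1], [8, 9])
```

Under this rule, tokens from earlier chunks are never visible in raw form; everything the decoder knows about them arrives through the compressed representations.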

Role In The Wiki

Use this page as the object card for CAT. The source page carries the evidence details and the OpenReview rejection caveat.

For the foundation time-series agenda, CAT is upstream architecture evidence for controllable context compression. It is useful when comparing compressed history, memory tokens, recurrent state, and KV-cache compression under explicit serving budgets.
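
For budget comparisons, a rough count of attended slots can serve as a starting point. The sketch below is an assumption-laden illustration, not a measurement: `attended_slots` is a hypothetical helper, it treats each compressed chunk as exactly one slot, and it ignores per-slot size and the cost of running the compressor.

```python
def attended_slots(seq_len: int, chunk_size: int) -> int:
    """Count slots the last token of a `seq_len`-token sequence attends to.

    Assumption of this sketch: one slot per compressed prior chunk, plus one
    slot per raw token in the current chunk (including the token itself).
    """
    if seq_len <= 0 or chunk_size <= 0:
        raise ValueError("seq_len and chunk_size must be positive")
    last = seq_len - 1
    prior_chunks = last // chunk_size       # compressed representations
    current_raw = last % chunk_size + 1     # raw tokens in the current chunk
    return prior_chunks + current_raw


seq_len = 1024
for chunk_size in (4, 16, 64):
    print(chunk_size, attended_slots(seq_len, chunk_size))
# 4 259
# 16 79
# 64 79
# Uncompressed causal attention would attend to all 1024 positions.
```

The count only captures how many entries are retained; quality at each chunk size has to come from the evidence on the source page.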

Evidence

Official Artifacts