Compress & Attend Transformer
Summary
Compress & Attend Transformer (CAT) is a chunk-compressive Transformer architecture. It compresses prior chunks into compact representations and lets the decoder attend to those compressed chunks while autoregressively modeling the current chunk. Adaptive CATs expose chunk size as a test-time quality/efficiency knob.
Interface
- Input: token sequence split into fixed-size chunks.
- Compression unit: one compressed representation per previous chunk.
- Decoder contract: tokens in the current chunk attend to the preceding tokens of that chunk and to the compressed representations of prior chunks.
- Budget knob: chunk size controls retained-history resolution, memory, and compute.
- Current artifact status: arXiv preprint, rejected ICLR 2026 OpenReview submission, official code, and Hugging Face checkpoint collection.
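The decoder contract above can be illustrated with a small visibility sketch. This is not taken from the official code. The function name cat_visibility and the return layout are hypothetical, and the sketch assumes that a token attends to itself as part of its current-chunk prefix (standard causal attention) and that each earlier chunk contributes exactly one compressed slot.

```python
def cat_visibility(seq_len: int, chunk_size: int):
    """List what each token position may attend to under the CAT-style contract.

    Hypothetical helper for illustration only. Returns one
    (compressed_chunk_ids, raw_token_positions) pair per position.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    visible = []
    for t in range(seq_len):
        chunk = t // chunk_size          # index of the chunk containing token t
        start = chunk * chunk_size       # first token position of that chunk
        # Assumption: one compressed slot per earlier chunk.
        compressed = list(range(chunk))
        # Assumption: causal prefix inside the current chunk, including t itself.
        raw = list(range(start, t + 1))
        visible.append((compressed, raw))
    return visible


if __name__ == "__main__":
    for t, (compressed, raw) in enumerate(cat_visibility(8, 4)):
        print(t, "compressed chunks:", compressed, "raw tokens:", raw)
```

In this sketch, raw tokens from earlier chunks are never directly visible. They reach the current chunk only through their chunk's compressed slot, which is why chunk size sets the resolution of the retained history.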
Role In The Wiki
Use this page as the object card for CAT. The source page carries the evidence details and the OpenReview rejection caveat.
For the foundation time-series agenda, CAT is upstream architecture evidence for controllable context compression. It is useful when comparing compressed history, memory tokens, recurrent state, and KV-cache compression under explicit serving budgets.
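To make the serving-budget comparison concrete, the sketch below counts how many attention entries the last token of a context sees under a CAT-style layout versus full attention. It is a back-of-the-envelope illustration, not a measurement from the paper. The function names are hypothetical, and it assumes one compressed slot per prior chunk with each slot costing the same as one raw token entry.

```python
def cat_attended_entries(context_len: int, chunk_size: int) -> int:
    """Entries visible to the last token: compressed prior chunks + current-chunk prefix.

    Hypothetical accounting: one slot per prior chunk, and a compressed
    slot is assumed to cost the same as one raw token entry.
    """
    if context_len <= 0 or chunk_size <= 0:
        raise ValueError("context_len and chunk_size must be positive")
    last = context_len - 1
    prior_chunks = last // chunk_size        # chunks that end before the current one
    current_prefix = last % chunk_size + 1   # tokens in the current chunk, including the last
    return prior_chunks + current_prefix


def full_attended_entries(context_len: int) -> int:
    """Entries visible to the last token under ordinary causal attention."""
    return context_len


if __name__ == "__main__":
    context_len = 4096
    for chunk_size in (4, 16, 64):
        print(chunk_size, cat_attended_entries(context_len, chunk_size),
              "vs", full_attended_entries(context_len))
```

Under these assumptions, a larger chunk size shrinks the number of compressed slots but lengthens the raw prefix inside the current chunk, so the count depends on both terms. The same accounting can be applied to memory tokens, recurrent state, or KV-cache compression by swapping in their own per-token entry counts.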
Evidence
Official Artifacts
- Preprint: arXiv 2511.05313
- OpenReview: ICLR 2026 rejected submission
- Code: rajesh-lab/cat-transformer
- Hugging Face: CAT transformer collection
- X thread: Jatin Prakash post