Compress & Attend Transformer

Summary

Compress & Attend Transformer (CAT) is a chunk-compressive Transformer architecture. It compresses prior chunks into compact representations and lets the decoder attend to those compressed chunks while autoregressively modeling the current chunk. Adaptive CATs expose chunk size as a test-time quality/efficiency knob.
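To make the chunk-size knob concrete, the following is a minimal sketch of how many entries a decoding step would attend to. It assumes each prior chunk is compressed into exactly one attended slot and that positions are 0-indexed. The function name `attended_entries` is hypothetical and not taken from the official code, and the count ignores the cost of running the compressor itself.

```python
def attended_entries(position: int, chunk_size: int) -> tuple[int, int]:
    """Return (compressed_slots, local_tokens) visible at a 0-indexed position.

    Assumption: one attended slot per fully completed prior chunk.
    """
    if position < 0 or chunk_size <= 0:
        raise ValueError("position must be >= 0 and chunk_size must be > 0")
    # Completed chunks before the current one, each seen as one compressed slot.
    compressed_slots = position // chunk_size
    # Tokens of the current chunk up to and including the current position.
    local_tokens = position % chunk_size + 1
    return compressed_slots, local_tokens


if __name__ == "__main__":
    # Compare a few chunk sizes at the same decoding position.
    for chunk_size in (4, 16, 32):
        slots, local = attended_entries(1023, chunk_size)
        print(chunk_size, slots, local, slots + local)
```

Under these assumptions, a larger chunk size leaves fewer compressed slots to store and attend to, at the price of a coarser view of history. A smaller chunk size does the opposite.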

Interface

Input: token sequence split into fixed-size chunks.
Compression unit: one compressed representation per previous chunk.
Decoder contract: each token in the current chunk attends to the preceding tokens of that chunk and to the compressed representations of prior chunks.
Budget knob: chunk size controls retained-history resolution, memory, and compute.
Current artifact status: arXiv preprint, rejected ICLR 2026 OpenReview submission, official code, and Hugging Face checkpoint collection.
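The decoder contract above can be sketched as a boolean attention mask. This is an illustrative sketch, not the official implementation. It assumes one compressed slot per chunk, places the compressed slots in the first columns and the raw tokens after them, and lets each token attend to itself. The name `cat_attention_mask` is hypothetical.

```python
def cat_attention_mask(seq_len: int, chunk_size: int) -> list[list[bool]]:
    """Build a mask of shape [seq_len][n_chunks + seq_len].

    Columns 0..n_chunks-1 stand for compressed chunk slots (assumed one per chunk);
    the remaining columns stand for raw token positions.
    """
    if seq_len < 0 or chunk_size <= 0:
        raise ValueError("seq_len must be >= 0 and chunk_size must be > 0")
    n_chunks = -(-seq_len // chunk_size)  # ceiling division
    mask = []
    for i in range(seq_len):
        current_chunk = i // chunk_size
        # Compressed slots: only chunks strictly before the current one are visible.
        compressed = [j < current_chunk for j in range(n_chunks)]
        # Raw tokens: only the causal prefix inside the current chunk is visible.
        tokens = [
            (j // chunk_size == current_chunk) and (j <= i)
            for j in range(seq_len)
        ]
        mask.append(compressed + tokens)
    return mask


if __name__ == "__main__":
    for row in cat_attention_mask(seq_len=6, chunk_size=3):
        print("".join("1" if visible else "." for visible in row))
```

Raw tokens from earlier chunks are never visible in this mask. Anything the decoder knows about them has to come through the compressed slots, which is why chunk size sets the resolution of retained history.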

Role In The Wiki

Use this page as the object card for CAT. The source page carries the evidence details and the OpenReview rejection caveat.

For the foundation time-series agenda, CAT is upstream architecture evidence for controllable context compression. It is useful when comparing compressed history, memory tokens, recurrent state, and KV-cache compression under explicit serving budgets.

Evidence

Source page: Attention and Compression is all you need for Controllably Efficient Language Models

Official Artifacts

Preprint: arXiv 2511.05313
OpenReview: ICLR 2026 rejected submission
Code: rajesh-lab/cat-transformer
Hugging Face: CAT transformer collection
X thread: Jatin Prakash post

Alex Open Research Wiki
