WavSpA: Wavelet Space Attention for Boosting Transformers’ Long Sequence Learning Ability

Source

Raw Markdown: paper_wavspa-2022.md
PDF: paper_wavspa-2022.pdf
Preprint: arXiv 2210.01989
Official code: EvanZhuang/wavspa

Core Claim

Attention can be performed in a learnable wavelet coefficient space, giving Transformers access to both position and frequency information with linear-time sequence transforms.

Key Contributions

Proposes Wavelet Space Attention.
Applies a forward wavelet transform, performs attention in coefficient space, then reconstructs the representation with an inverse transform.
Compares wavelet-space attention with Fourier-space attention on long-sequence benchmarks.
Tests fixed and adaptive wavelets.
Reports improved Long Range Arena performance and better reasoning extrapolation on LEGO-style chain-of-reasoning tasks.
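
The transform, attend, reconstruct pipeline in the contributions above can be sketched in a few lines. This is a minimal illustration and not the official EvanZhuang/wavspa code: it assumes a fixed single-level Haar wavelet in place of the paper's fixed and adaptive wavelets, a single unmasked attention head, and plain NumPy. All function names (haar_forward, haar_inverse, self_attention, wavelet_space_attention) are hypothetical. The Haar steps cost linear time in sequence length; the attention step in this sketch is ordinary quadratic softmax attention.

```python
import numpy as np

def haar_forward(x):
    """One-level orthonormal Haar transform along axis 0 (sequence length must be even)."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass (local averages)
    detail = (even - odd) / np.sqrt(2.0)   # high-pass (local differences)
    return np.concatenate([approx, detail], axis=0)

def haar_inverse(c):
    """Exact inverse of haar_forward."""
    n = c.shape[0] // 2
    approx, detail = c[:n], c[n:]
    x = np.empty_like(c)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(h, wq, wk, wv):
    """Single-head scaled dot-product attention over the rows of h."""
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def wavelet_space_attention(x, wq, wk, wv):
    coeffs = haar_forward(x)                      # 1. forward wavelet transform
    mixed = self_attention(coeffs, wq, wk, wv)    # 2. attention in coefficient space
    return haar_inverse(mixed)                    # 3. reconstruct in sequence space

rng = np.random.default_rng(0)
seq_len, d_model = 8, 4                           # toy sizes, chosen for illustration
x = rng.normal(size=(seq_len, d_model))
wq, wk, wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
y = wavelet_space_attention(x, wq, wk, wv)
roundtrip_ok = np.allclose(haar_inverse(haar_forward(x)), x)
```

The output y has the same shape as the input, so a block like this could stand in for a standard attention layer; making the wavelet filters trainable, as the paper's adaptive variants do, is outside the scope of this sketch.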

Method Notes

WavSpA is not a time-series forecasting paper, but it is highly relevant to long numeric sequences. Wavelets preserve both locality and frequency structure, which suits nonstationary time-series signals where a purely global Fourier basis can be too coarse.
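
A small numerical check makes the locality point concrete. The sketch below is illustrative and not taken from the paper: it assumes a full multi-level Haar decomposition (a hypothetical helper, haar_full) and compares how many coefficients a single spike occupies in the Haar basis versus the Fourier basis.

```python
import numpy as np

def haar_full(x):
    """Full multi-level orthonormal Haar decomposition of a length-2^k signal."""
    coeffs = []
    approx = np.asarray(x, dtype=float)
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2.0))  # detail coefficients at this scale
        approx = (even + odd) / np.sqrt(2.0)        # coarser approximation
    coeffs.append(approx)                           # final single approximation coefficient
    return np.concatenate(coeffs)

n = 64
signal = np.zeros(n)
signal[20] = 1.0  # one localized event in an otherwise flat sequence

haar_nonzero = int(np.sum(np.abs(haar_full(signal)) > 1e-12))
fourier_nonzero = int(np.sum(np.abs(np.fft.fft(signal)) > 1e-12))
print(haar_nonzero, fourier_nonzero)  # prints "7 64"
```

The spike touches only one coefficient per scale in the Haar basis (six detail coefficients plus the final approximation), while every Fourier coefficient has magnitude 1, so the event is spread across the whole spectrum.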

For time-series foundation models (TSFMs), this source belongs near attention alternatives, adaptive tokenization, and frequency-aware numeric representation.

Evidence And Results

The abstract reports consistent gains over ordinary Transformer attention and Fourier-space attention on Long Range Arena, plus improved extrapolation over distance in a reasoning task.

Alex Notes

User-provided official code: EvanZhuang/wavspa.

Limitations

Long Range Arena and LEGO are not forecasting benchmarks.
Wavelet attention changes the sequence-mixing substrate but does not by itself solve exogenous variables, channel semantics, or action conditioning.
TSFM-specific tests are needed before treating wavelet attention as a better default for forecasting.

Foundation TSFM Relevance

Agenda slot	Verdict	Evidence	Missing pieces
Streaming state, long context, and constant updates	adjacent	Wavelet-space attention captures time-frequency locality for long sequences with linear-time transforms.	Not evaluated on streaming TSFM state maintenance or forecasting.
Patch size, dynamic tokenization, and point-wise numeric embeddings	adjacent	Learnable wavelet bases offer a frequency/locality-aware alternative to fixed sequence mixing.	Does not choose temporal resolution adaptively or test point-wise numeric fidelity.
Time representation and irregular event streams	insufficient evidence	Wavelets preserve position and frequency information for regular sequences.	No irregular sampling, elapsed-time, calendar-time, or event-stream interface.

Links Into The Wiki

Open Questions

Can wavelet-space attention improve long-horizon TSFM stability compared with patching and recurrent state?
Which wavelet bases are appropriate for irregular, missing, or multivariate signals?
Is wavelet mixing complementary to learned patching, or does it reduce the need for patching?
