WavSpA: Wavelet Space Attention for Boosting Transformers’ Long Sequence Learning Ability
Source
- Raw Markdown: paper_wavspa-2022.md
- PDF: paper_wavspa-2022.pdf
- Preprint: arXiv 2210.01989
- Official code: EvanZhuang/wavspa
Core Claim
Attention can be performed in a learnable wavelet coefficient space, giving Transformers access to both position and frequency information with linear-time sequence transforms.
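As a rough cost count behind "linear-time", assume a $J$-level fast (pyramid) wavelet transform with a fixed filter length $L$ on a sequence of length $n$. The symbols $n$, $J$, $L$, and the per-tap constant $c$ are introduced here for illustration and are not taken from the paper. Each level halves the number of samples it processes, so the total work is bounded by a geometric series:

$$
\sum_{j=0}^{J-1} c \, L \, \frac{n}{2^{j}} \;<\; 2 \, c \, L \, n \;=\; O(n)
$$

A fast Fourier transform, by comparison, costs $O(n \log n)$. This bound covers the transform only; the cost of the attention step applied to the coefficients is separate.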
Key Contributions
- Proposes Wavelet Space Attention.
- Applies a forward wavelet transform, performs attention in coefficient space, then reconstructs the representation with an inverse transform.
- Compares wavelet-space attention with Fourier-space attention on long-sequence benchmarks.
- Tests fixed and adaptive wavelets.
- Reports improved Long Range Arena performance and better reasoning extrapolation on LEGO-style chain-of-reasoning tasks.
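The transform, attend, reconstruct pipeline listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a fixed one-level Haar wavelet as a stand-in for the learnable wavelets, identity query/key/value projections, and hypothetical function names (`haar_forward`, `haar_inverse`, `self_attention`, `wavelet_space_attention`) that do not come from the official code.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(seq):
    """One-level orthonormal Haar transform along the sequence axis.

    seq is a list of n feature vectors (n must be even). The result has
    n vectors: n/2 approximation (low-pass) coefficients followed by
    n/2 detail (high-pass) coefficients.
    """
    approx, detail = [], []
    for i in range(0, len(seq), 2):
        a, b = seq[i], seq[i + 1]
        approx.append([(x + y) / SQRT2 for x, y in zip(a, b)])
        detail.append([(x - y) / SQRT2 for x, y in zip(a, b)])
    return approx + detail


def haar_inverse(coeffs):
    """Invert haar_forward, interleaving the reconstructed pairs."""
    half = len(coeffs) // 2
    out = []
    for s, d in zip(coeffs[:half], coeffs[half:]):
        out.append([(x + y) / SQRT2 for x, y in zip(s, d)])
        out.append([(x - y) / SQRT2 for x, y in zip(s, d)])
    return out


def self_attention(tokens):
    """Scaled dot-product self-attention.

    Assumption: identity Q/K/V projections, so no learned weights.
    """
    scale = 1.0 / math.sqrt(len(tokens[0]))
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j] for w, v in zip(weights, tokens)) / total
            for j in range(len(q))
        ])
    return out


def wavelet_space_attention(seq):
    """Forward transform, attention in coefficient space, inverse transform."""
    return haar_inverse(self_attention(haar_forward(seq)))


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
reconstructed = haar_inverse(haar_forward(x))
y = wavelet_space_attention(x)
```

Because the Haar transform is orthonormal, `reconstructed` matches `x` up to floating-point error, so any change in `y` comes from the attention step alone. Note that the attention here is still the ordinary quadratic kind; only the transforms are linear in sequence length.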
Method Notes
WavSpA is not a time-series forecasting paper, but it is highly relevant to long numeric sequences. Wavelets preserve both locality and frequency structure, a natural fit for nonstationary time-series signals, where purely global Fourier bases can be too coarse.
For time-series foundation models (TSFMs), this source belongs alongside attention alternatives, adaptive tokenization, and frequency-aware numeric representation.
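The locality argument can be illustrated with a toy signal. The sketch below is an illustration rather than anything from the paper: it assumes a one-level Haar detail transform and a naive discrete Fourier transform, with hypothetical helper names `haar_details` and `dft_magnitudes`. A single spike touches one Haar detail coefficient but spreads across every Fourier bin.

```python
import cmath
import math


def haar_details(signal):
    """One-level Haar detail coefficients (pairwise scaled differences)."""
    return [
        (signal[i] - signal[i + 1]) / math.sqrt(2.0)
        for i in range(0, len(signal), 2)
    ]


def dft_magnitudes(signal):
    """Magnitudes of a naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


# A length-16 signal that is zero everywhere except one spike at position 5.
signal = [0.0] * 16
signal[5] = 1.0

nonzero_haar = [i for i, c in enumerate(haar_details(signal)) if abs(c) > 1e-9]
nonzero_dft = [k for k, m in enumerate(dft_magnitudes(signal)) if m > 1e-9]

print(nonzero_haar)      # [2]  (the pair covering positions 4 and 5)
print(len(nonzero_dft))  # 16   (every frequency bin responds)
```

The index of the nonzero Haar coefficient also says where the spike occurred, which is the position information a purely global Fourier basis does not expose directly.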
Evidence And Results
The abstract reports consistent gains over ordinary Transformer attention and Fourier-space attention on Long Range Arena, plus improved extrapolation over distance in a reasoning task.
Alex Notes
- User-provided official code: EvanZhuang/wavspa.
Limitations
- Long Range Arena and LEGO are not forecasting benchmarks.
- Wavelet attention changes the sequence-mixing substrate but does not by itself solve exogenous variables, channel semantics, or action conditioning.
- TSFM-specific tests are needed before wavelet attention can be treated as a better default for forecasting.
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Streaming state, long context, and constant updates | adjacent | Wavelet-space attention captures time-frequency locality for long sequences with linear-time transforms. | Not evaluated on streaming TSFM state maintenance or forecasting. |
| Patch size, dynamic tokenization, and point-wise numeric embeddings | adjacent | Learnable wavelet bases offer a frequency/locality-aware alternative to fixed sequence mixing. | Does not choose temporal resolution adaptively or test point-wise numeric fidelity. |
| Time representation and irregular event streams | insufficient evidence | Wavelets preserve position and frequency information for regular sequences. | No irregular sampling, elapsed-time, calendar-time, or event-stream interface. |
Links Into The Wiki
- Time-Series Scaling And Efficiency
- Number Tokenization
- Time-Series Foundation Models
- Foundation Time-Series Model Research Agenda
Open Questions
- Can wavelet-space attention improve long-horizon TSFM stability compared with patching and recurrent state?
- Which wavelet bases are appropriate for irregular, missing, or multivariate signals?
- Is wavelet mixing complementary to learned patching, or does it reduce the need for patching?