1. Limitation of Self-Attention
\(\text{Attention}(Q, K, V) = \text{softmax}\left( \frac{QK^T}{\sqrt{d_k}} \right)V\).
- \(Q \in \mathbb{R}^{n \times d}\): Query
- \(K \in \mathbb{R}^{n \times d}\): Key
- \(V \in \mathbb{R}^{n \times d_v}\): Value
- \(n\): Sequence length; \(d\) \((= d_k)\): Query/Key dimension
Complexity: \(\mathcal{O}(n^2)\) in sequence length, due to the \(n \times n\) score matrix \(QK^T\)
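For concreteness, a minimal NumPy sketch of the formula above (function and variable names are illustrative, not taken from any library):

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Q, K: (n, d), V: (n, d_v). Materializes the (n, n) score matrix,
    hence O(n^2 d) time and O(n^2) memory."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                               # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)                # row-wise softmax
    return weights @ V                                            # (n, d_v)
```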
2. Linear Attention
Complexity: \(\mathcal{O}(n^2)\) \(\rightarrow\) \(\mathcal{O}(n)\)
Key idea
- Do not compute softmax(\(QK^T\)) directly;
- instead, approximate it by applying a kernel function to \(Q\) and \(K\).
Softmax Attention: \(A = \text{softmax}(QK^T)V\)
Linear Attention: \(A = \phi(Q) \left( \phi(K)^T V \right)\).
- \(\phi(\cdot)\): Non-linear feature map (kernel function)
- \(\phi(Q) \in \mathbb{R}^{n \times d}\).
- \(\phi(K)^T V \in \mathbb{R}^{d \times d_v}\): computed only once
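A minimal sketch of this factorization, assuming the common positive feature map \(\phi(x) = \text{elu}(x) + 1\) (an illustrative choice, not specified above). A row-wise normalizer, which plays the role of the softmax denominator and is omitted from the formula above, is included:

```python
import numpy as np

def phi(x):
    # Feature map; elu(x) + 1 is one common positive choice (an assumption here).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(Q, K, V):
    """Kernelized linear attention A = phi(Q) (phi(K)^T V).
    Q, K: (n, d), V: (n, d_v). The (n, n) matrix is never formed:
    phi(K)^T V is a (d, d_v) summary computed once, so cost is O(n d d_v)."""
    Qp, Kp = phi(Q), phi(K)                    # (n, d)
    KV = Kp.T @ V                              # (d, d_v), computed once
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T   # (n, 1) normalizer (softmax denominator analogue)
    return (Qp @ KV) / Z                       # (n, d_v)
```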
| | Complexity | Memory |
|---|---|---|
| Softmax Attention | \(\mathcal{O}(n^2 d)\) | \(\mathcal{O}(n^2)\) |
| Linear Attention | \(\mathcal{O}(n d^2)\) | \(\mathcal{O}(nd)\) |
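A quick shape check under the same illustrative setup (reusing the two sketches above); only the softmax path allocates an \(n \times n\) intermediate:

```python
n, d, d_v = 1024, 64, 64
rng = np.random.default_rng(0)
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d_v))

out_soft = softmax_attention(Q, K, V)   # builds a (1024, 1024) score matrix
out_lin = linear_attention(Q, K, V)     # largest intermediates are (n, d) and (d, d_v)
print(out_soft.shape, out_lin.shape)    # (1024, 64) (1024, 64)
```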