Asymptotic Theory of Attention: In-Context Learning and Sparse Token Detection

Colloquium
Speaker: Yue M. Lu, Harvard University
Title: Asymptotic Theory of Attention: In-Context Learning and Sparse Token Detection
Abstract: Attention-based architectures exhibit striking emergent abilities, from learning tasks directly from context to detecting rare, weak features in long sequences, yet a rigorous theory explaining these behaviors remains limited. In this talk, I will present two recent exactly solvable models that provide a high-dimensional asymptotic theory of attention.
(i) In-context learning. For linear attention pretrained on linear regression tasks, we derive sharp asymptotics in a regime where token dimension, context length, and task diversity all scale proportionally, while the number of pretraining examples scales quadratically. The resulting learning curve exhibits double descent and a phase transition separating a low-diversity memorization regime from a high-diversity regime of genuine in-context generalization. These predictions closely track empirical behavior in both linear-attention models and nonlinear Transformer architectures.
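To make the setting in (i) concrete, here is a minimal numerical sketch (illustrative only, not the speaker's code) of in-context linear regression with a linear-attention-style readout. It uses the common construction in which a single linear-attention layer acts like one gradient-descent step on the in-context least-squares loss; the Gaussian data model and the values of the dimension d, context length N, and number of test tasks are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, n_tasks = 32, 256, 200          # token dimension, context length, test tasks (assumed)

errors = []
for _ in range(n_tasks):
    w = rng.normal(size=d) / np.sqrt(d)   # hidden regression vector for this task
    X = rng.normal(size=(N, d))           # context inputs x_1, ..., x_N
    y = X @ w                             # noiseless context labels y_i = <w, x_i>
    x_q = rng.normal(size=d)              # query token
    # Linear-attention-style readout: y_hat = x_q . ((1/N) * sum_i y_i x_i),
    # i.e. one gradient-descent step (from zero) on the in-context least-squares loss.
    w_hat = (X.T @ y) / N
    errors.append(float(x_q @ (w_hat - w)) ** 2)

print(f"mean squared in-context prediction error: {np.mean(errors):.4f}")
```

With isotropic Gaussian inputs, the one-step estimate concentrates around the true task vector as the context length grows, so the printed error shrinks roughly like d/N; the sharp asymptotics, double descent, and memorization-to-generalization transition described above concern the richer regime where pretraining data, task diversity, and dimensions scale jointly.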
(ii) Sparse-token classification. For detecting weak signals embedded in a small, randomly located subset of tokens, we analyze a single-layer attention classifier and determine its representational and learnability thresholds. Attention succeeds when the signal strength scales only logarithmically in the sequence length L, whereas linear baselines require √L scaling. In a proportional high-dimensional regime, we prove that two gradient descent steps yield nontrivial alignment between the query vector and the hidden signal, leading to signal-adaptive attention. Exact formulas for the test error, training loss, and separability capacity quantify this advantage.
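For the setting in (ii), the following sketch (again illustrative, with all parameter values and the threshold-fitting step assumed rather than taken from the talk) contrasts a single softmax-attention classifier whose query is aligned with the hidden signal direction, the kind of alignment the abstract attributes to two gradient-descent steps, against a mean-pooling linear baseline on a sparse-token detection task.

```python
import numpy as np

rng = np.random.default_rng(1)
d, L, k = 64, 512, 2                        # token dim, sequence length, signal tokens (assumed)
beta = 2.0 * np.sqrt(np.log(L))             # planted signal strength (assumed scaling)
u = rng.normal(size=d); u /= np.linalg.norm(u)   # hidden signal direction

def sample(signal_present):
    """One sequence of L noise tokens; optionally plant beta*u in k random tokens."""
    X = rng.normal(size=(L, d))
    if signal_present:
        idx = rng.choice(L, size=k, replace=False)
        X[idx] += beta * u
    return X

def attn_stat(X, q):
    """Single-layer attention pooling: softmax over token scores, then read out along u."""
    s = X @ q
    a = np.exp(s - s.max()); a /= a.sum()
    return (a @ X) @ u

def mean_stat(X):
    """Linear baseline: mean-pool the whole sequence, then project on u."""
    return X.mean(axis=0) @ u

def error_rate(stat_fn, threshold, n=400):
    wrong = 0
    for _ in range(n):
        y = bool(rng.integers(0, 2))
        wrong += (stat_fn(sample(y)) > threshold) != y
    return wrong / n

# "Signal-adaptive" attention: query aligned with the hidden direction u
# (the alignment that, per the abstract, two gradient steps can produce).
q = 2.0 * u

# Fit each decision threshold as the midpoint of the two class means on a
# small held-out batch (a simple stand-in for training the readout).
thr_attn = np.mean([attn_stat(sample(True), q) for _ in range(50)]
                   + [attn_stat(sample(False), q) for _ in range(50)])
thr_mean = np.mean([mean_stat(sample(True)) for _ in range(50)]
                   + [mean_stat(sample(False)) for _ in range(50)])

print(f"attention classifier error:  {error_rate(lambda X: attn_stat(X, q), thr_attn):.3f}")
print(f"mean-pooling baseline error: {error_rate(mean_stat, thr_mean):.3f}")
```

With a planted signal of strength on the order of √(log L) in only k of the L tokens (an assumed choice; the exact threshold scaling is among the talk's results), the attention statistic largely separates the two classes while the mean-pooled statistic is dominated by noise, qualitatively reflecting the logarithmic-versus-√L gap stated above.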