Asymptotic Theory of Attention: In-Context Learning and Sparse Token Detection

December 1, 2025 @ 4:30 pm - 5:30 pm

Colloquium

Speaker: Yue M. Lu, Harvard University

Title: Asymptotic Theory of Attention: In-Context Learning and Sparse Token Detection

Abstract: Attention-based architectures exhibit striking emergent abilities—from learning tasks directly from context to detecting rare, weak features in long sequences—yet a rigorous theory explaining these behaviors remains limited. In this talk, I will present two recent exactly solvable models that develop a high-dimensional asymptotic theory of attention.

(i) In-context learning. For linear attention pretrained on linear regression tasks, we derive sharp asymptotics in a regime where token dimension, context length, and task diversity all scale proportionally, while the number of pretraining examples scales quadratically. The resulting learning curve exhibits double descent and a phase transition separating a low-diversity memorization regime from a high-diversity regime of genuine in-context generalization. These predictions closely track empirical behavior in both linear-attention models and nonlinear Transformer architectures.
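
As a rough illustration of this setup, the numpy sketch below simulates in-context linear regression with a single linear-attention-style estimate. It is a toy construction, not the model or parameterization analyzed in the talk: the merged query-key matrix Gamma is set by hand to the identity (the value pretraining would ideally produce for isotropic tokens, whereas in the talk it is learned from pretraining examples), and the dimension, context length, and number of test tasks are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 64                                # token dimension, context length

def make_task():
    """Draw one regression task: hidden weights w, context pairs, and a query."""
    w = rng.normal(size=d) / np.sqrt(d)
    X = rng.normal(size=(n, d))             # context inputs x_1, ..., x_n
    y = X @ w                               # context labels y_i = x_i . w
    x_q = rng.normal(size=d)                # query input
    return X, y, x_q, x_q @ w               # last entry is the target label

def linear_attention_predict(X, y, x_q, Gamma):
    """Linear-attention-style estimate: y_hat = x_q^T Gamma (1/n) sum_i y_i x_i."""
    context_stat = (X * y[:, None]).mean(axis=0)
    return x_q @ Gamma @ context_stat

Gamma = np.eye(d)                           # idealized "pretrained" parameter
errs = []
for _ in range(200):                        # evaluate on fresh, unseen tasks
    X, y, x_q, y_q = make_task()
    errs.append((linear_attention_predict(X, y, x_q, Gamma) - y_q) ** 2)
print("mean squared error on fresh tasks:", float(np.mean(errs)))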

(ii) Sparse-token classification. For detecting weak signals embedded in a small, randomly located subset of tokens, we analyze a single-layer attention classifier and determine its representational and learnability thresholds. Attention succeeds with only logarithmic signal scaling in the sequence length L, outperforming linear baselines that require L scaling. In a proportional high-dimensional regime, we prove that two gradient descent steps yield nontrivial alignment between the query vector and the hidden signal, leading to signal-adaptive attention. Exact formulas for the test error, training loss, and separability capacity quantify this advantage.
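
The toy numpy sketch below illustrates the same qualitative gap, using an invented construction rather than the talk's exact model: a signal direction v is planted in k of L noise tokens for one class, the attention query is set directly to v (standing in for the alignment that, per the abstract, two gradient descent steps recover), and a single softmax-attention head is compared against a mean-pooling linear baseline that dilutes the signal by a factor k/L. All parameter values, including the logarithmic choice of signal strength, are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
d, L, k = 32, 400, 1                         # token dim, sequence length, sparsity
snr = 1.5 * np.sqrt(2 * np.log(L))           # signal strength, logarithmic in L
v = rng.normal(size=d)
v /= np.linalg.norm(v)                       # hidden signal direction

def sample(label):
    """L Gaussian noise tokens; class 1 hides snr * v in k random tokens."""
    X = rng.normal(size=(L, d))
    if label == 1:
        idx = rng.choice(L, size=k, replace=False)
        X[idx] += snr * v
    return X

def attention_score(X, q=v, beta=2.0):
    """Single softmax-attention head: weight tokens by alignment with the query,
    then read the pooled token out along v."""
    w = np.exp(beta * (X @ q))
    w /= w.sum()
    return (w @ X) @ v

def mean_pool_score(X):
    """Linear baseline: average all L tokens, then read out along v."""
    return X.mean(axis=0) @ v

def separation(score_fn, trials=300):
    """Gap between class-mean scores, normalized by the summed class spreads."""
    s0 = np.array([score_fn(sample(0)) for _ in range(trials)])
    s1 = np.array([score_fn(sample(1)) for _ in range(trials)])
    return (s1.mean() - s0.mean()) / (s0.std() + s1.std())

print("attention, signal-aligned query:", separation(attention_score))
print("mean-pooling linear baseline:   ", separation(mean_pool_score))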
