
Toward Demystifying Transformers and Attention

Start: 2022-02-09T14:00:00-05:00
End: 2022-02-09T15:00:00-05:00
Location: Virtual

February 9, 2022 @ 2:00 pm - 3:00 pm

Speaker: Ben Edelman, Harvard Computer Science

Title: Toward Demystifying Transformers and Attention

Abstract: Over the past several years, attention mechanisms (primarily in the form of the Transformer architecture) have revolutionized deep learning, leading to advances in natural language processing, computer vision, code synthesis, protein structure prediction, and beyond. Attention has a remarkable ability to enable the learning of long-range dependencies in diverse modalities of data. And yet, there is at present limited principled understanding of the reasons for its success. In this talk, I’ll explain how attention mechanisms and Transformers work, and then I’ll share the results of a preliminary investigation into why they work so well. In particular, I’ll discuss an inductive bias of attention that we call sparse variable creation: bounded-norm Transformer layers are capable of representing sparse Boolean functions, with statistical generalization guarantees akin to sparse regression.

Details

Date: February 9, 2022
Time: 2:00 pm - 3:00 pm
Event Category: New Technologies in Mathematics Seminar
