
What Algorithms can Transformers Learn? A Study in Length Generalization

Start: 2024-02-14T14:00:00-05:00
End: 2024-02-14T15:00:00-05:00
Location: CMSA Room G10

February 14, 2024 @ 2:00 pm - 3:00 pm

New Technologies in Mathematics Seminar

Speaker: Preetum Nakkiran, Apple

Title: What Algorithms can Transformers Learn? A Study in Length Generalization

Abstract: Large language models exhibit many surprising “out-of-distribution” generalization abilities, yet also struggle to solve certain simple tasks like decimal addition. To clarify the scope of Transformers’ out-of-distribution generalization, we isolate this behavior in a specific controlled setting: length generalization on algorithmic tasks. For example, can a model trained on 10-digit addition generalize to 50-digit addition? For which tasks do we expect this to work?
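As a rough illustration of the length-generalization setting described in the abstract, the sketch below builds a training set of additions with operands of up to 10 digits and a test set with 50-digit operands. The prompt format ("a+b="), the function names, and the sample counts are assumptions made for this example, not details taken from the talk or the paper.

```python
import random

def make_addition_example(num_digits, rng):
    """Return (prompt, answer) for two operands with exactly num_digits digits each."""
    low, high = 10 ** (num_digits - 1), 10 ** num_digits - 1
    a, b = rng.randint(low, high), rng.randint(low, high)
    # Hypothetical prompt format; the actual tokenization used in the work may differ.
    return f"{a}+{b}=", str(a + b)

def make_split(lengths, examples_per_length, seed):
    """Build a list of (prompt, answer) pairs covering the given operand lengths."""
    rng = random.Random(seed)
    return [
        make_addition_example(n, rng)
        for n in lengths
        for _ in range(examples_per_length)
    ]

# Train on operand lengths 1..10, then evaluate on a length never seen in training.
train_set = make_split(range(1, 11), examples_per_length=100, seed=0)
test_set = make_split([50], examples_per_length=100, seed=1)
```

A model length-generalizes in this sense if its accuracy on `test_set` stays high even though no 50-digit example appears in `train_set`.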

Our key tool is the recently introduced RASP language (Weiss et al., 2021), a programming language tailor-made for the Transformer’s computational model. We conjecture, informally, that Transformers tend to length-generalize on a task if there exists a short RASP program that solves the task for all input lengths. This simple conjecture captures most known instances of length generalization on algorithmic tasks remarkably well, and can also inform the design of effective scratchpads. Finally, on the theoretical side, we give a simple separating example between our conjecture and the “min-degree-interpolator” model of learning from Abbe et al. (2023).
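To give a flavor of what a short program that works for all input lengths can look like, here is a minimal Python sketch in the spirit of the select and aggregate primitives of RASP (Weiss et al., 2021). It is not RASP itself and not code from the talk: the Python functions are illustrative stand-ins, `aggregate` is simplified to the case where each query selects exactly one position (RASP's aggregate averages over the selected positions), and the sequence length is read off directly rather than computed with RASP operations.

```python
def select(keys, queries, predicate):
    """Attention-like pattern: entry [q][k] is True when predicate(key, query) holds."""
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(selector, values):
    """Simplified aggregate: each query must select exactly one position."""
    out = []
    for row in selector:
        chosen = [v for picked, v in zip(row, values) if picked]
        if len(chosen) != 1:
            raise ValueError("this sketch only handles one selected position per query")
        out.append(chosen[0])
    return out

def reverse(tokens):
    """Reverse a sequence: position i attends to position n - 1 - i."""
    n = len(tokens)  # shortcut: length taken directly, an assumption of this sketch
    indices = list(range(n))
    flipped = [n - 1 - i for i in indices]
    pattern = select(indices, flipped, lambda k, q: k == q)
    return aggregate(pattern, tokens)

print(reverse(list("hello")))
```

The same three-line `reverse` applies unchanged to inputs of any length, which is the kind of length-independent description the conjecture refers to.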

Joint work with Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, and Samy Bengio. To appear in ICLR 2024.

Details

Date: February 14, 2024
Time: 2:00 pm - 3:00 pm
Event Category: New Technologies in Mathematics Seminar

Organizer

Michael Douglas

Venue

CMSA Room G10; CMSA, 20 Garden Street
Cambridge, MA 02138 United States
Phone: 6174967132