What Algorithms can Transformers Learn? A Study in Length Generalization
CMSA Room G10, 20 Garden Street, Cambridge, MA, United States
New Technologies in Mathematics Seminar
Speaker: Preetum Nakkiran, Apple
Title: What Algorithms can Transformers Learn? A Study in Length Generalization
Abstract: Large language models exhibit many surprising “out-of-distribution” generalization abilities, yet also struggle to solve certain simple tasks like decimal addition. To clarify the scope of Transformers' out-of-distribution generalization, we isolate this behavior in a […]