What Algorithms can Transformers Learn? A Study in Length Generalization
New Technologies in Mathematics Seminar
Speaker: Preetum Nakkiran, Apple
Abstract: Large language models exhibit many surprising “out-of-distribution” generalization abilities, yet also struggle to solve certain simple tasks like decimal addition. To clarify the scope of Transformers' out-of-distribution generalization, we isolate this behavior in a […]