What Algorithms can Transformers Learn? A Study in Length Generalization
CMSA Room G10, CMSA, 20 Garden Street, Cambridge, MA, United States
New Technologies in Mathematics Seminar
Speaker: Preetum Nakkiran, Apple
Title: What Algorithms can Transformers Learn? A Study in Length Generalization
Abstract: Large language models exhibit many surprising “out-of-distribution” generalization abilities, […]