What Algorithms can Transformers Learn? A Study in Length Generalization
New Technologies in Mathematics Seminar
Speaker: Preetum Nakkiran, Apple
Title: What Algorithms can Transformers Learn? A Study in Length Generalization
Abstract: Large language models exhibit many surprising “out-of-distribution” generalization abilities, […]