Scaling Stochastic Momentum from Theory to LLMs
CMSA Room G10, CMSA, 20 Garden Street, Cambridge, MA, United States
New Technologies in Mathematics Seminar
Speaker: Courtney Paquette, McGill University
Title: Scaling Stochastic Momentum from Theory to LLMs
Abstract: Given the massive scale of modern ML models, we now often get only a single shot to train them effectively. This limits our ability to sweep architectures and hyperparameters, making it essential to understand how learning algorithms scale […]