  • Infinite Limits and Scaling Laws for Deep Neural Networks

    CMSA Room G10, 20 Garden Street, Cambridge, MA, United States

    https://youtu.be/0998FJhPdj8
    New Technologies in Mathematics Seminar
    Speaker: Blake Bordelon
    Title: Infinite Limits and Scaling Laws for Deep Neural Networks
    Abstract: Scaling up the size and training horizon of deep learning models has enabled breakthroughs in computer vision and natural language processing. Empirical evidence suggests that these neural network models are described by regular scaling laws where performance of […]
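
    As loose context for the "scaling laws" mentioned in the abstract (an illustrative sketch, not material from the talk), neural scaling laws are commonly fit as a power law in model size N, e.g. L(N) ≈ A·N^(−α) + L_∞. The sketch below fits that form to synthetic data; the constants and data are assumptions made purely for illustration.

        # Illustrative sketch only (not from the talk): fit a power-law scaling law
        # L(N) ~ A * N**(-alpha) + L_inf to synthetic loss vs. model-size data.
        import numpy as np
        from scipy.optimize import curve_fit

        def scaling_law(N, A, alpha, L_inf):
            """Loss as a power law in parameter count N plus an irreducible floor."""
            return A * N ** (-alpha) + L_inf

        rng = np.random.default_rng(0)
        N = np.array([1e6, 3e6, 1e7, 3e7, 1e8, 3e8])                      # hypothetical model sizes
        loss = 50.0 * N ** (-0.3) + 1.2 + rng.normal(0, 0.005, N.shape)   # synthetic losses

        (A_fit, alpha_fit, L_inf_fit), _ = curve_fit(scaling_law, N, loss, p0=[10.0, 0.3, 1.0])
        print(f"fitted exponent alpha ~ {alpha_fit:.2f}, irreducible loss ~ {L_inf_fit:.2f}")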

  • Hierarchical data structures through the lenses of diffusion models

    CMSA Room G10, 20 Garden Street, Cambridge, MA, United States

    https://youtu.be/x7LPDDYZn94
    New Technologies in Mathematics Seminar
    Speaker: Antonio Sclocchi (EPFL)
    Title: Hierarchical data structures through the lenses of diffusion models
    Abstract: The success of deep learning with high-dimensional data relies on the fact that natural data are highly structured. A key aspect of this structure is hierarchical compositionality, yet quantifying it remains a challenge. In […]
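
    As generic background for the diffusion-model machinery the abstract refers to (standard DDPM material, not code from the talk), the forward noising process q(x_t | x_0) = N(√ᾱ_t x_0, (1 − ᾱ_t) I) can be sketched as follows; the linear noise schedule and toy data are assumptions for illustration.

        # Generic DDPM background (not code from the talk): variance-preserving
        # forward process q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I).
        import numpy as np

        def forward_diffuse(x0, t, betas, rng):
            """Sample the noised state x_t from clean data x0 at step t."""
            alphas = 1.0 - betas
            abar_t = np.prod(alphas[: t + 1])       # cumulative signal retention up to step t
            noise = rng.standard_normal(x0.shape)
            return np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * noise

        betas = np.linspace(1e-4, 0.02, 1000)       # assumed linear noise schedule
        x0 = np.ones(8)                             # toy "data" vector
        print(forward_diffuse(x0, t=500, betas=betas, rng=np.random.default_rng(0)))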

  • From Word Prediction to Complex Skills: Data Flywheels for Mathematical Reasoning

    CMSA Room G10, 20 Garden Street, Cambridge, MA, United States

    https://youtu.be/OYOuSAAE7QQ
    New Technologies in Mathematics Seminar
    Speaker: Anirudh Goyal (University of Montreal)
    Title: From Word Prediction to Complex Skills: Data Flywheels for Mathematical Reasoning
    Abstract: This talk examines how large language models (LLMs) evolve from simple word prediction to complex skills, with a focus on mathematical problem solving. A major driver of AI products today is the […]

  • How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad

    CMSA Room G10, 20 Garden Street, Cambridge, MA, United States

    https://youtu.be/C6NDdnSaluU
    New Technologies in Mathematics Seminar
    Speaker: Aryo Lotfi (EPFL)
    Title: How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad
    Abstract: Can Transformers predict new syllogisms by composing established ones? More generally, what type of targets can be learned by such models from scratch? Recent works show that Transformers can be Turing-complete in terms of […]
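
    As a toy illustration of the kind of compositional target behind the abstract's opening question (not an implementation from the talk), "predicting new syllogisms by composing established ones" can be viewed as computing the transitive closure of a set of known implications; the relation names below are made up for the example.

        # Toy illustration (not from the talk): deriving "new" syllogisms by
        # chaining established implications a -> b, i.e. a transitive closure.
        def compose(facts):
            """Return every implication derivable by chaining the given pairs (a, b)."""
            derived = set(facts)
            changed = True
            while changed:
                changed = False
                for a, b in list(derived):
                    for c, d in list(derived):
                        if b == c and (a, d) not in derived:
                            derived.add((a, d))
                            changed = True
            return derived

        known = {("A", "B"), ("B", "C"), ("C", "D")}
        print(sorted(compose(known)))   # includes the composed syllogism ("A", "D")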