BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//CMSA - ECPv6.15.18//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:CMSA
X-ORIGINAL-URL:https://cmsa.fas.harvard.edu
X-WR-CALDESC:Events for CMSA
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20230312T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20231105T020000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20240310T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20241103T020000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20250309T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20251102T020000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241023T140000
DTEND;TZID=America/New_York:20241023T150000
DTSTAMP:20260430T130247Z
CREATED:20241021T140701Z
LAST-MODIFIED:20241108T192710Z
UID:10003616-1729692000-1729695600@cmsa.fas.harvard.edu
SUMMARY:How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Aryo Lotfi (EPFL) \nTitle: How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad \nAbstract: Can Transformers predict new syllogisms by composing established ones? More generally\, what type of targets can be learned by such models from scratch? Recent works show that Transformers can be Turing-complete in terms of expressivity\, but this does not address the learnability objective. This paper puts forward the notion of ‘globality degree’ of a target distribution to capture when weak learning is efficiently achievable by regular Transformers\, where the latter measures the least number of tokens required in addition to the tokens histogram to correlate nontrivially with the target. As shown experimentally and theoretically under additional assumptions\, distributions with high globality cannot be learned efficiently. In particular\, syllogisms cannot be composed on long chains. Furthermore\, we show that (i) an agnostic scratchpad cannot help to break the globality barrier\, (ii) an educated scratchpad can help if it breaks the globality at each step\, however not all such scratchpads can generalize to out-of-distribution (OOD) samples\, (iii) a notion of ‘inductive scratchpad’\, that composes the prior information more efficiently\, can both break the globality barrier and improve the OOD generalization. In particular\, some inductive scratchpads can achieve length generalizations of up to 6x for some arithmetic tasks depending on the input formatting.
URL:https://cmsa.fas.harvard.edu/event/newtech_102324/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=application/pdf:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-10.23.24.docx-1-1.pdf
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241016T140000
DTEND;TZID=America/New_York:20241016T150000
DTSTAMP:20260430T130247Z
CREATED:20241010T152711Z
LAST-MODIFIED:20241108T192805Z
UID:10003612-1729087200-1729090800@cmsa.fas.harvard.edu
SUMMARY:From Word Prediction to Complex Skills: Data Flywheels for Mathematical Reasoning
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Anirudh Goyal (University of Montreal) \nTitle: From Word Prediction to Complex Skills: Data Flywheels for Mathematical Reasoning \nAbstract: This talk examines how large language models (LLMs) evolve from simple word prediction to complex skills\, with a focus on mathematical problem solving. A major driver of AI products today is the fact that new skills emerge in language models when their parameter set and training corpora are scaled up. This phenomenon is poorly understood\, and a mechanistic explanation via mathematical analysis of gradient-based training seems difficult. The first part of the talk focuses on analysing emergence using the famous (and empirical) Scaling Laws of LLMs. Then I talk about how LLMs can verbalize these skills by assigning labels to problems and clustering them into interpretable categories. This metacognitive ability allows us to leverage skill-based prompting\, significantly improving performance on mathematical reasoning. I then present a framework that combines LLMs with human oversight to generate challenging\, out-of-distribution math questions. This process led to the creation of the MATH^2 dataset\, which enhances both model and human performance\, driving further advances in mathematical reasoning capabilities.
URL:https://cmsa.fas.harvard.edu/event/newtech_101624/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-10.16.24.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241002T140000
DTEND;TZID=America/New_York:20241002T150000
DTSTAMP:20260430T130247Z
CREATED:20240907T180645Z
LAST-MODIFIED:20241002T195652Z
UID:10003453-1727877600-1727881200@cmsa.fas.harvard.edu
SUMMARY:Hierarchical data structures through the lenses of diffusion models
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Antonio Sclocchi\, EPFL \nTitle: Hierarchical data structures through the lenses of diffusion models \nAbstract: The success of deep learning with high-dimensional data relies on the fact that natural data are highly structured. A key aspect of this structure is hierarchical compositionality\, yet quantifying it remains a challenge. \nIn this talk\, we explore how diffusion models can serve as a tool to probe the hierarchical structure of data. We consider a context-free generative model of hierarchical data and show the distinct behaviors of high- and low-level features during a noising-denoising process. Specifically\, we find that high-level features undergo a sharp transition in reconstruction probability at a specific noise level\, while low-level features recombine into new data from different classes. This behavior of latent features leads to correlated changes in real-space variables\, resulting in a diverging correlation length at the transition. \nWe validate these predictions in experiments with real data\, using state-of-the-art diffusion models for both images and texts. Remarkably\, both modalities exhibit a growing correlation length in changing features at the transition of the noising-denoising process. \nOverall\, these results highlight the potential of hierarchical models in capturing non-trivial data structures and offer new theoretical insights for understanding generative AI.
URL:https://cmsa.fas.harvard.edu/event/newtech_10224/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-10.2.24.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20240925T140000
DTEND;TZID=America/New_York:20240925T150000
DTSTAMP:20260430T130247Z
CREATED:20240907T180716Z
LAST-MODIFIED:20241002T144226Z
UID:10003454-1727272800-1727276400@cmsa.fas.harvard.edu
SUMMARY:Infinite Limits and Scaling Laws for Deep Neural Networks
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Blake Bordelon \nTitle: Infinite Limits and Scaling Laws for Deep Neural Networks \nAbstract: Scaling up the size and training horizon of deep learning models has enabled breakthroughs in computer vision and natural language processing. Empirical evidence suggests that these neural network models are described by regular scaling laws where performance of finite parameter models improves as model size increases\, eventually approaching a limit described by the performance of an infinite parameter model. In this talk\, we will first examine certain infinite parameter limits of deep neural networks which preserve representation learning and then describe how quickly finite models converge to these limits. Using dynamical mean field theory methods\, we provide an asymptotic description of the learning dynamics of randomly initialized infinite width and depth networks. Next\, we will empirically investigate how close the training dynamics of finite networks are to these idealized limits. Lastly\, we will provide a theoretical model of neural scaling laws which describes how generalization depends on three computational resources: training time\, model size and data quantity. This theory allows analysis of compute optimal scaling strategies and predicts how model size and training time should be scaled together in terms of spectral properties of the limiting kernel. The theory also predicts how representation learning can improve neural scaling laws in certain regimes. For very hard tasks\, the theory predicts that representation learning can approximately double the training-time exponent compared to the static kernel limit.
URL:https://cmsa.fas.harvard.edu/event/newtech_92524/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-9.25.24.docx-1.png
END:VEVENT
END:VCALENDAR