BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//CMSA - ECPv6.15.20//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:CMSA
X-ORIGINAL-URL:https://cmsa.fas.harvard.edu
X-WR-CALDESC:Events for CMSA
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20220313T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20221106T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20230312T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20231105T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20240310T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20241103T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20250309T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20251102T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20260308T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20261101T060000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250402T140000
DTEND;TZID=America/New_York:20250402T150000
DTSTAMP:20260509T195047
CREATED:20250128T214417Z
LAST-MODIFIED:20250403T144343Z
UID:10003706-1743602400-1743606000@cmsa.fas.harvard.edu
SUMMARY:Learning Dynamical Transport without Data
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Michael Albergo (Harvard) \nTitle: Learning Dynamical Transport without Data \nAbstract: Algorithms based on dynamical transport of measure\, such as score-based diffusion models\, have resulted in great progress in the field of generative modeling. However\, these algorithms rely on access to an abundance of data from the target distribution. A complementary problem to this is learning to generate samples from a target distribution when only given query access to the unnormalized log-likelihood or energy function associated to it\, with myriad application in statistical physics\, chemistry\, and Bayesian inference. I will present an algorithm based on dynamical transport to sample from a target distribution in this context\, which can be seen as an augmentation of annealed importance sampling and sequential Monte Carlo. Time permitting\, I will also discuss how to generalize these ideas to dynamics of discrete distributions. This is joint work with Eric Vanden-Eijnden\, Peter Holderrieth\, and Tommi Jaakkola. \n 
URL:https://cmsa.fas.harvard.edu/event/newtech_4225/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-4.2.2025.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250327T100000
DTEND;TZID=America/New_York:20250327T110000
DTSTAMP:20260509T195047
CREATED:20250128T214249Z
LAST-MODIFIED:20250327T192309Z
UID:10003666-1743069600-1743073200@cmsa.fas.harvard.edu
SUMMARY:AlphaProof: when reinforcement learning meets formal mathematics
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Thomas Hubert (Google DeepMind) \nTitle: AlphaProof: when reinforcement learning meets formal mathematics \nAbstract: Galileo\, the renowned Italian astronomer\, physicist\, and mathematician\, famously described mathematics as the language of the universe. Progress since has only confirmed his intuition\, as the world we live in can be described with extreme precision with just a few mathematical equations.\nIn the last 70 years\, the rise of computers has also enriched our understanding of and revolutionized the world we live in. Mathematics tremendously benefited from this digital revolution as well: while Gauss had to compute primes by hand\, computers and computation are now routinely used in research mathematics and contribute to grand problems like the Birch and Swinnerton-Dyer conjecture\, one of the Millennium Prize Problems.\nToday\, computers are entering a new age\, one in which computation can be transformed into reasoning. In this talk\, I would like to discuss two such developments that will undoubtedly have an integral role to play in the future of mathematics: the concurrent rise of formal mathematics and of machine intelligence.
URL:https://cmsa.fas.harvard.edu/event/newtech_32625/
LOCATION:Virtual
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-3.27.2025.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250312T140000
DTEND;TZID=America/New_York:20250312T150000
DTSTAMP:20260509T195047
CREATED:20250123T195100Z
LAST-MODIFIED:20250327T194539Z
UID:10003665-1741788000-1741791600@cmsa.fas.harvard.edu
SUMMARY:Discovery in Mathematics with Automated Conjecturing
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Randy Davila\, RelationalAI and Rice University \nTitle: Discovery in Mathematics with Automated Conjecturing \nAbstract: Automated conjecturing is a form of artificial intelligence that applies heuristic-driven methods to mathematical discovery. Since the late 1980s\, systems such as Fajtlowicz’s Graffiti\, DeLaViña’s Graffiti.pc\, and TxGraffiti have collectively contributed to over 130 publications in mathematical journals. In this talk\, we outline the evolution of automated conjecturing\, focusing on TxGraffiti\, a program that employs linear optimization methods and several distinct heuristics to generate mathematically meaningful conjectures. We will then introduce GraphMind\, a dueling framework where the Optimist proposes conjectures while the Pessimist seeks counterexamples\, fostering a feedback loop that strengthens automated reasoning. Finally\, we will present GraffitiAI\, a Python package that extends automated conjecturing across various mathematical domains. \nBio: Randy R. Davila is a Lecturer in the Department of Computational Applied Mathematics & Operations Research at Rice University and a Library Engineer at RelationalAI\, specializing in relational knowledge graph systems for intelligent data management. He earned his PhD in Mathematics from the University of Johannesburg in 2019\, with research focused on graph theory and combinatorial optimization. His work explores artificial intelligence in mathematical conjecture generation\, graph theory\, and neural network applications to combinatorial problems. As the creator of TxGraffiti\, he has developed AI-driven systems that have contributed to numerous mathematical publications. His recent projects include GraphMind\, a dueling agent-based framework that pairs conjecture generation with counterexample discovery\, and GraffitiAI\, a Python package for automated conjecturing across mathematical disciplines. \n 
URL:https://cmsa.fas.harvard.edu/event/newtech_31225/
LOCATION:Hybrid – G10
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-3.12.2025.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250305T140000
DTEND;TZID=America/New_York:20250305T150000
DTSTAMP:20260509T195047
CREATED:20250123T192715Z
LAST-MODIFIED:20250307T154830Z
UID:10003664-1741183200-1741186800@cmsa.fas.harvard.edu
SUMMARY:Machine Learning G2 Geometry
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Elli Heyes\, Imperial College \nTitle: Machine Learning G2 Geometry \nAbstract: Compact Ricci-flat Calabi-Yau and holonomy G2 manifolds appear in string and M-theory respectively as descriptions of the extra spatial dimensions that arise in the theories. Since 2017 machine-learning techniques have been applied extensively to study Calabi-Yau manifolds but until 2024 no similar work had been carried out on holonomy G2 manifolds. In this talk\, I will firstly show how topological properties of these manifolds can be learnt using neural networks. I will then discuss how one could try to numerically learn metrics on compact holonomy G2 manifolds using machine-learning and why these approximations would be useful in M-theory.
URL:https://cmsa.fas.harvard.edu/event/newtech_3525/
LOCATION:Hybrid
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-3.5.2025.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250226T140000
DTEND;TZID=America/New_York:20250226T150000
DTSTAMP:20260509T195047
CREATED:20250124T154400Z
LAST-MODIFIED:20250623T124501Z
UID:10003663-1740578400-1740582000@cmsa.fas.harvard.edu
SUMMARY:Datasets for Math: From AIMO Competitions to Math Copilots for Research
DESCRIPTION:  \nNew Technologies in Mathematics Seminar \nSpeaker: Simon Frieder\, Oxford \nTitle: Datasets for Math: From AIMO Competitions to Math Copilots for Research \nAbstract: This talk begins with a brief exposition of the AI Mathematical Olympiad (AIMO) on Kaggle\, now in its second iteration\, outlining datasets and models available to contestants. Taking a broader perspective\, I then examine 1) the overarching issues the current datasets suffer from—such as binary evaluation or constrained sets of use cases— and 2) the trajectory they set for competition-style mathematical problem-solving\, which is different from mathematical research practice. I argue for a fundamental shift in dataset structure and composition\, both for training and evaluation\, and introduce the idea of mapping mathematical workflows to data\, a key example underscoring the need for this shift. I touch upon new thinking LLMs and their role in redefining LLM math evaluation\, highlighting their implications for dataset design. Finally\, I propose general improvements to the current state of mathematical datasets\, including mathematical adaptations of dataset documentation (e.g.\, datasheets). \n 
URL:https://cmsa.fas.harvard.edu/event/newtech_22625/
LOCATION:Virtual
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/1740494700974-e6086db9-08ab-4681-9ecd-580092fe27b62025-1_1.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250212T140000
DTEND;TZID=America/New_York:20250212T150000
DTSTAMP:20260509T195047
CREATED:20250123T194306Z
LAST-MODIFIED:20250228T212617Z
UID:10003661-1739368800-1739372400@cmsa.fas.harvard.edu
SUMMARY:Discovering Data Structures: Nearest Neighbor Search and Beyond
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Omar Salemohamed\, Mila \nTitle: Discovering Data Structures: Nearest Neighbor Search and Beyond \nAbstract: As neural networks learn increasingly sophisticated tasks—from image recognition to mastering the game of Go—we ask: can deep learning discover data structures entirely from scratch? We introduce a general framework for data structure discovery\, which adapts to the underlying data distribution and provides fine-grained control over query and space complexity. For nearest neighbor (NN) search\, our model (re)discovers classic algorithms like binary search in one dimension and learns structures reminiscent of k-d trees and locality-sensitive hashing in higher dimensions. Additionally\, the model learns useful representations of high-dimensional data such as images and exploits them to design effective data structures. Beyond NN search\, we believe the framework could be a powerful tool for data structure discovery for other problems and adapt our framework to the problem of estimating frequencies over a data stream. To encourage future work in this direction\, we conclude with a discussion on some of the opportunities and remaining challenges of learning data structures end-to-end.
URL:https://cmsa.fas.harvard.edu/event/newtech_21225/
LOCATION:Virtual
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-2.12.2025.docx-1.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241204T140000
DTEND;TZID=America/New_York:20241204T150000
DTSTAMP:20260509T195047
CREATED:20240907T180227Z
LAST-MODIFIED:20241212T205959Z
UID:10003410-1733320800-1733324400@cmsa.fas.harvard.edu
SUMMARY:Can Transformers Reason Logically? A Study in SAT-Solving
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Leyan Pan\, Georgia Tech \nTitle: Can Transformers Reason Logically? A Study in SAT-Solving \nAbstract: Transformer-based LLMs have apparently demonstrated capabilities that resemble human reasoning. In our recent work\, we investigated the Boolean reasoning abilities of decoder-only Transformers equipped with Chain-of-Thought\, establishing that a Transformer model can decide all 3-SAT instances up to a bounded size (i.e.\, number of variables and clauses). In this talk\, I will first review recent studies that formally examine the expressiveness of Transformer models. Next\, I will explain how we establish an equivalence between Chain-of-Thought reasoning and an algorithm\, in our case\, the DPLL SAT-solving algorithm. I will then discuss how to encode 3-SAT formulas and partial assignments as vectors so that the high-level operations in DPLL can be represented as vector operations and implemented using attention mechanisms within Transformers. Finally\, I will present experimental results that support our theoretical predictions. I will also address why standard Transformers can only solve reasoning problems of bounded length\, leading to failures in length-generalization\, and discuss potential solutions to overcome this limitation.
URL:https://cmsa.fas.harvard.edu/event/newtech_12424/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-12.4.24.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241120T100000
DTEND;TZID=America/New_York:20241120T230000
DTSTAMP:20260509T195047
CREATED:20241017T153402Z
LAST-MODIFIED:20241115T183929Z
UID:10003614-1732096800-1732143600@cmsa.fas.harvard.edu
SUMMARY:Thinking Like Transformers - A Practical Session
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Gail Weiss\, EPFL \nTitle: Thinking Like Transformers – A Practical Session \nAbstract: With the help of the RASP programming language\, we can better imagine how transformers—the powerful attention based sequence processing architecture—solve certain tasks. Some tasks\, such as simply repeating or reversing an input sequence\, have reasonably straightforward solutions\, but many others are more difficult. To unlock a fuller intuition of what can and cannot be achieved with transformers\, we must understand not just the RASP operations but also how to use them effectively.\nIn this session\, I would like to discuss some useful tricks with you in more detail. How is the powerful selector_width operation yielded from the true RASP operations? How can a fixed-depth RASP program perform arbitrary length long-addition\, despite the equally large number of potential carry operations such a computation entails? How might a transformer perform in-context reasoning? And are any of these solutions reasonable\, i.e.\, realisable in practice? I will begin with a brief introduction of the base RASP operations to ground our discussion\, and then walk us through several interesting task solutions. Following this\, and armed with this deeper intuition of how transformers solve several tasks\, we will conclude with a discussion of what this implies for how knowledge and computations must spread out in transformer layers and embeddings in practice.
URL:https://cmsa.fas.harvard.edu/event/newtech_112024/
LOCATION:Virtual
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-11.20.24.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241113T100000
DTEND;TZID=America/New_York:20241113T230000
DTSTAMP:20260509T195047
CREATED:20241017T141250Z
LAST-MODIFIED:20241115T175125Z
UID:10003613-1731492000-1731538800@cmsa.fas.harvard.edu
SUMMARY:Frontier of Formal Theorem Proving with Large Language Models: Insights from the DeepSeek-Prover Series
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Huajian Xin\, DeepSeek \nTitle: Frontier of Formal Theorem Proving with Large Language Models: Insights from the DeepSeek-Prover Series \nAbstract: Recent advances in large language models have markedly influenced mathematical reasoning and automated theorem proving within artificial intelligence. Yet\, despite their success in natural language tasks\, these models face notable obstacles in formal theorem proving environments such as Lean and Isabelle\, where exacting derivations must adhere to strict formal specifications. Even state-of-the-art models encounter difficulty generating accurate and complex formal proofs\, revealing the unique blend of mathematical rigor required in this domain. In the DeepSeek-Prover series (V1 and V1.5)\, we have explored specialized methodologies aimed at addressing these challenges. This talk will delve into three foundational areas: the synthesis of training data through autoformalization\, reinforcement learning that utilizes feedback from proof assistants\, and test-time optimization using Monte Carlo tree search. I will also provide insights into current model capabilities\, persistent challenges\, and the future potential of large language models in automated theorem proving.
URL:https://cmsa.fas.harvard.edu/event/newtech_111324/
LOCATION:Virtual
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-11.13.24.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241106T140000
DTEND;TZID=America/New_York:20241106T150000
DTSTAMP:20260509T195047
CREATED:20241021T164918Z
LAST-MODIFIED:20241108T192620Z
UID:10003617-1730901600-1730905200@cmsa.fas.harvard.edu
SUMMARY:Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Dylan Foster\, Microsoft Research \nTitle: Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning \nAbstract: Imitation learning (IL) aims to mimic the behavior of an expert in a sequential decision making task by learning from demonstrations\, and has been widely applied to robotics\, autonomous driving\, and autoregressive language generation. The simplest approach to IL\, behavior cloning (BC)\, is thought to incur sample complexity with unfavorable quadratic dependence on the problem horizon\, motivating a variety of different online algorithms that attain improved linear horizon dependence under stronger assumptions on the data and the learner’s access to the expert. In this talk\, we revisit the apparent gap between offline and online IL from a learning-theoretic perspective\, with a focus on general policy classes up to and including deep neural networks. Through a new analysis of behavior cloning with the logarithmic loss\, we will show that it is possible to achieve horizon-independent sample complexity in offline IL whenever (i) the range of the cumulative payoffs is controlled\, and (ii) an appropriate notion of supervised learning complexity for the policy class is controlled. When specialized to stationary policies\, this implies that the gap between offline and online IL is smaller than previously thought. We will then discuss implications of this result and investigate the extent to which it bears out empirically. \nBio: Dylan Foster is a principal researcher at Microsoft Research\, New York. Previously\, he was a postdoctoral fellow at MIT\, and received his PhD in computer science from Cornell University\, advised by Karthik Sridharan. His research focuses on problems at the intersection of machine learning\, AI\, and interactive decision making. He has received several awards for his work\, including the best paper award at COLT (2019) and best student paper award at COLT (2018\, 2019). \n 
URL:https://cmsa.fas.harvard.edu/event/newtech_11624/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-11.6.24.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241023T140000
DTEND;TZID=America/New_York:20241023T150000
DTSTAMP:20260509T195047
CREATED:20241021T140701Z
LAST-MODIFIED:20241108T192710Z
UID:10003616-1729692000-1729695600@cmsa.fas.harvard.edu
SUMMARY:How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Aryo Lotfi (EPFL) \nTitle: How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad \nAbstract: Can Transformers predict new syllogisms by composing established ones? More generally\, what type of targets can be learned by such models from scratch? Recent works show that Transformers can be Turing-complete in terms of expressivity\, but this does not address the learnability objective. This paper puts forward the notion of ‘globality degree’ of a target distribution to capture when weak learning is efficiently achievable by regular Transformers\, where the latter measures the least number of tokens required in addition to the tokens histogram to correlate nontrivially with the target. As shown experimentally and theoretically under additional assumptions\, distributions with high globality cannot be learned efficiently. In particular\, syllogisms cannot be composed on long chains. Furthermore\, we show that (i) an agnostic scratchpad cannot help to break the globality barrier\, (ii) an educated scratchpad can help if it breaks the globality at each step\, however not all such scratchpads can generalize to out-of-distribution (OOD) samples\, (iii) a notion of ‘inductive scratchpad’\, that composes the prior information more efficiently\, can both break the globality barrier and improve the OOD generalization. In particular\, some inductive scratchpads can achieve length generalizations of up to 6x for some arithmetic tasks depending on the input formatting.
URL:https://cmsa.fas.harvard.edu/event/newtech_102324/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=application/pdf:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-10.23.24.docx-1-1.pdf
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241016T140000
DTEND;TZID=America/New_York:20241016T150000
DTSTAMP:20260509T195047
CREATED:20241010T152711Z
LAST-MODIFIED:20241108T192805Z
UID:10003612-1729087200-1729090800@cmsa.fas.harvard.edu
SUMMARY:From Word Prediction to Complex Skills: Data Flywheels for Mathematical Reasoning
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Anirudh Goyal (University of Montreal) \nTitle: From Word Prediction to Complex Skills: Data Flywheels for Mathematical Reasoning \nAbstract: This talk examines how large language models (LLMs) evolve from simple word prediction to complex skills\, with a focus on mathematical problem solving. A major driver of AI products today is the fact that new skills emerge in language models when their parameter set and training corpora are scaled up. This phenomenon is poorly understood\, and a mechanistic explanation via mathematical analysis of gradient-based training seems difficult. The first part of the talk focuses on analysing emergence using the famous (and empirical) Scaling Laws of LLMs. Then I talk about how LLMs can verbalize these skills by assigning labels to problems and clustering them into interpretable categories. This metacognitive ability allows us to leverage skill-based prompting\, significantly improving performance on mathematical reasoning. I then present a framework that combines LLMs with human oversight to generate challenging\, out-of-distribution math questions. This process led to the creation of the MATH^2 dataset\, which enhances both model and human performance\, driving further advances in mathematical reasoning capabilities. \n 
URL:https://cmsa.fas.harvard.edu/event/newtech_101624/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-10.16.24.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20241002T140000
DTEND;TZID=America/New_York:20241002T150000
DTSTAMP:20260509T195047
CREATED:20240907T180645Z
LAST-MODIFIED:20241002T195652Z
UID:10003453-1727877600-1727881200@cmsa.fas.harvard.edu
SUMMARY:Hierarchical data structures through the lenses of diffusion models
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Antonio Sclocchi\, EPFL \nTitle: Hierarchical data structures through the lenses of diffusion models \nAbstract: The success of deep learning with high-dimensional data relies on the fact that natural data are highly structured. A key aspect of this structure is hierarchical compositionality\, yet quantifying it remains a challenge. \nIn this talk\, we explore how diffusion models can serve as a tool to probe the hierarchical structure of data. We consider a context-free generative model of hierarchical data and show the distinct behaviors of high- and low-level features during a noising-denoising process. Specifically\, we find that high-level features undergo a sharp transition in reconstruction probability at a specific noise level\, while low-level features recombine into new data from different classes. This behavior of latent features leads to correlated changes in real-space variables\, resulting in a diverging correlation length at the transition. \nWe validate these predictions in experiments with real data\, using state-of-the-art diffusion models for both images and texts. Remarkably\, both modalities exhibit a growing correlation length in changing features at the transition of the noising-denoising process. \nOverall\, these results highlight the potential of hierarchical models in capturing non-trivial data structures and offer new theoretical insights for understanding generative AI.
URL:https://cmsa.fas.harvard.edu/event/newtech_10224/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-10.2.24.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20240925T140000
DTEND;TZID=America/New_York:20240925T150000
DTSTAMP:20260509T195047
CREATED:20240907T180716Z
LAST-MODIFIED:20241002T144226Z
UID:10003454-1727272800-1727276400@cmsa.fas.harvard.edu
SUMMARY:Infinite Limits and Scaling Laws for Deep Neural Networks
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Blake Bordelon \nTitle: Infinite Limits and Scaling Laws for Deep Neural Networks \nAbstract: Scaling up the size and training horizon of deep learning models has enabled breakthroughs in computer vision and natural language processing. Empirical evidence suggests that these neural network models are described by regular scaling laws where performance of finite parameter models improves as model size increases\, eventually approaching a limit described by the performance of an infinite parameter model. In this talk\, we will first examine certain infinite parameter limits of deep neural networks which preserve representation learning and then describe how quickly finite models converge to these limits. Using dynamical mean field theory methods\, we provide an asymptotic description of the learning dynamics of randomly initialized infinite width and depth networks. Next\, we will empirically investigate how close the training dynamics of finite networks are to these idealized limits. Lastly\, we will provide a theoretical model of neural scaling laws which describes how generalization depends on three computational resources: training time\, model size and data quantity. This theory allows analysis of compute optimal scaling strategies and predicts how model size and training time should be scaled together in terms of spectral properties of the limiting kernel. The theory also predicts how representation learning can improve neural scaling laws in certain regimes. For very hard tasks\, the theory predicts that representation learning can approximately double the training-time exponent compared to the static kernel limit.
URL:https://cmsa.fas.harvard.edu/event/newtech_92524/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-9.25.24.docx-1.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20240320T140000
DTEND;TZID=America/New_York:20240320T150000
DTSTAMP:20260509T195047
CREATED:20240130T215041Z
LAST-MODIFIED:20240321T140550Z
UID:10001519-1710943200-1710946800@cmsa.fas.harvard.edu
SUMMARY:Solving olympiad geometry without human demonstrations
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Trieu H. Trinh\, Google Deepmind and NYU Dept. of Computer Science \nTitle: Solving olympiad geometry without human demonstrations \nAbstract: Proving mathematical theorems at the olympiad level represents a notable milestone in human-level automated reasoning\, owing to their reputed difficulty among the world’s best talents in pre-university mathematics. Current machine-learning approaches\, however\, are not applicable to most mathematical domains owing to the high cost of translating human proofs into machine-verifiable format. The problem is even worse for geometry because of its unique translation challenges\, resulting in severe scarcity of training data. We propose AlphaGeometry\, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. AlphaGeometry is a neuro-symbolic system that uses a neural language model\, trained from scratch on our large-scale synthetic data\, to guide a symbolic deduction engine through infinite branching points in challenging problems. On a test set of 30 latest olympiad-level problems\, AlphaGeometry solves 25\, outperforming the previous best method that only solves ten problems and approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably\, AlphaGeometry produces human-readable proofs\, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation and discovers a generalized version of a translated IMO theorem in 2004. \n 
URL:https://cmsa.fas.harvard.edu/event/nt-32024/
LOCATION:Virtual
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-03.20.2024.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20240306T140000
DTEND;TZID=America/New_York:20240306T150000
DTSTAMP:20260509T195047
CREATED:20240108T153449Z
LAST-MODIFIED:20240306T221235Z
UID:10001129-1709733600-1709737200@cmsa.fas.harvard.edu
SUMMARY:LILO: Learning Interpretable Libraries by Compressing and Documenting Code
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Gabe Grand\, MIT CSAIL and Dept. of EE&CS \nTitle: LILO: Learning Interpretable Libraries by Compressing and Documenting Code \nAbstract: While large language models (LLMs) now excel at code generation\, a key aspect of software development is the art of refactoring: consolidating code into libraries of reusable and readable programs. In this paper\, we introduce LILO\, a neurosymbolic framework that iteratively synthesizes\, compresses\, and documents code to build libraries tailored to particular problem domains. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch: a symbolic compression system that efficiently identifies optimal lambda abstractions across large code corpora. To make these abstractions interpretable\, we introduce an auto-documentation (AutoDoc) procedure that infers natural language names and docstrings based on contextual examples of usage. In addition to improving human readability\, we find that AutoDoc boosts performance by helping LILO’s synthesizer to interpret and deploy learned abstractions. We evaluate LILO on three inductive program synthesis benchmarks for string editing\, scene reasoning\, and graphics composition. Compared to existing neural and symbolic methods – including the state-of-the-art library learning algorithm DreamCoder – LILO solves more complex tasks and learns richer libraries that are grounded in linguistic knowledge.
URL:https://cmsa.fas.harvard.edu/event/nt-3624/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-03.06.2024.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20240221T140000
DTEND;TZID=America/New_York:20240221T150000
DTSTAMP:20260509T195047
CREATED:20240105T034012Z
LAST-MODIFIED:20240223T152643Z
UID:10001113-1708524000-1708527600@cmsa.fas.harvard.edu
SUMMARY:Computers and mathematics in partial differential equations: New developments and challenges
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Javier Gomez Serrano\, Brown University \nTitle: Computers and mathematics in partial differential equations: new developments and challenges \nAbstract: In this talk I will address the interaction between traditional and more modern mathematics and how computers have helped over the last decade providing rigorous (computer-assisted) proofs in the context of partial differential equations. I will also describe new exciting future directions in the field. No background is assumed.
URL:https://cmsa.fas.harvard.edu/event/nt-22124/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-02.21.2024.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20240214T140000
DTEND;TZID=America/New_York:20240214T150000
DTSTAMP:20260509T195047
CREATED:20240102T164110Z
LAST-MODIFIED:20240130T194619Z
UID:10000151-1707919200-1707922800@cmsa.fas.harvard.edu
SUMMARY:What Algorithms can Transformers Learn? A Study in Length Generalization
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Preetum Nakkiran\, Apple \nTitle: What Algorithms can Transformers Learn? A Study in Length Generalization \nAbstract: Large language models exhibit many surprising “out-of-distribution” generalization abilities\, yet also struggle to solve certain simple tasks like decimal addition. To clarify the scope of Transformers’ out-of-distribution generalization\, we isolate this behavior in a specific controlled setting: length-generalization on algorithmic tasks. Eg: Can a model trained on 10 digit addition generalize to 50 digit addition? For which tasks do we expect this to work? \nOur key tool is the recently-introduced RASP language (Weiss et al 2021)\, which is a programming language tailor-made for the Transformer’s computational model. We conjecture\, informally\, that: Transformers tend to length-generalize on a task if there exists a short RASP program that solves the task for all input lengths. This simple conjecture remarkably captures most known instances of length generalization on algorithmic tasks\, and can also inform design of effective scratchpads. Finally\, on the theoretical side\, we give a simple separating example between our conjecture and the “min-degree-interpolator” model of learning from Abbe et al. (2023). \nJoint work with Hattie Zhou\, Arwen Bradley\, Etai Littwin\, Noam Razin\, Omid Saremi\, Josh Susskind\, and Samy Bengio. To appear in ICLR 2024. \n 
URL:https://cmsa.fas.harvard.edu/event/nt21424/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-02.14.2024.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20240207T130000
DTEND;TZID=America/New_York:20240207T140000
DTSTAMP:20260509T195047
CREATED:20240102T163838Z
LAST-MODIFIED:20240207T220617Z
UID:10000149-1707310800-1707314400@cmsa.fas.harvard.edu
SUMMARY:Large language models\, mathematical discovery\, and search in the space of strategies: an anecdote
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Jordan Ellenberg (UW Madison) \nTitle: Large language models\, mathematical discovery\, and search in the space of strategies: an anecdote \nAbstract: I spent a portion of 2023 working with a team at DeepMind on the “cap set problem” – how large can a subset of (Z/3Z)^n be which contains no three terms which sum to zero? (I will explain\, for those not familiar with this problem\, something about the role it plays in combinatorics\, its history\, and why number theorists care about it a lot.) By now\, there are many examples of machine learning mechanisms being used to help generate interesting mathematical knowledge\, and especially interesting examples. This project used a novel protocol; instead of searching directly for large cap sets\, we used LLMs trained on code to search the space of short programs for those which\, when executed\, output large cap sets. One advantage is that a program is much more human-readable than a large collection of vectors over Z/3Z\, bringing us closer to the not-very-well-defined-but-important goal of “interpretable machine learning.” I’ll talk about what succeeded in this project (more than I expected!) what didn’t\, and what role I can imagine this approach to the math-ML interface playing in near-future mathematical practice. \nThe paper: https://www.nature.com/articles/s41586-023-06924-6 \n 
URL:https://cmsa.fas.harvard.edu/event/nt2724/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-02.07.24.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20240124T140000
DTEND;TZID=America/New_York:20240124T150000
DTSTAMP:20260509T195047
CREATED:20240102T163450Z
LAST-MODIFIED:20240125T165049Z
UID:10000148-1706104800-1706108400@cmsa.fas.harvard.edu
SUMMARY:Approaches to the formalization of differential geometry
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Heather Macbeth\, Fordham University \nTitle: Approaches to the formalization of differential geometry \nAbstract: In the last five years\, there has been early work on the computer formalization of differential geometry. I will survey the projects I am aware of. I will also describe two projects of my own\, as case studies for typical challenges. The first (joint with Floris van Doorn) is an exercise in developing suitable abstractions\, the second (joint with Mario Carneiro) is an exercise in developing suitable automation.
URL:https://cmsa.fas.harvard.edu/event/nt-12424/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-01.24.2024.docx-1.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20231115T140000
DTEND;TZID=America/New_York:20231115T150000
DTSTAMP:20260509T195047
CREATED:20240222T094758Z
LAST-MODIFIED:20240222T095355Z
UID:10002797-1700056800-1700060400@cmsa.fas.harvard.edu
SUMMARY:On the Power of Forward pass through Transformer Architectures
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Abhishek Panigrahi\, Dept. of Computer Science\, Princeton University \nTitle: On the Power of Forward pass through Transformer Architectures \nAbstract: Highly trained transformers are capable of interesting computations as they infer for an input. The exact mechanism that these models use during forward passes is an interesting area of study. This talk studies two interesting phenomena. \nIn the first half\, we explore how and why pre-trained language models\, specifically BERT of moderate sizes\, can effectively learn linguistic structures like parse trees during pre-training. Specifically\, using synthetic data through PCFGs\, we show how moderate-sized transformers can perform forward-backward parsing\, also known as the inside-outside algorithm\, during inference. We further understand the role of the pre-training loss for the model to learn to parse during pre-training. \nIn the second half\, we consider in-context learning of large language models\, where they learn to reason on the fly. An ongoing hypothesis is that transformers simulate gradient descent at inference to perform in-context learning. We propose the Transformer in Transformer (TinT) framework\, which creates explicit transformer architectures that can simulate and fine-tune a small pre-trained transformer model during inference. E.g. a 1.3B parameter TINT model can simulate and fine-tune a 125 million parameter model in a single forward pass. This framework suggests that large transformers might execute intricate sub-routines during inference\, and provides insights for enhancing their capabilities through intelligent design considerations. \n 
URL:https://cmsa.fas.harvard.edu/event/nt-111523/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/NTM-11.15.2023.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20231108T140000
DTEND;TZID=America/New_York:20231108T150000
DTSTAMP:20260509T195047
CREATED:20240222T095919Z
LAST-MODIFIED:20240222T095919Z
UID:10002798-1699452000-1699455600@cmsa.fas.harvard.edu
SUMMARY:Peano: Learning Formal Mathematical Reasoning Without Human Data
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Gabriel Poesia\, Dept. of Computer Science\, Stanford University \nTitle: Peano: Learning Formal Mathematical Reasoning Without Human Data \nAbstract: Peano is a theorem proving environment in which a computational agent can start tabula rasa in a new domain\, learn to solve problems through curiosity-driven exploration\, and create its own higher-level actions. Gabriel will describe the system\, present case studies on learning to solve simple algebra problems from the Khan Academy platform\, and describe work in progress on learning the Natural Number Game\, a popular introduction to theorem proving in Lean for mathematicians. \n 
URL:https://cmsa.fas.harvard.edu/event/nt-11823/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/NTM-11.08.2023.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20231025T140000
DTEND;TZID=America/New_York:20231025T150000
DTSTAMP:20260509T195047
CREATED:20240223T105453Z
LAST-MODIFIED:20240223T105453Z
UID:10002853-1698242400-1698246000@cmsa.fas.harvard.edu
SUMMARY:Llemma: an open language model for mathematics
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Sean Welleck\, CMU\, Language Technologies Institute \nTitle: Llemma: an open language model for mathematics \nAbstract: We present Llemma: 7 billion and 34 billion parameter language models for mathematics. The Llemma models are initialized with Code Llama weights\, then trained on the Proof-Pile II\, a 55 billion token dataset of mathematical web data\, code\, and scientific papers. The resulting models show improved mathematical capabilities\, and can be adapted to various tasks. For instance\, Llemma outperforms the unreleased Minerva model suite on an equi-parameter basis\, and is capable of tool use and formal theorem proving without any further fine-tuning. We openly release all artifacts\, including the Llemma models\, the Proof-Pile II\, and code to replicate our experiments. We hope that Llemma serves as a platform for new research and tools at the intersection of generative models and mathematics. \n  \n 
URL:https://cmsa.fas.harvard.edu/event/nt-102523/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/NTM-10.25.2023.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20231018T140000
DTEND;TZID=America/New_York:20231018T150000
DTSTAMP:20260509T195047
CREATED:20240223T114049Z
LAST-MODIFIED:20240223T114049Z
UID:10002867-1697637600-1697641200@cmsa.fas.harvard.edu
SUMMARY:Physics of Language Models: Knowledge Storage\, Extraction\, and Manipulation
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Yuanzhi Li\, CMU Dept. of Machine Learning and Microsoft Research \nTitle: Physics of Language Models: Knowledge Storage\, Extraction\, and Manipulation \nAbstract: Large language models (LLMs) can memorize a massive amount of knowledge during pre-training\, but can they effectively use this knowledge at inference time? In this work\, we show several striking results about this question. Using a synthetic biography dataset\, we first show that even if an LLM achieves zero training loss when pretraining on the biography dataset\, it sometimes can not be finetuned to answer questions as simple as “What is the birthday of XXX” at all. We show that sufficient data augmentation during pre-training\, such as rewriting the same biography multiple times or simply using the person’s full name in every sentence\, can mitigate this issue. Using linear probing\, we unravel that such augmentation forces the model to store knowledge about a person in the token embeddings of their name rather than other locations. \nWe then show that LLMs are very bad at manipulating knowledge they learn during pre-training unless a chain of thought is used at inference time. We pretrained an LLM on the synthetic biography dataset\, so that it could answer “What is the birthday of XXX” with 100% accuracy.  Even so\, it could not be further fine-tuned to answer questions like “Is the birthday of XXX even or odd?” directly.  Even using Chain of Thought training data only helps the model answer such questions in a CoT manner\, not directly. \nWe will also discuss preliminary progress on understanding the scaling law of how large a language model needs to be to store X pieces of knowledge and extract them efficiently. For example\, is a 1B parameter language model enough to store all the knowledge of a middle school student? \n  \n 
URL:https://cmsa.fas.harvard.edu/event/nt-101823/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/NTM-10.18.2023.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20231011T140000
DTEND;TZID=America/New_York:20231011T150000
DTSTAMP:20260509T195047
CREATED:20240223T114336Z
LAST-MODIFIED:20240223T114336Z
UID:10002868-1697032800-1697036400@cmsa.fas.harvard.edu
SUMMARY:LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Alex Gu\, MIT Dept. of EE&CS \nTitle: LeanDojo: Theorem Proving with Retrieval-Augmented Language Models \nAbstract: Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However\, existing methods are difficult to reproduce or build on\, due to private code\, data\, and large compute requirements. This has created substantial barriers to research on machine learning methods for theorem proving. We introduce LeanDojo: an open-source Lean playground consisting of toolkits\, data\, models\, and benchmarks. LeanDojo extracts data from Lean and enables interaction with the proof environment programmatically. It contains fine-grained annotations of premises in proofs\, providing valuable data for premise selection: a key bottleneck in theorem proving. Using this data\, we develop ReProver (Retrieval-Augmented Prover): the first LLM-based prover that is augmented with retrieval for selecting premises from a vast math library. It is inexpensive and needs only one GPU week of training. Our retriever leverages LeanDojo’s program analysis capability to identify accessible premises and hard negative examples\, which makes retrieval much more effective. Furthermore\, we construct a new benchmark consisting of 96\,962 theorems and proofs extracted from Lean’s math library. It features a challenging data split requiring the prover to generalize to theorems relying on novel premises that are never used in training. We use this benchmark for training and evaluation\, and experimental results demonstrate the effectiveness of ReProver over non-retrieval baselines and GPT-4. We thus provide the first set of open-source LLM-based theorem provers without any proprietary datasets and release it under a permissive MIT license to facilitate further research. \n 
URL:https://cmsa.fas.harvard.edu/event/nt-101123-2/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-10.11.2023.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20230927T140000
DTEND;TZID=America/New_York:20230927T150000
DTSTAMP:20260509T195047
CREATED:20240227T082824Z
LAST-MODIFIED:20240227T082824Z
UID:10002872-1695823200-1695826800@cmsa.fas.harvard.edu
SUMMARY:Transformers for maths\, and maths for transformers
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: François Charton\, Meta AI \nTitle: Transformers for maths\, and maths for transformers \nAbstract: Transformers can be trained to solve problems of mathematics. I present two recent applications\, in mathematics and physics: predicting integer sequences\, and discovering the properties of scattering amplitudes in a close relative of Quantum Chromodynamics. \nProblems of mathematics can also help understand transformers. Using two examples from linear algebra and integer arithmetic\, I show that model predictions can be explained\, that trained models do not confabulate\, and that carefully choosing the training distributions can help achieve better\, and more robust\, performance. \n 
URL:https://cmsa.fas.harvard.edu/event/nt-92723/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-09.27.2023.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20230920T140000
DTEND;TZID=America/New_York:20230920T150000
DTSTAMP:20260509T195047
CREATED:20240227T083355Z
LAST-MODIFIED:20240227T083355Z
UID:10002873-1695218400-1695222000@cmsa.fas.harvard.edu
SUMMARY:The TinyStories Dataset: How Small Can Language Models Be And Still Speak Coherent English?
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Ronen Eldan\, Microsoft Research \nTitle: The TinyStories Dataset: How Small Can Language Models Be And Still Speak Coherent English? \nAbstract: While generative language models exhibit powerful capabilities at large scale\, when either the model or the number of training steps is too small\, they struggle to produce coherent and fluent text: Existing models whose size is below a few billion parameters often do not generate coherent text beyond a few sentences. Hypothesizing that one of the main reasons for the strong reliance on size is the vast breadth and abundance of patterns in the datasets used to train those models\, this motivates the following question: Can we design a dataset that preserves the essential elements of natural language\, such as grammar\, vocabulary\, facts\, and reasoning\, but that is much smaller and more refined in terms of its breadth and diversity? \nIn this talk\, we introduce TinyStories\, a synthetic dataset of short stories that only contain words that 3 to 4-year-olds typically understand\, generated by GPT-3.5/4. We show that TinyStories can be used to train and analyze language models that are much smaller than the state-of-the-art models (below 10 million parameters)\, or have much simpler architectures (with only one transformer block)\, yet still produce fluent and consistent stories with several paragraphs that are diverse and have almost perfect grammar\, and demonstrate certain reasoning capabilities. We also show that the trained models are substantially more interpretable than larger ones\, as we can visualize and analyze the attention and activation patterns of the models\, and show how they relate to the generation process and the story content. We hope that TinyStories can facilitate the development\, analysis and research of language models\, especially for low-resource or specialized domains\, and shed light on the emergence of language capabilities in LMs. \n 
URL:https://cmsa.fas.harvard.edu/event/nt-92023/
LOCATION:Virtual
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-09.20.2023.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20230510T140000
DTEND;TZID=America/New_York:20230510T150000
DTSTAMP:20260509T195047
CREATED:20230809T105349Z
LAST-MODIFIED:20240228T104953Z
UID:10001225-1683727200-1683730800@cmsa.fas.harvard.edu
SUMMARY:Modern Hopfield Networks for Novel Transformer Architectures
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Dmitry Krotov\, IBM Research – Cambridge \nTitle: Modern Hopfield Networks for Novel Transformer Architectures \nAbstract: Modern Hopfield Networks or Dense Associative Memories are recurrent neural networks with fixed point attractor states that are described by an energy function. In contrast to conventional Hopfield Networks\, which were popular in the 1980s\, their modern versions have a very large memory storage capacity\, which makes them appealing tools for many problems in machine learning and cognitive and neurosciences. In this talk\, I will introduce an intuition and a mathematical formulation of this class of models and will give examples of problems in AI that can be tackled using these new ideas. Particularly\, I will introduce an architecture called Energy Transformer\, which replaces the conventional attention mechanism with a recurrent Dense Associative Memory model. I will explain the theoretical principles behind this architectural choice and show promising empirical results on challenging computer vision and graph network tasks.
URL:https://cmsa.fas.harvard.edu/event/nt-51023/
LOCATION:Virtual
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-05.10.23.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20230426T140000
DTEND;TZID=America/New_York:20230426T150000
DTSTAMP:20260509T195047
CREATED:20230809T103350Z
LAST-MODIFIED:20240209T151145Z
UID:10001224-1682517600-1682521200@cmsa.fas.harvard.edu
SUMMARY:Toolformer: Language Models Can Teach Themselves to Use Tools
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Timo Schick\, Meta AI \nTitle: Toolformer: Language Models Can Teach Themselves to Use Tools \nAbstract: Language models exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions\, especially at scale. They also\, paradoxically\, struggle with basic functionality\, such as arithmetic or factual lookup\, where much simpler and smaller models excel. In this talk\, we show how these limitations can be overcome by letting language models teach themselves to use external tools via simple APIs. We discuss Toolformer\, a model trained to independently decide which APIs to call\, when to call them\, what arguments to pass\, and how to best incorporate the results into future token prediction. Through this\, it achieves substantially improved zero-shot performance across a variety of downstream tasks without sacrificing its core language modeling abilities. \n 
URL:https://cmsa.fas.harvard.edu/event/nt-42623/
LOCATION:Virtual
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-NTM-Seminar-04.26.23.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20230308T140000
DTEND;TZID=America/New_York:20230308T150000
DTSTAMP:20260509T195047
CREATED:20230808T190051Z
LAST-MODIFIED:20240223T154858Z
UID:10001812-1678284000-1678287600@cmsa.fas.harvard.edu
SUMMARY:How to steer foundation models?
DESCRIPTION:New Technologies in Mathematics Seminar \nSpeaker: Jimmy Ba\, University of Toronto \nTitle: How to steer foundation models? \nAbstract: By conditioning on natural language instructions\, foundation models and large language models (LLMs) have displayed impressive capabilities as general-purpose computers. However\, task performance depends significantly on the quality of the prompt used to steer the model. Due to the lack of knowledge of how foundation models work\, most effective prompts have been handcrafted by humans through a demanding trial-and-error process. To reduce the human effort in this alignment process\, I will discuss a few approaches to steer these powerful models to excel in various downstream language and image tasks. \n 
URL:https://cmsa.fas.harvard.edu/event/nt-3823/
LOCATION:CMSA Room G10\, CMSA\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:New Technologies in Mathematics Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/03.08.2023.png
END:VEVENT
END:VCALENDAR