During 2022–23, the CMSA will host a seminar on Computers and Math, organized by Michael Douglas. This seminar will take place on Wednesdays from 2:00 pm–3:00 pm (Boston time). The meetings will take place in Room G10 at the CMSA, 20 Garden Street, Cambridge MA 02138, and some meetings will take place virtually on Zoom or be held in hybrid formats. To learn how to attend, please fill out this form, or contact Michael Douglas (mdouglas@cmsa.fas.harvard.edu). The schedule will be updated as talks are confirmed. Seminar videos can be found at the CMSA Youtube site New Technologies in Mathematics Playlist

CMSA COVID-19 Policies

  • April 26, 2023 02:00 PM
Speaker: Timo Schick
Title: Title TBA
Venue: CMSA Room G10

New Technologies in Mathematics Seminar Speaker: Timo Schick, Meta AI  

  • April 05, 2023 02:00 PM
Speaker: Sean Welleck
Title: Title TBA
Venue: CMSA Room G10

New Technologies in Mathematics Seminar Speaker: Sean Welleck  

  • March 08, 2023 02:00 PM
Speaker: Jimmy Ba
Title: How to steer foundation models?
Venue: CMSA Room G10

New Technologies in Mathematics Seminar Speaker: Jimmy Ba, University of Toronto Title: How to steer foundation models? Abstract: By conditioning on natural language instructions, foundation models and large language models (LLMs) have displayed impressive capabilities as general-purpose computers. However, task performance depends significantly on the quality of the prompt used to steer the model. Due to the lack of knowledge of how foundation models work, most effective prompts have been handcrafted by humans through a demanding trial-and-error process. To reduce the human effort in this alignment process, I will discuss a few approaches to steer these powerful models to excel in various downstream language and image tasks.  

  • December 07, 2022 02:00 PM
Speaker: Cyril Zhang
Title: How do Transformers reason? First principles via automata, semigroups, and circuits
Venue: CMSA Room G10

New Technologies in Mathematics Seminar Speaker: Cyril Zhang, Microsoft Research Title: How do Transformers reason? First principles via automata, semigroups, and circuits Abstract: The current “Transformer era” of deep learning is marked by the emergence of combinatorial and algorithmic reasoning capabilities in large sequence models, leading to dramatic advances in natural language understanding, program synthesis, and theorem proving. What is the nature of these models’ internal representations (i.e. how do they represent the states and computational steps of the algorithms they execute)? How can we understand and mitigate their weaknesses, given that they resist interpretation? In this work, we present some insights (and many further mysteries) through the lens of automata and their algebraic structure. Specifically, we investigate the apparent…

  • October 26, 2022 02:00 PM
Speaker: João Araújo, Mathematics Department, Universidade Nova de Lisboa and Michael Kinyon, Department of Mathematics, University of Denver
Title: From Engine to Auto
Venue: CMSA Room G10

New Technologies in Mathematics Seminar Speakers: João Araújo, Mathematics Department, Universidade Nova de Lisboa and Michael Kinyon, Department of Mathematics, University of Denver Title: From Engine to Auto Abstract: Bill McCune produced the program EQP that deals with first order logic formulas and in 1996 managed to solve Robbins’ Conjecture. This very powerful tool reduces to triviality any result that can be obtained by encoding the assumptions and the goals. The next step was to turn the program into a genuine assistant for the working mathematician: find ways to help the prover with proofs; reduce the lengths of the automatic proofs to better crack them;  solve problems in higher order logic; devise tools that autonomously prove results of…

  • October 19, 2022 02:00 PM
Speaker: Antonia Creswell
Title: Towards Faithful Reasoning Using Language Models
Venue: CMSA Room G10

New Technologies in Mathematics Seminar Speaker: Antonia Creswell, DeepMind Title: Towards Faithful Reasoning Using Language Models Abstract: Language models are showing impressive performance on many natural language tasks, including question-answering. However, language models – like most deep learning models – are black boxes. We cannot be sure how they obtain their answers. Do they reason over relevant knowledge to construct an answer or do they rely on prior knowledge – baked into their weights – which may be biased? An alternative approach is to develop models whose output is a human interpretable, faithful reasoning trace leading to an answer. In this talk we will characterise faithful reasoning in terms of logically valid reasoning and demonstrate where current reasoning…

  • October 05, 2022 02:00 PM
Speaker: Guy Gur-Ari
Title: Minerva: Solving Quantitative Reasoning Problems with Language Models
Venue: CMSA Room G10

New Technologies in Mathematics Seminar Speaker: Guy Gur-Ari, Google Research Title: Minerva: Solving Quantitative Reasoning Problems with Language Models Abstract: Quantitative reasoning tasks which can involve mathematics, science, and programming are often challenging for machine learning models in general and for language models in particular. We show that transformer-based language models obtain significantly better performance on math and science questions when trained in an unsupervised way on a large, math-focused dataset. Performance can be further improved using prompting and sampling techniques including chain-of-thought and majority voting. Minerva, a model that combines these techniques, achieves SOTA on several math and science benchmarks. I will describe the model, its capabilities and limitations.

  • September 28, 2022 02:00 PM
Speaker: Surya Ganguli
Title: Statistical mechanics of neural networks: From the geometry of high dimensional error landscapes to beating power law neural scaling
Venue: CMSA Room G10

New Technologies in Mathematics Speaker: Surya Ganguli, Stanford University Title: Statistical mechanics of neural networks: From the geometry of high dimensional error landscapes to beating power law neural scaling Abstract: Statistical mechanics and neural network theory have long enjoyed fruitful interactions.  We will review some of our recent work in this area and then focus on two vignettes. First we will analyze the high dimensional geometry of neural network error landscapes that happen to arise as the classical limit of a dissipative many-body quantum optimizer.  In particular, we will be able to use the Kac-Rice formula and the replica method to calculate the number, location, energy levels, and Hessian eigenspectra of all critical points of any index.  Second we will review…

  • September 14, 2022 02:00 PM
Speaker: Johan Commelin
Title: Breaking the one-mind-barrier in mathematics using formal verification
Venue: CMSA Room G10

New Technologies in Mathematics Seminar Speaker: Johan Commelin, Mathematisches Institut, Albert-Ludwigs-Universität Freiburg Title: Breaking the one-mind-barrier in mathematics using formal verification Abstract: In this talk I will argue that formal verification helps break the one-mind-barrier in mathematics. Indeed, formal verification allows a team of mathematicians to collaborate on a project, without one person understanding all parts of the project. At the same time, it also allows a mathematician to rapidly free mental RAM in order to work on a different component of a project. It thus also expands the one-mind-barrier. I will use the Liquid Tensor Experiment as an example, to illustrate the above two points. This project recently finished the formalization of the main theorem of liquid vector spaces, following up on a challenge by…

  • March 30, 2022 06:15 PM
Speaker: Yuhuai Wu, Stanford and Google
Title: Memorizing Transformers
Venue: Virtual

Abstract: Language models typically need to be trained or fine-tuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this talk, I will discuss how we extend language models with the ability to memorize the internal representations of past inputs. We demonstrate that an approximate NN lookup into a non-differentiable memory of recent (key, value) pairs improves language modeling across various benchmarks and tasks, including generic webtext (C4), math papers (arXiv), books (PG-19), code (Github), as well as formal theorems (Isabelle). We show that the performance steadily improves when we increase the size of memory…

  • March 23, 2022 06:14 PM
Speaker: Stanislas Polu, OpenAI
Title: Formal Mathematics Statement Curriculum Learning
Venue: Virtual

Abstract: We explore the use of expert iteration in the context of language modeling applied to formal mathematics. We show that at same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only.  We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems,  without the need for associated ground-truth proofs. Finally, by applying this expert iteration to a manually curated set of problem statements, we achieve state-of-the-art on the miniF2F benchmark,  automatically solving multiple challenging problems drawn from high school olympiads.

  • March 09, 2022 06:13 PM
Speaker: Iddo Drori, MIT EE&CS and Columbia School of Engineering
Title: Machine Learning STEM Courses in Departments
Venue: Virtual

Abstract: We automatically solve, explain, and generate university-level course problems from thirty STEM courses (at MIT, Harvard, and Columbia) for the first time. We curate a new dataset of course questions and answers across a dozen departments: Aeronautics and Astronautics, Chemical Engineering, Chemistry, Computer Science, Economics, Electrical Engineering, Materials Science, Mathematics, Mechanical Engineering, Nuclear Science, Physics, and Statistics. We generate new questions and use them in a Columbia University course, and perform A/B tests demonstrating that these machine generated questions are indistinguishable from human-written questions and that machine generated explanations are as useful as human-written explanations, again for the first time. Our approach consists of five steps: (i) Given course questions, turn them into programming tasks; (ii) Automatically generate…

  • March 02, 2022 06:11 PM
Speaker: Jared Kaplan, Johns Hopkins Dept. of Physics & Astronomy
Title: Scaling Laws and Their Implications for Coding AI
Venue: Virtual

Abstract:  Scaling laws and associated downstream trends can be used as an organizing principle when thinking about current and future ML progress.  I will briefly review scaling laws for generative models in a number of domains, emphasizing language modeling.  Then I will discuss scaling results for transfer from natural language to code, and results on python programming performance from “codex” and other models.  If there’s time I’ll discuss prospects for the future — limitations from dataset sizes, and prospects for RL and other techniques.

  • February 16, 2022 06:09 PM
Speaker: James Bonifacio, Cambridge DAMTP
Title: Bootstrapping hyperbolic manifolds
Venue: Virtual

Abstract: Hyperbolic manifolds are a class of Riemannian manifolds that are important in mathematics and physics, playing a prominent role in topology, number theory, and string theory. Associated with a given hyperbolic metric is a sequence of numbers corresponding to the discrete eigenvalues of the Laplace-Beltrami operator. While these eigenvalues usually cannot be calculated exactly, they can be found numerically and must also satisfy various bounds. In this talk, I will discuss a new approach for finding numerical bounds on the eigenvalues of closed hyperbolic manifolds using general consistency conditions and semidefinite programming, inspired by the approach of the conformal bootstrap from physics. Although these bootstrap bounds follow from seemingly trivial consistency conditions, they are surprisingly strong and…

  • February 09, 2022 06:07 PM
Speaker: Ben Edelman, Harvard Computer Science
Title: Toward Demystifying Transformers and Attention
Venue: Virtual

Abstract: Over the past several years, attention mechanisms (primarily in the form of the Transformer architecture) have revolutionized deep learning, leading to advances in natural language processing, computer vision, code synthesis, protein structure prediction, and beyond. Attention has a remarkable ability to enable the learning of long-range dependencies in diverse modalities of data. And yet, there is at present limited principled understanding of the reasons for its success. In this talk, I’ll explain how attention mechanisms and Transformers work, and then I’ll share the results of a preliminary investigation into why they work so well. In particular, I’ll discuss an inductive bias of attention that we call sparse variable creation: bounded-norm Transformer layers are capable of representing sparse Boolean functions, with statistical generalization guarantees akin to…

  • February 02, 2022 06:05 PM
Speaker: Michael Bronstein, University of Oxford and Twitter
Title: Neural diffusion PDEs, differential geometry, and graph neural networks
Venue: Virtual

Abstract: In this talk, I will make connections between Graph Neural Networks (GNNs) and non-Euclidean diffusion equations. I will show that drawing on methods from the domain of differential geometry, it is possible to provide a principled view on such GNN architectural choices as positional encoding and graph rewiring as well as explain and remedy the phenomena of oversquashing and bottlenecks.

  • January 26, 2022 06:00 PM
Speaker: Alex Davies, DeepMind
Title: Machine learning with mathematicians
Venue: Virtual

Abstract: Can machine learning be a useful tool for research mathematicians? There are many examples of mathematicians pioneering new technologies to aid our understanding of the mathematical world: using very early computers to help formulate the Birch and Swinnerton-Dyer conjecture and using computer aid to prove the four colour theorem are among the most notable. Up until now there hasn’t been significant use of machine learning in the field and it hasn’t been clear where it might be useful for the questions that mathematicians care about. In this talk we will discuss the results of our recent Nature paper, where we worked together with top mathematicians to use machine learning to achieve two new results – proving a…

  • December 15, 2021 05:35 PM
Speaker: Anurag Anshu, Department of EECS & Challenge Institute for Quantum Computation, UC Berkeley
Title: Unreasonable effectiveness of the quantum complexity view on quantum many-body physics
Venue: Virtual

Abstract: A central challenge in quantum many-body physics is to estimate the properties of natural quantum states, such as the quantum ground states and Gibbs states. Quantum Hamiltonian complexity offers a computational perspective on this challenge and classifies these natural quantum states using the language of quantum complexity classes. This talk will provide a gentle introduction to the field and highlight its success in pinning down the hardness of a wide variety of quantum states. In particular, we will consider the gapped ground states and Gibbs states on low dimensional lattices, which are believed to exhibit ‘low complexity’ due to the widely studied area law behaviour. Here, we will see the crucial role of complexity-theoretic methods in progress on…

  • December 08, 2021 05:32 PM
Speaker: Piotr Nawrot, University of Warsaw
Title: Hierarchical Transformers are More Efficient Language Models
Venue: Virtual

Abstract: Transformer models yield impressive results on many NLP and sequence modeling tasks. Remarkably, Transformers can handle long sequences which allows them to produce long coherent outputs: full paragraphs produced by GPT-3 or well-structured images produced by DALL-E. These large language models are impressive but also very inefficient and costly, which limits their applications and accessibility. We postulate that having an explicit hierarchical architecture is the key to Transformers that efficiently handle long sequences. To verify this claim, we first study different ways to upsample and downsample activations in Transformers so as to make them hierarchical. We use the best performing upsampling and downsampling layers to create Hourglass – a hierarchical Transformer language model. Hourglass improves upon the Transformer…

  • December 01, 2021 05:17 PM
Speaker: Dan Roberts, MIT & Salesforce
Title: The Principles of Deep Learning Theory
Venue: Virtual

Abstract: Deep learning is an exciting approach to modern artificial intelligence based on artificial neural networks. The goal of this talk is to provide a blueprint — using tools from physics — for theoretically analyzing deep neural networks of practical relevance. This task will encompass both understanding the statistics of initialized deep networks and determining the training dynamics of such an ensemble when learning from data. In terms of their “microscopic” definition, deep neural networks are a flexible set of functions built out of many basic computational blocks called neurons, with many neurons in parallel organized into sequential layers. Borrowing from the effective theory framework, we will develop a perturbative 1/n expansion around the limit of an infinite number…