During 2022–23, the CMSA will host a seminar on Computers and Math, organized by Michael Douglas. This seminar will take place on Wednesdays from 2:00 pm–3:00 pm (Boston time). The meetings will take place in Room G10 at the CMSA, 20 Garden Street, Cambridge MA 02138, and some meetings will take place virtually on Zoom or be held in hybrid formats. To learn how to attend, please fill out this form, or contact Michael Douglas (mdouglas@cmsa.fas.harvard.edu). The schedule will be updated as talks are confirmed. Seminar videos can be found at the CMSA Youtube site New Technologies in Mathematics Playlist

CMSA COVID-19 Policies



  • October 05, 2022 02:00 PM
Speaker: Guy Gur-Ari
Title: Minerva: Solving Quantitative Reasoning Problems with Language Models
Venue: CMSA Room G10

New Technologies in Mathematics Seminar Speaker: Guy Gur-Ari, Google Research Title: Minerva: Solving Quantitative Reasoning Problems with Language Models Abstract: Quantitative reasoning tasks which can involve mathematics, science, and programming are often challenging for machine learning models in general and for language models in particular. We show that transformer-based language models obtain significantly better performance on math and science questions when trained in an unsupervised way on a large, math-focused dataset. Performance can be further improved using prompting and sampling techniques including chain-of-thought and majority voting. Minerva, a model that combines these techniques, achieves SOTA on several math and science benchmarks. I will describe the model, its capabilities and limitations.

  • September 28, 2022 02:00 PM
Speaker: Surya Ganguli
Title: Statistical mechanics of neural networks: From the geometry of high dimensional error landscapes to beating power law neural scaling
Venue: CMSA Room G10

New Technologies in Mathematics Speaker: Surya Ganguli, Stanford University Title: Statistical mechanics of neural networks: From the geometry of high dimensional error landscapes to beating power law neural scaling Abstract: Statistical mechanics and neural network theory have long enjoyed fruitful interactions.  We will review some of our recent work in this area and then focus on two vignettes. First we will analyze the high dimensional geometry of neural network error landscapes that happen to arise as the classical limit of a dissipative many-body quantum optimizer.  In particular, we will be able to use the Kac-Rice formula and the replica method to calculate the number, location, energy levels, and Hessian eigenspectra of all critical points of any index.  Second we will review recent…

  • September 14, 2022 02:00 PM
Speaker: Johan Commelin
Title: Breaking the one-mind-barrier in mathematics using formal verification
Venue: CMSA Room G10

New Technologies in Mathematics Seminar Speaker: Johan Commelin, Mathematisches Institut, Albert-Ludwigs-Universität Freiburg Title: Breaking the one-mind-barrier in mathematics using formal verification Abstract: In this talk I will argue that formal verification helps break the one-mind-barrier in mathematics. Indeed, formal verification allows a team of mathematicians to collaborate on a project, without one person understanding all parts of the project. At the same time, it also allows a mathematician to rapidly free mental RAM in order to work on a different component of a project. It thus also expands the one-mind-barrier. I will use the Liquid Tensor Experiment as an example, to illustrate the above two points. This project recently finished the formalization of the main theorem of liquid vector spaces, following up on a challenge by…

  • March 30, 2022 06:15 PM
Speaker: Yuhuai Wu, Stanford and Google
Title: Memorizing Transformers
Venue: Virtual

Abstract: Language models typically need to be trained or fine-tuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this talk, I will discuss how we extend language models with the ability to memorize the internal representations of past inputs. We demonstrate that an approximate NN lookup into a non-differentiable memory of recent (key, value) pairs improves language modeling across various benchmarks and tasks, including generic webtext (C4), math papers (arXiv), books (PG-19), code (Github), as well as formal theorems (Isabelle). We show that the performance steadily improves when we increase the size of memory…

  • March 23, 2022 06:14 PM
Speaker: Stanislas Polu, OpenAI
Title: Formal Mathematics Statement Curriculum Learning
Venue: Virtual

Abstract: We explore the use of expert iteration in the context of language modeling applied to formal mathematics. We show that at same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only.  We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems,  without the need for associated ground-truth proofs. Finally, by applying this expert iteration to a manually curated set of problem statements, we achieve state-of-the-art on the miniF2F benchmark,  automatically solving multiple challenging problems drawn from high school olympiads.

  • March 09, 2022 06:13 PM
Speaker: Iddo Drori, MIT EE&CS and Columbia School of Engineering
Title: Machine Learning STEM Courses in Departments
Venue: Virtual

Abstract: We automatically solve, explain, and generate university-level course problems from thirty STEM courses (at MIT, Harvard, and Columbia) for the first time. We curate a new dataset of course questions and answers across a dozen departments: Aeronautics and Astronautics, Chemical Engineering, Chemistry, Computer Science, Economics, Electrical Engineering, Materials Science, Mathematics, Mechanical Engineering, Nuclear Science, Physics, and Statistics. We generate new questions and use them in a Columbia University course, and perform A/B tests demonstrating that these machine generated questions are indistinguishable from human-written questions and that machine generated explanations are as useful as human-written explanations, again for the first time. Our approach consists of five steps: (i) Given course questions, turn them into programming tasks; (ii) Automatically generate…

  • March 02, 2022 06:11 PM
Speaker: Jared Kaplan, Johns Hopkins Dept. of Physics & Astronomy
Title: Scaling Laws and Their Implications for Coding AI
Venue: Virtual

Abstract:  Scaling laws and associated downstream trends can be used as an organizing principle when thinking about current and future ML progress.  I will briefly review scaling laws for generative models in a number of domains, emphasizing language modeling.  Then I will discuss scaling results for transfer from natural language to code, and results on python programming performance from “codex” and other models.  If there’s time I’ll discuss prospects for the future — limitations from dataset sizes, and prospects for RL and other techniques.

  • February 16, 2022 06:09 PM
Speaker: James Bonifacio, Cambridge DAMTP
Title: Bootstrapping hyperbolic manifolds
Venue: Virtual

Abstract: Hyperbolic manifolds are a class of Riemannian manifolds that are important in mathematics and physics, playing a prominent role in topology, number theory, and string theory. Associated with a given hyperbolic metric is a sequence of numbers corresponding to the discrete eigenvalues of the Laplace-Beltrami operator. While these eigenvalues usually cannot be calculated exactly, they can be found numerically and must also satisfy various bounds. In this talk, I will discuss a new approach for finding numerical bounds on the eigenvalues of closed hyperbolic manifolds using general consistency conditions and semidefinite programming, inspired by the approach of the conformal bootstrap from physics. Although these bootstrap bounds follow from seemingly trivial consistency conditions, they are surprisingly strong and…

  • February 09, 2022 06:07 PM
Speaker: Ben Edelman, Harvard Computer Science
Title: Toward Demystifying Transformers and Attention
Venue: Virtual

Abstract: Over the past several years, attention mechanisms (primarily in the form of the Transformer architecture) have revolutionized deep learning, leading to advances in natural language processing, computer vision, code synthesis, protein structure prediction, and beyond. Attention has a remarkable ability to enable the learning of long-range dependencies in diverse modalities of data. And yet, there is at present limited principled understanding of the reasons for its success. In this talk, I’ll explain how attention mechanisms and Transformers work, and then I’ll share the results of a preliminary investigation into why they work so well. In particular, I’ll discuss an inductive bias of attention that we call sparse variable creation: bounded-norm Transformer layers are capable of representing sparse Boolean functions, with statistical generalization guarantees akin to…

  • February 02, 2022 06:05 PM
Speaker: Michael Bronstein, University of Oxford and Twitter
Title: Neural diffusion PDEs, differential geometry, and graph neural networks
Venue: Virtual

Abstract: In this talk, I will make connections between Graph Neural Networks (GNNs) and non-Euclidean diffusion equations. I will show that drawing on methods from the domain of differential geometry, it is possible to provide a principled view on such GNN architectural choices as positional encoding and graph rewiring as well as explain and remedy the phenomena of oversquashing and bottlenecks.

  • January 26, 2022 06:00 PM
Speaker: Alex Davies, DeepMind
Title: Machine learning with mathematicians
Venue: Virtual

Abstract: Can machine learning be a useful tool for research mathematicians? There are many examples of mathematicians pioneering new technologies to aid our understanding of the mathematical world: using very early computers to help formulate the Birch and Swinnerton-Dyer conjecture and using computer aid to prove the four colour theorem are among the most notable. Up until now there hasn’t been significant use of machine learning in the field and it hasn’t been clear where it might be useful for the questions that mathematicians care about. In this talk we will discuss the results of our recent Nature paper, where we worked together with top mathematicians to use machine learning to achieve two new results – proving a…

  • December 15, 2021 05:35 PM
Speaker: Anurag Anshu, Department of EECS & Challenge Institute for Quantum Computation, UC Berkeley
Title: Unreasonable effectiveness of the quantum complexity view on quantum many-body physics
Venue: Virtual

Abstract: A central challenge in quantum many-body physics is to estimate the properties of natural quantum states, such as the quantum ground states and Gibbs states. Quantum Hamiltonian complexity offers a computational perspective on this challenge and classifies these natural quantum states using the language of quantum complexity classes. This talk will provide a gentle introduction to the field and highlight its success in pinning down the hardness of a wide variety of quantum states. In particular, we will consider the gapped ground states and Gibbs states on low dimensional lattices, which are believed to exhibit ‘low complexity’ due to the widely studied area law behaviour. Here, we will see the crucial role of complexity-theoretic methods in progress on…

  • December 08, 2021 05:32 PM
Speaker: Piotr Nawrot, University of Warsaw
Title: Hierarchical Transformers are More Efficient Language Models
Venue: Virtual

Abstract: Transformer models yield impressive results on many NLP and sequence modeling tasks. Remarkably, Transformers can handle long sequences which allows them to produce long coherent outputs: full paragraphs produced by GPT-3 or well-structured images produced by DALL-E. These large language models are impressive but also very inefficient and costly, which limits their applications and accessibility. We postulate that having an explicit hierarchical architecture is the key to Transformers that efficiently handle long sequences. To verify this claim, we first study different ways to upsample and downsample activations in Transformers so as to make them hierarchical. We use the best performing upsampling and downsampling layers to create Hourglass – a hierarchical Transformer language model. Hourglass improves upon the Transformer…

  • December 01, 2021 05:17 PM
Speaker: Dan Roberts, MIT & Salesforce
Title: The Principles of Deep Learning Theory
Venue: Virtual

Abstract: Deep learning is an exciting approach to modern artificial intelligence based on artificial neural networks. The goal of this talk is to provide a blueprint — using tools from physics — for theoretically analyzing deep neural networks of practical relevance. This task will encompass both understanding the statistics of initialized deep networks and determining the training dynamics of such an ensemble when learning from data. In terms of their “microscopic” definition, deep neural networks are a flexible set of functions built out of many basic computational blocks called neurons, with many neurons in parallel organized into sequential layers. Borrowing from the effective theory framework, we will develop a perturbative 1/n expansion around the limit of an infinite number…

  • November 03, 2021 05:12 PM
Speaker: Curtis Bright, School of Computer Science, University of Windsor and Vijay Ganesh, Dept. of Electrical and Computer Engineering, University of Waterloo
Title: When Computer Algebra Meets Satisfiability: A New Approach to Combinatorial Mathematics
Venue: Virtual

Abstract: Solvers for the Boolean satisfiability (SAT) problem have been increasingly used to resolve problems in mathematics due to their excellent search algorithms.  This talk will describe a new method for mathematical search that couples SAT solvers with computer algebra systems (CAS), thereby combining the expressiveness of CASs with the search power of SAT solvers.  This paradigm has led to a number of results on long-standing mathematical questions such as the first computer-verifiable resolution of Lam’s problem and the discovery of a new infinite class of Williamson matrices.

  • October 27, 2021 05:07 PM
Speaker: Patrick Massot, Laboratoire de Mathématiques d’Orsay and CNRS
Title: Why explain mathematics to computers?
Venue: Virtual

Abstract: A growing number of mathematicians are having fun explaining mathematics to computers using proof assistant softwares. This process is called formalization. In this talk I’ll describe what formalization looks like, what kind of things it teaches us, and how it could even turn out to be useful (in our usual sense of “useful”). This will not be a talk about foundations of mathematics, and I won’t assume any prior knowledge about formalization.

  • October 13, 2021 05:04 PM
Speaker: Marijn Heule, Carnegie Mellon University
Title: Computer-Aided Mathematics and Satisfiability
Venue: Virtual

Abstract: Progress in satisfiability (SAT) solving has made it possible to determine the correctness of complex systems and answer long-standing open questions in mathematics. The SAT solving approach is completely automatic and can produce clever though potentially gigantic proofs. We can have confidence in the correctness of the answers because highly trustworthy systems can validate the underlying proofs regardless of their size. We demonstrate the effectiveness of the SAT approach by presenting some recent successes, including the solution of the Boolean Pythagorean Triples problem, computing the fifth Schur number, and resolving the remaining case of Keller’s conjecture. Moreover, we constructed and validated a proof for each of these results. The second part of the talk focuses on notorious…

  • October 06, 2021 05:03 PM
Speaker: Thomas Fischbacher, Google
Title: New results in Supergravity via ML Technology
Venue: Virtual

Abstract: The infrastructure built to power the Machine Learning revolution has many other uses beyond Deep Learning. Starting from a general architecture-level overview over the lower levels of Google’s TensorFlow machine learning library, we review how this has recently helped us to find all the stable vacua of SO(8) Supergravity in 3+1 dimensions, has allowed major progress on other related questions about M theory, and briefly discuss other applications in field theory and beyond.

  • September 29, 2021 05:02 PM
Speaker: Adam Wagner, Tel Aviv University
Title: Constructions in combinatorics via neural networks
Venue: Virtual

Abstract: Recently, significant progress has been made in the area of machine learning algorithms, and they have quickly become some of the most exciting tools in a scientist’s toolbox. In particular, recent advances in the field of reinforcement learning have led computers to reach superhuman level play in Atari games and Go, purely through self-play. In this talk I will give a very basic introduction to neural networks and reinforcement learning algorithms. I will also indicate how these methods can be adapted to the ““game” of trying to find a counterexample to a mathematical conjecture, and show some examples where this approach was successful.

  • September 15, 2021 04:56 PM
Speaker: Francois Chollet, Google
Title: Why abstraction is the key to intelligence, and what we’re still missing
Venue: Virtual

Abstract: This talk provides a personal perspective on the way forward towards more human-like and more intelligent artificial systems. Traditionally, symbolic and probabilistic methods have dominated the domains of concept formation, abstraction, and automated reasoning. More recently, deep learning-based approaches have led to significant breakthroughs, including successes in games and combinatorial search tasks. However, the resulting systems are still limited in scope and capabilities — they remain brittle, data-hungry, and their generalization capabilities are limited. We will address a set of questions: why is conceptual abstraction essential for intelligence? What is the nature of abstraction, and its relationship to generalization? What kind of abstraction can deep learning models generate, and where do they fail? What are the methods…