New Technologies in Mathematics Seminar Series

During 2021–2022, the CMSA will host a seminar on Computers and Math, organized by Michael Douglas. This seminar will take place on Wednesdays from 2:00–3:00 pm (Boston time). The meetings will take place virtually on Zoom. To learn how to attend, please fill out this form, or contact Michael Douglas (

The schedule below will be updated as talks are confirmed.

Spring 2022

1/26/2022Alex Davies, DeepMindTitle: Machine learning with mathematicians

Abstract: Can machine learning be a useful tool for research mathematicians? There are many examples of mathematicians pioneering new technologies to aid our understanding of the mathematical world: using very early computers to help formulate the Birch and Swinnerton-Dyer conjecture and using computer aid to prove the four colour theorem are among the most notable. Up until now there hasn’t been significant use of machine learning in the field and it hasn’t been clear where it might be useful for the questions that mathematicians care about. In this talk we will discuss the results of our recent Nature paper, where we worked together with top mathematicians to use machine learning to achieve two new results – proving a new connection between the hyperbolic and geometric structure of knots, and conjecturing a resolution to a 50-year problem in representation theory, the combinatorial invariance conjecture. Through these examples we demonstrate a way that machine learning can be used by mathematicians to help guide the development of surprising and beautiful new conjectures.

2/2/2022Michael Bronstein, University of Oxford and TwitterTitle: Neural diffusion PDEs, differential geometry, and graph neural networks

Abstract: In this talk, I will make connections between Graph Neural Networks (GNNs) and non-Euclidean diffusion equations. I will show that drawing on methods from the domain of differential geometry, it is possible to provide a principled view on such GNN architectural choices as positional encoding and graph rewiring as well as explain and remedy the phenomena of oversquashing and bottlenecks. 


2/9/2022Ben Edelman, Harvard Computer ScienceTitle: Toward Demystifying Transformers and Attention

Abstract: Over the past several years, attention mechanisms (primarily in the form of the Transformer architecture) have revolutionized deep learning, leading to advances in natural language processing, computer vision, code synthesis, protein structure prediction, and beyond. Attention has a remarkable ability to enable the learning of long-range dependencies in diverse modalities of data. And yet, there is at present limited principled understanding of the reasons for its success. In this talk, I’ll explain how attention mechanisms and Transformers work, and then I’ll share the results of a preliminary investigation into why they work so well. In particular, I’ll discuss an inductive bias of attention that we call sparse variable creation: bounded-norm Transformer layers are capable of representing sparse Boolean functions, with statistical generalization guarantees akin to sparse regression.

2/16/2022James Bonifacio, Cambridge DAMTPTitle: Bootstrapping hyperbolic manifolds

Abstract: Hyperbolic manifolds are a class of Riemannian manifolds that are important in mathematics and physics, playing a prominent role in topology, number theory, and string theory. Associated with a given hyperbolic metric is a sequence of numbers corresponding to the discrete eigenvalues of the Laplace-Beltrami operator. While these eigenvalues usually cannot be calculated exactly, they can be found numerically and must also satisfy various bounds. In this talk, I will discuss a new approach for finding numerical bounds on the eigenvalues of closed hyperbolic manifolds using general consistency conditions and semidefinite programming, inspired by the approach of the conformal bootstrap from physics. Although these bootstrap bounds follow from seemingly trivial consistency conditions, they are surprisingly strong and are sometimes almost saturated by actual manifolds; for example, one such bound implies that the first nonzero eigenvalue of a closed hyperbolic surface must be less than 3.83890, and this is very close to being saturated by a particular genus-2 surface called the Bolza surface. I will show how to derive this and other bounds and will discuss some possible future directions for this approach. 

3/2/2022Jared Kaplan, Johns Hopkins Dept. of Physics & AstronomyTitle: Scaling Laws and Their Implications for Coding AI

Abstract:  Scaling laws and associated downstream trends can be used as an organizing principle when thinking about current and future ML progress.  I will briefly review scaling laws for generative models in a number of domains, emphasizing language modeling.  Then I will discuss scaling results for transfer from natural language to code, and results on python programming performance from “codex” and other models.  If there’s time I’ll discuss prospects for the future — limitations from dataset sizes, and prospects for RL and other techniques.

3/9/2022Iddo Drori, MIT EE&CS and Columbia School of EngineeringTitle: Machine Learning 30 STEM Courses in 12 Departments

Abstract: We automatically solve, explain, and generate university-level course problems from thirty STEM courses (at MIT, Harvard, and Columbia) for the first time.
We curate a new dataset of course questions and answers across a dozen departments: Aeronautics and Astronautics, Chemical Engineering, Chemistry, Computer Science, Economics, Electrical Engineering, Materials Science, Mathematics, Mechanical Engineering, Nuclear Science, Physics, and Statistics.
We generate new questions and use them in a Columbia University course, and perform A/B tests demonstrating that these machine generated questions are indistinguishable from human-written questions and that machine generated explanations are as useful as human-written explanations, again for the first time.
Our approach consists of five steps:
(i) Given course questions, turn them into programming tasks;
(ii) Automatically generate programs from the programming tasks using a Transformer model, OpenAI Codex, pre-trained on text and fine-tuned on code;
(iii) Execute the programs to obtain and evaluate the answers; 
(iv) Automatically explain the correct solutions using Codex;
(v) Automatically generate new questions that are qualitatively indistinguishable from human-written questions. 
This work is a significant step forward in applying machine learning for education, automating a considerable part of the work involved in teaching. 
Our approach allows personalization of questions based on difficulty level and student backgrounds, and scales up to a broad range of courses across the schools of engineering and science.

This is joint work with students and colleagues at MIT, Harvard University, Columbia University, Worcester Polytechnic Institute, and the University of Waterloo.

3/23/2022Stanislas Polu, OpenAITitle: Formal Mathematics Statement Curriculum Learning

Abstract: We explore the use of expert iteration in the context of language modeling applied to formal mathematics. We show that at same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only.  We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems,  without the need for associated ground-truth proofs. Finally, by applying this expert iteration to a manually curated set of problem statements, we achieve state-of-the-art on the miniF2F benchmark,  automatically solving multiple challenging problems drawn from high school olympiads.

3/30/2022Yuhuai Wu, Stanford and GoogleTitle: Memorizing Transformers

Abstract: Language models typically need to be trained or fine-tuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this talk, I will discuss how we extend language models with the ability to memorize the internal representations of past inputs. We demonstrate that an approximate NN lookup into a non-differentiable memory of recent (key, value) pairs improves language modeling across various benchmarks and tasks, including generic webtext (C4), math papers (arXiv), books (PG-19), code (Github), as well as formal theorems (Isabelle). We show that the performance steadily improves when we increase the size of memory up to 262K tokens. We also find that the model is capable of making use of newly defined functions and theorems during test time.


Fall 2021

9/15/2021Francois Chollet, Google

*special time: 3:00PM*
Title: Why abstraction is the key to intelligence, and what we’re still missing

Abstract: This talk provides a personal perspective on the way forward towards more human-like and more intelligent artificial systems. Traditionally, symbolic and probabilistic methods have dominated the domains of concept formation, abstraction, and automated reasoning. More recently, deep learning-based approaches have led to significant breakthroughs, including successes in games and combinatorial search tasks. However, the resulting systems are still limited in scope and capabilities — they remain brittle, data-hungry, and their generalization capabilities are limited. We will address a set of questions: why is conceptual abstraction essential for intelligence? What is the nature of abstraction, and its relationship to generalization? What kind of abstraction can deep learning models generate, and where do they fail? What are the methods that are currently successful in generating strong conceptual abstraction? Finally, we will consider how to leverage a hybrid approach to reinforce the strength of different approaches while compensating for their respective weaknesses.

9/22/2021JM Landsberg, Texas A&MTitle: The complexity of matrix multiplication approached via algebraic geometry and representation theory.

Abstract: In 1968 V. Strassen discovered the way we usually multiply matrices is not the most efficient possible, and after considerable work by many authors, it is generally conjectured by computer scientists that as the size of matrices becomes large, it becomes almost as easy to multiply them as it is to add them. I will give a brief history of the problem, explain how this conjecture is naturally understood in the framework of classical algebraic geometry and representation theory, and conclude by describing recent advances using more sophisticated tools from algebraic geometry. For most of the talk, no knowledge of algebraic geometry or representation theory will be needed.

9/29/2021Adam Wagner, Tel Aviv UniversityTitle: Constructions in combinatorics via neural networks

Abstract: Recently, significant progress has been made in the area of machine learning algorithms, and they have quickly become some of the most exciting tools in a scientist’s toolbox. In particular, recent advances in the field of reinforcement learning have led computers to reach superhuman level play in Atari games and Go, purely through self-play. In this talk I will give a very basic introduction to neural networks and reinforcement learning algorithms. I will also indicate how these methods can be adapted to the ““game” of trying to find a counterexample to a mathematical conjecture, and show some examples where this approach was successful.

10/6/2021Thomas Fischbacher, GoogleTitle: New results in Supergravity via ML Technology

Abstract: The infrastructure built to power the Machine Learning revolution has many other uses beyond Deep Learning. Starting from a general architecture-level overview over the lower levels of Google’s TensorFlow machine learning library, we review how this has recently helped us to find all the stable vacua of SO(8) Supergravity in 3+1 dimensions, has allowed major progress on other related questions about M theory, and briefly discuss other applications in field theory and beyond.

10/13/2021Marijn Heule, Carnegie Mellon UniversityTitle: Computer-Aided Mathematics and Satisfiability

Abstract: Progress in satisfiability (SAT) solving has made it possible to determine the correctness of complex systems and answer long-standing open questions in mathematics. The SAT solving approach is completely automatic and can produce clever though potentially gigantic proofs. We can have confidence in the correctness of the answers because highly trustworthy systems can validate the underlying proofs regardless of their size.

We demonstrate the effectiveness of the SAT approach by presenting some recent successes, including the solution of the Boolean Pythagorean Triples problem, computing the fifth Schur number, and resolving the remaining case of Keller’s conjecture. Moreover, we constructed and validated a proof for each of these results. The second part of the talk focuses on notorious math challenges for which automated reasoning may well be suitable. In particular, we discuss our progress on applying SAT solving techniques to the chromatic number of the plane (Hadwiger-Nelson problem), optimal schemes for matrix multiplication, and the Collatz conjecture.

10/27/2021Patrick Massot, Laboratoire de Mathématiques d’Orsay and CNRSTitle: Why explain mathematics to computers?

Abstract: A growing number of mathematicians are having fun explaining mathematics to computers using proof assistant softwares. This process is called formalization. In this talk I’ll describe what formalization looks like, what kind of things it teaches us, and how it could even turn out to be useful (in our usual sense of “useful”). This will not be a talk about foundations of mathematics, and I won’t assume any prior knowledge about formalization.


11/03/2021Curtis Bright, School of Computer Science, University of Windsor
Vijay Ganesh, Dept. of Electrical and Computer Engineering, University of Waterloo
Title: When Computer Algebra Meets Satisfiability: A New Approach to Combinatorial Mathematics

Abstract: Solvers for the Boolean satisfiability (SAT) problem have been increasingly used to resolve problems in mathematics due to their excellent search algorithms.  This talk will describe a new method for mathematical search that couples SAT solvers with computer algebra systems (CAS), thereby combining the expressiveness of CASs with the search power of SAT solvers.  This paradigm has led to a number of results on long-standing mathematical questions such as the first computer-verifiable resolution of Lam’s problem and the discovery of a new infinite class of Williamson matrices.

12/01/2021Dan Roberts, MIT & SalesforceTitle: The Principles of Deep Learning Theory

Abstract: Deep learning is an exciting approach to modern artificial intelligence based on artificial neural networks. The goal of this talk is to provide a blueprint — using tools from physics — for theoretically analyzing deep neural networks of practical relevance. This task will encompass both understanding the statistics of initialized deep networks and determining the training dynamics of such an ensemble when learning from data.

In terms of their “microscopic” definition, deep neural networks are a flexible set of functions built out of many basic computational blocks called neurons, with many neurons in parallel organized into sequential layers. Borrowing from the effective theory framework, we will develop a perturbative 1/n expansion around the limit of an infinite number of neurons per layer and systematically integrate out the parameters of the network. We will explain how the network simplifies at large width and how the propagation of signals from layer to layer can be understood in terms of a Wilsonian renormalization group flow. This will make manifest that deep networks have a tuning problem, analogous to criticality, that needs to be solved in order to make them useful. Ultimately we will find a “macroscopic” description for wide and deep networks in terms of weakly-interacting statistical models, with the strength of the interactions between the neurons growing with depth-to-width aspect ratio of the network. Time permitting, we will explain how the interactions induce representation learning.

This talk is based on a book, The Principles of Deep Learning Theory, co-authored with Sho Yaida and based on research also in collaboration with Boris Hanin. It will be published next year by Cambridge University Press.

12/8/2021Piotr Nawrot, University of WarsawTitle: Hierarchical Transformers are More Efficient Language Models

Abstract: Transformer models yield impressive results on many NLP and sequence modeling tasks. Remarkably, Transformers can handle long sequences which allows them to produce long coherent outputs: full paragraphs produced by GPT-3 or well-structured images produced by DALL-E. These large language models are impressive but also very inefficient and costly, which limits their applications and accessibility. We postulate that having an explicit hierarchical architecture is the key to Transformers that efficiently handle long sequences. To verify this claim, we first study different ways to upsample and downsample activations in Transformers so as to make them hierarchical. We use the best performing upsampling and downsampling layers to create Hourglass – a hierarchical Transformer language model. Hourglass improves upon the Transformer baseline given the same amount of computation and can yield the same results as Transformers more efficiently. In particular, Hourglass sets new state-of-the-art for Transformer models on the ImageNet32 generation task and improves language modeling efficiency on the widely studied enwik8 benchmark.

12/15/2021Anurag Anshu, Department of EECS & Challenge Institute for Quantum Computation, UC BerkeleyTitle: Unreasonable effectiveness of the quantum complexity view on quantum many-body physics

Abstract: A central challenge in quantum many-body physics is to estimate the properties of natural quantum states, such as the quantum ground states and Gibbs states. Quantum Hamiltonian complexity offers a computational perspective on this challenge and classifies these natural quantum states using the language of quantum complexity classes. This talk will provide a gentle introduction to the field and highlight its success in pinning down the hardness of a wide variety of quantum states. In particular, we will consider the gapped ground states and Gibbs states on low dimensional lattices, which are believed to exhibit ‘low complexity’ due to the widely studied area law behaviour. Here, we will see the crucial role of complexity-theoretic methods in progress on the ‘area law conjecture’ and in the development of efficient algorithms to classically simulate quantum many-body systems.       


Past Seminars

During the Fall 2020 and Spring 2021, the CMSA will be hosting a seminar on Computers and Math, organized by Michael Douglas. This seminar will take place on Wednesday from 3:00 – 4:00pm. There will also be a group meeting on Monday from 9:30 – 10:30am. Both of these meetings will take place virtually. To learn how to attend, please fill out this form, or contact Michael Douglas (,

Spring 2021

1/13/2021Josef Urban, Czech Institute of Informatics, Robotics and Cybernetics


Title: AI and Theorem Proving

Abstract: The talk will discuss the main approaches that combine machine learning with automated theorem proving and automated formalization. This includes learning to choose relevant facts for “hammer” systems, guiding the proof search of tableaux and superposition automated provers by interleaving learning and proving (reinforcement learning) over large ITP libraries, guiding the application of tactics in interactive tactical systems, and various forms of lemmatization and conjecturing. I will also show some demos of the systems, and discuss autoformalization approaches such as learning probabilistic grammars from aligned informal/formal corpora, combining them with semantic pruning, and using neural methods to learn direct translation from Latex to formal mathematics.
1/20/2021Christian Szegedy, Google Research


Title: Language modeling for Mathematical Reasoning

Abstract: In this talk, I will summarize the current state of the art of transformer based language models and give examples on non-trivial reasoning task language models can solve in higher order logic reasoning. I will also discuss how to inject injective bias into transformer networks via pretraining on very simple synthetic tasks and on representing graph structures for transformer networks.
1/27/2021Carl Allen and Ivana Balažević, Univ. of Edinburgh School of Informatics


Title: Knowledge Graph Representation: From Recent Models towards a Theoretical Understanding.

Abstract: Knowledge graphs (KGs), or knowledge bases, are large repositories of facts in the form of triples (subject, relation, object), e.g. (Edinburgh, capital_of, Scotland). Many models have been developed to succinctly represent KGs such that known facts can be recalled (question answering) and, more impressively, previously unknown facts can be inferred (link prediction). Subject and object entities are typically represented as vectors in R^d and relations as mappings (e.g. linear transformations) between them. Such representation can be interpreted as positioning entities in a space such that relations are implied by their relative locations. In this talk we give an overview of knowledge graph representation including select recent models; and, by drawing a connection to word embeddings, explain a theoretical model for how semantic relationships can correspond to geometric structure.
2/10/2021Nikunj Saunshi, Dept. of Computer Science, Princeton University


Title: A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

Abstract: Autoregressive language models pretrained on large corpora have been successful at solving downstream tasks, even with zero-shot usage. However, there is little theoretical justification for their success. This paper considers the following questions: (1) Why should learning the distribution of natural language help with downstream classification tasks? (2) Why do features learned using language modeling help solve downstream tasks with linear classifiers? For (1), we hypothesize, and verify empirically, that classification tasks of interest can be reformulated as next word prediction tasks, thus making language modeling a meaningful pretraining task. For (2), we analyze properties of the cross-entropy objective to show that eps-optimal language models in cross-entropy (log-perplexity) learn features that are O(sqrt{eps}) good on such linear classification tasks, thus demonstrating mathematically that doing well on language modeling can be beneficial for downstream tasks. We perform experiments to verify assumptions and validate our theoretical results. Our theoretical insights motivate a simple alternative to the cross-entropy objective that performs well on some linear classification tasks.
2/24/2021Thomas Hales, Univ. of Pittsburgh Dept. of MathematicsTitle: A Mathematical Language

Abstract: A controlled natural language for mathematics is an an artificial language that is designed in an explicit way with precise computer-readable syntax and semantics.  It is based on a single natural language (which for us is English) and can be broadly understood by mathematically literate English speakers.  This talk will describe the design of a controlled natural language for mathematics that has been influenced by the Lean theorem prover, by TeX, and by earlier controlled natural languages. The semantics are provided by dependent type theory.
3/3/2021Jason Rute, CIBO Technologies


Title: Neural Theorem Proving in Lean using Proof Artifact Co-training and Language Models

Abstract: Labeled data for imitation learning of theorem proving in large libraries of formalized mathematics is scarce as such libraries require years of concentrated effort by human specialists to be built. This is particularly challenging when applying large Transformer language models to tactic prediction, because the scaling of performance with respect to model size is quickly disrupted in the data-scarce, easily-overfitted regime. We propose PACT ({\bf P}roof {\bf A}rtifact {\bf C}o-{\bf T}raining), a general methodology for extracting abundant self-supervised data from kernel-level proof terms for co-training alongside the usual tactic prediction objective. We apply this methodology to Lean, an interactive proof assistant which hosts some of the most sophisticated formalized mathematics to date. We instrument Lean with a neural theorem prover driven by a Transformer language model and show that PACT improves theorem proving success rate on a held-out suite of test theorems from 32\% to 48\%.
3/10/2021Ido Kaminer, Technion – Israel Institute of Technology, Faculty of Electrical Engineering

Title: The Ramanujan Machine: Using Algorithms for the Discovery of Conjectures on Mathematical Constants

Abstract: In the past, new conjectures about fundamental constants were discovered sporadically by famous mathematicians such as Newton, Euler, Gauss, and Ramanujan. The talk will present a different approach – a systematic algorithmic approach that discovers new mathematical conjectures on fundamental constants. We call this approach “the Ramanujan Machine”. The algorithms found dozens of well-known formulas as well as previously unknown ones, such as continued fraction representations of π, e, Catalan’s constant, and values of the Riemann zeta function. Part of the conjectures were in retrospect simple to prove, whereas others remained so far unproved. We will discuss these puzzles and wider open questions that arose from this algorithmic investigation – specifically, a newly-discovered algebraic structure that seems to generalize all the known formulas and connect between fundamental constants. We will also discuss two algorithms that proved useful in finding conjectures: a variant of the meet-in-the-middle algorithm and a gradient descent algorithm tailored to the recurrent structure of continued fractions. Both algorithms are based on matching numerical values; consequently, they conjecture formulas without providing proofs or requiring prior knowledge of the underlying mathematical structure. This way, our approach reverses the conventional usage of sequential logic in formal proofs; instead, using numerical data to unveil mathematical structures and provide leads to further mathematical research.
3/24/2021Steve Skiena, Dept. of Computer Science and AI Insititute, Stony Brook University

Title: Word and Graph Embeddings for Machine Learning

Abstract: DeepWalk is an approach we have developed to construct vertex embeddings: vector representations of vertices which be applied to a very general class of problems in data mining and information retrieval.  DeepWalk exploits an appealing analogy between sentences as sequences of words and random walks as sequences of vertices to transfer deep learning (unsupervised feature learning) techniques from natural language processing to network analysis.  It has become extremely popular, having been cited by over 4600 research papers since its publication at KDD 2014.   In this talk, I will introduce the notion of graph embeddings, and demonstrate why they make such powerful features for machine learning applications.  I will focus on more recent efforts concerning (1) fast embedding methods for very large networks, (2) techniques for embedding dynamic graphs, and (3) embedding spaces as models for knowledge generation.
3/31/2021Lawrence Paulson, University of Cambridge Computer Laboratory


Title: Doing Mathematics with Simple Types: Infinitary Combinatorics in Isabelle/HOL

Abstract: Are proof assistants relevant to mathematics? One approach to this question is to explore the breadth of mathematical topics that can be formalised. The partition calculus was introduced by Erdös and R. Rado in 1956 as the study of “analogues and extensions of Ramsey’s theorem”. Highly technical results were obtained by Erdös-Milner, Specker and Larson (among many others) for the particular case of ordinal partition relations, which is concerned with countable ordinals and order types. Much of this material was formalised last year (with the assistance of Džamonja and Koutsoukou-Argyraki). Some highlights of this work will be presented along with general observations about the formalisation of mathematics, including ZFC, in simple type theory.
4/7/2021David McAllester, Toyota Technological Institute of Chicago

Title: Type Theory from the Perspective of Artificial Intelligence

Abstract: This talk will discuss dependent type theory from the perspective of artificial intelligence and cognitive science.  From an artificial intelligence perspective it will be argued that type theory is central to defining the “game” of mathematics — an action space and reward structure for pure mathematics. From a cognitive science perspective type theory provides a model of the grammar of the colloquial (natural) language of mathematics.  Of particular interest is the notion of a signature-axiom structure class and the three fundamental notions of equality in mathematics — set-theoretic equality between structure elements, isomorphism between structures, and Birkoff and Rota’s notion of cryptomorphism between structure classes.  This talk will present a version of type theory based on set-theoretic semantics and the 1930’s notion of structure and isomorphism given by the Bourbaki group of mathematicians.  It will be argued that this “Bourbaki type theory” (BTT) is more natural and accessible to classically trained mathematicians than Martin-Löf type theory (MLTT). BTT avoids the Curry-Howard isomorphism and axiom J of MLTT.  The talk will also discuss BTT as a model of MLTT.  The BTT model is similar to the groupoid model in that propositional equality is interpreted as isomorphism but different in various details.  The talk will also briefly mention initial thoughts in defining an action space and reward structure for a game of mathematics.
4/14/2021Miles Cranmer, Dept. of Astrophysical Sciences, Princeton University

TitleA Bayesian neural network predicts the dissolution of compact planetary systems

Abstract: Despite over three hundred years of effort, no solutions exist for predicting when a general planetary configuration will become unstable. I will discuss our deep learning architecture (arxiv:2101.04117) which pushes forward this problem for compact systems. While current machine learning algorithms in this area rely on scientist-derived instability metrics, our new technique learns its own metrics from scratch, enabled by a novel internal structure inspired from dynamics theory. The Bayesian neural network model can accurately predict not only if, but also when a compact planetary system with three or more planets will go unstable. Our model, trained directly from short N-body time series of raw orbital elements, is more than two orders of magnitude more accurate at predicting instability times than analytical estimators, while also reducing the bias of existing machine learning algorithms by nearly a factor of three. Despite being trained on three-planet configurations, the model demonstrates robust generalization to five-planet systems, even outperforming models designed for that specific set of integrations. I will also discuss some work on recovering symbolic representations of such models using arxiv:2006.11287.
4/21/2021Michael Shulman, Dept. of Mathematics, University of San Diego

Title: Homotopy type theory and the quest for extensionality

Abstract: Over the past decades, dependent type theory has proven to be a powerful framework for verified software and formalized mathematics. However, its treatment of equality has always been somewhat uncomfortable. Recently, homotopy type theory has made progress towards a more useful notion of equality, which natively implements both isomorphism-invariance in mathematics and representation-independence in programming. This progress is based on ideas from abstract homotopy theory and higher category theory, and with the development of cubical type theories it can be implemented as a true programming language. In this talk I will survey these developments and their potential applications, and suggest some directions for further improvement.

Fall 2020:

9/16/2020William Hamilton, McGill University and MILA

Title: Graph Representation Learning: Recent Advances and Open Challenges

Abstract: Graph-structured data is ubiquitous throughout the natural and social sciences, from telecommunication networks to quantum chemistry. Building relational inductive biases into deep learning architectures is crucial if we want systems that can learn, reason, and generalize from this kind of data. Recent years have seen a surge in research on graph representation learning, most prominently in the development of graph neural networks (GNNs). Advances in GNNs have led to state-of-the-art results in numerous domains, including chemical synthesis, 3D-vision, recommender systems, question answering, and social network analysis. In the first part of this talk I will provide an overview and summary of recent progress in this fast-growing area, highlighting foundational methods and theoretical motivations. In the second part of this talk I will discuss fundamental limitations of the current GNN paradigm and propose open challenges for the theoretical advancement of the field.
9/23/2020Andrea Montanari, Departments of Electrical Engineering and Statistics, Stanford 

Title: Self-induced regularization from linear regression to neural networks

Abstract: Modern machine learning methods –most noticeably multi-layer neural networks– require to fit highly non-linear models comprising tens of thousands to millions of parameters. Despite this, little attention is paid to the regularization mechanism to control model’s complexity. Indeed, the resulting models are often so complex as to achieve vanishing training error: they interpolate the data.  Despite this, these models generalize well to unseen data : they have small test error. I will discuss several examples of this phenomenon, beginning with a simple linear regression model, and ending with two-layers neural networks in the so-called lazy regime. For these examples precise asymptotics could be determined mathematically, using tools from random matrix theory. I will try to extract a unifying picture.
A common feature is the fact that a complex unregularized nonlinear model becomes essentially
equivalent to a simpler model, which is however regularized in a non-trivial way.
[Based on joint papers with: Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Feng Ruan, Youngtak Sohn, Jun Yan, Yiqiao Zhong]
10/7/2020Marinka Zitnik, Department of Biomedical Informatics, Harvard

Title: Subgraph Representation Learning

Abstract: Graph representation learning has emerged as a dominant paradigm for networked data. Still, prevailing methods require abundant label information and focus on representations of nodes, edges, or entire graphs. While graph-level representations provide overarching views of graphs, they do so at the loss of finer local structure. In contrast, node-level representations preserve local topological structures, potentially to the detriment of the big picture. In this talk, I will discuss how subgraph representations are critical to advance today’s methods. First, I will outline Sub-GNN, the first subgraph neural network to learn disentangled subgraph representations. Second, I will describe G-Meta, a novel meta-learning approach for graphs. G-Meta uses subgraphs to adapt to a new task using only a handful of nodes or edges. G-Meta is theoretically justified, and remarkably, can learn in most challenging, few-shot settings that require generalization to completely new graphs and never-before-seen labels. Finally, I will discuss applications in biology and medicine. The new methods have enabled the repurposing of drugs for new diseases, including COVID-19, where predictions were experimentally verified in the wet laboratory. Further, the methods identified drug combinations safer for patients than previous treatments and provided accurate predictions that can be interpreted meaningfully.
10/14/2020Jeffrey Pennington, Google Brain


Title: Triple Descent and a Fine-Grained Bias-Variance Decomposition

Abstract: Classical learning theory suggests that the optimal generalization performance of a machine learning model should occur at an intermediate model complexity, striking a balance between simpler models that exhibit high bias and more complex models that exhibit high variance of the predictive function. However, such a simple trade-off does not adequately describe the behavior of many modern deep learning models, which simultaneously attain low bias and low variance in the heavily overparameterized regime. Recent efforts to explain this phenomenon theoretically have focused on simple settings, such as linear regression or kernel regression with unstructured random features, which are too coarse to reveal important nuances of actual neural networks. In this talk, I will describe a precise high-dimensional asymptotic analysis of Neural Tangent Kernel regression that reveals some of these nuances, including non-monotonic behavior deep in the overparameterized regime. I will also present a novel bias-variance decomposition that unambiguously attributes these surprising observations to particular sources of randomness in the training procedure.
10/21/2020Marinka Zitnik, Department of Biomedical Informatics, HarvardTitleSubgraph Representation Learning, part 2

Continuation from Oct 7.
10/28/2020Boaz Barak and Yamini Bansal, Harvard University Dept. of Computer Science 

Title: Generalization bounds for rational self-supervised learning algorithms, or “Understanding generalizations requires rethinking deep learning”

Abstract: The generalization gap of a learning algorithm is the expected difference between its performance on the training data and its performance on fresh unseen test samples. Modern deep learning algorithms typically have large generalization gaps, as they use more parameters than the size of their training set. Moreover the best known rigorous bounds on their generalization gap are often vacuous.
In this talk we will see a new upper bound on the generalization gap of classifiers that are obtained by first using self-supervision to learn a complex representation of the (label free) training data, and then fitting a simple (e.g., linear) classifier to the labels. Such classifiers have become increasingly popular in recent years, as they offer several practical advantages and have been shown to approach state-of-art results.
We show that (under the assumptions described below) the generalization gap of such classifiers tends to zero as long as the complexity of the simple classifier is asymptotically smaller than the number of training samples. We stress that our bound is independent of the complexity of the representation that can use an arbitrarily large number of parameters.
Our bound assuming that the learning algorithm satisfies certain noise-robustness (adding small amount of label noise causes small degradation in performance) and rationality (getting the wrong label is not better than getting no label at all) conditions that widely (and sometimes provably) hold across many standard architectures.
We complement this result with an empirical study, demonstrating that our bound is non-vacuous for many popular representation-learning based classifiers on CIFAR-10 and ImageNet, including SimCLR, AMDIM and BigBiGAN.
The talk will not assume any specific background in machine learning, and should be accessible to a general mathematical audience. Joint work with Gal Kaplun.
11/4/2020Florent Krzakala, EPFL

Title: Some exactly solvable models for machine learning via Statistical physics

Abstract: The increasing dimensionality of data in the modern machine learning age presents new challenges and opportunities.  The high dimensional settings allow one to use powerful asymptotic methods from probability theory and statistical physics to obtain precise characterizations and develop new algorithmic approaches. Statistical mechanics approaches, in particular, are very well suited for such problems. Will give examples of recent works in our group that build on powerful methods of statistical physics of disordered systems to analyze some relevant questions in machine learning and neural networks, including overparameterization, kernel methods, and the behavior gradient descent algorithm in a high dimensional non-convex landscape.
11/11/2020Eric Mjolsness, Departments of Computer Science and Mathematics, UC Irvine

Title: Towards AI for mathematical modeling of complex biological systems: Machine-learned model reduction, spatial graph dynamics, and symbolic mathematics

Abstract: The complexity of biological systems (among others) makes demands on the complexity of the mathematical modeling enterprise that could be satisfied with mathematical artificially intelligence of both symbolic and numerical flavors. Technologies that I think will be fruitful in this regard include (1) the use of machine learning to bridge spatiotemporal scales, which I will illustrate with the “Dynamic Boltzmann Distribution” method for learning model reduction of stochastic spatial biochemical networks and the “Graph Prolongation Convolutional Network” approach to course-graining the biophysics of microtubules; (2) a meta-language for stochastic spatial graph dynamics, “Dynamical Graph Grammars”, that can represent structure-changing processes including microtubule dynamics and that has an underlying combinatorial theory related to operator algebras; and (3) an integrative conceptual architecture of typed symbolic modeling languages and structure-preserving maps between them, including model reduction and implementation maps.
11/18/2020Yang-Hui He, Oxford University, City University of London and Nankai University

Title: Universes as Big data, or Machine-Learning Mathematical Structures

Abstract: We review how historically the problem of string phenomenology lead theoretical physics first to algebraic/differetial geometry, and then to computational geometry, and now to data science and AI.

With the concrete playground of the Calabi-Yau landscape, accumulated by the collaboration of physicists, mathematicians and computer scientists over the last 4 decades, we show how the latest techniques in machine-learning can help explore problems of physical and mathematical interest, from geometry, to group theory, to combinatorics and number theory.
12/9/2020James Gray, Virginia Tech, Dept. of PhysicsTopic:  Machine learning and SU(3) structures on six manifolds

Abstract: In this talk we will discuss the application of Machine Learning techniques to obtain numerical approximations to various metrics of SU(3) structure on six manifolds. More precisely, we will be interested in SU(3) structures whose torsion classes make them suitable backgrounds for various string compactifications. A variety of aspects of this topic will be covered. These will include learning moduli dependent Ricci-Flat metrics on Calabi-Yau threefolds and obtaining numerical approximations to torsional SU(3) structures.

Related Posts