New Technologies in Mathematics Seminar Series

During the Fall 2020, the CMSA will be hosting a seminar on Computers and Math, organized by Michael Douglas. This seminar will take place on Wednesday from 3:00 – 4:00pm. There will also be a group meeting on Monday from 9:30 – 10:30am. Both of these meetings will take place virtually. To learn how to attend, please fill out this form, or contact Michael Douglas (,

The schedule below will be updated as talks are confirmed.

Spring 2021:

1/13/2021Josef Urban, Czech Institute of Informatics, Robotics and Cybernetics


Title: AI and Theorem Proving

Abstract: The talk will discuss the main approaches that combine machine learning with automated theorem proving and automated formalization. This includes learning to choose relevant facts for “hammer” systems, guiding the proof search of tableaux and superposition automated provers by interleaving learning and proving (reinforcement learning) over large ITP libraries, guiding the application of tactics in interactive tactical systems, and various forms of lemmatization and conjecturing. I will also show some demos of the systems, and discuss autoformalization approaches such as learning probabilistic grammars from aligned informal/formal corpora, combining them with semantic pruning, and using neural methods to learn direct translation from Latex to formal mathematics.
1/20/2021Christian Szegedy, Google Research


Title: Language modeling for Mathematical Reasoning

Abstract: In this talk, I will summarize the current state of the art of transformer based language models and give examples on non-trivial reasoning task language models can solve in higher order logic reasoning. I will also discuss how to inject injective bias into transformer networks via pretraining on very simple synthetic tasks and on representing graph structures for transformer networks.
1/27/2021Carl Allen and Ivana Balažević, Univ. of Edinburgh School of Informatics


Title: Knowledge Graph Representation: From Recent Models towards a Theoretical Understanding.

Abstract: Knowledge graphs (KGs), or knowledge bases, are large repositories of facts in the form of triples (subject, relation, object), e.g. (Edinburgh, capital_of, Scotland). Many models have been developed to succinctly represent KGs such that known facts can be recalled (question answering) and, more impressively, previously unknown facts can be inferred (link prediction). Subject and object entities are typically represented as vectors in R^d and relations as mappings (e.g. linear transformations) between them. Such representation can be interpreted as positioning entities in a space such that relations are implied by their relative locations. In this talk we give an overview of knowledge graph representation including select recent models; and, by drawing a connection to word embeddings, explain a theoretical model for how semantic relationships can correspond to geometric structure.
2/10/2021Nikunj Saunshi, Dept. of Computer Science, Princeton University


Title: A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

Abstract: Autoregressive language models pretrained on large corpora have been successful at solving downstream tasks, even with zero-shot usage. However, there is little theoretical justification for their success. This paper considers the following questions: (1) Why should learning the distribution of natural language help with downstream classification tasks? (2) Why do features learned using language modeling help solve downstream tasks with linear classifiers? For (1), we hypothesize, and verify empirically, that classification tasks of interest can be reformulated as next word prediction tasks, thus making language modeling a meaningful pretraining task. For (2), we analyze properties of the cross-entropy objective to show that eps-optimal language models in cross-entropy (log-perplexity) learn features that are O(sqrt{eps}) good on such linear classification tasks, thus demonstrating mathematically that doing well on language modeling can be beneficial for downstream tasks. We perform experiments to verify assumptions and validate our theoretical results. Our theoretical insights motivate a simple alternative to the cross-entropy objective that performs well on some linear classification tasks.
2/24/2021Thomas Hales, Univ. of Pittsburgh Dept. of MathematicsTitle: A Mathematical Language

Abstract: A controlled natural language for mathematics is an an artificial language that is designed in an explicit way with precise computer-readable syntax and semantics.  It is based on a single natural language (which for us is English) and can be broadly understood by mathematically literate English speakers.  This talk will describe the design of a controlled natural language for mathematics that has been influenced by the Lean theorem prover, by TeX, and by earlier controlled natural languages. The semantics are provided by dependent type theory.
3/3/2021Jason Rute, CIBO Technologies


Title: Neural Theorem Proving in Lean using Proof Artifact Co-training and Language Models

Abstract: Labeled data for imitation learning of theorem proving in large libraries of formalized mathematics is scarce as such libraries require years of concentrated effort by human specialists to be built. This is particularly challenging when applying large Transformer language models to tactic prediction, because the scaling of performance with respect to model size is quickly disrupted in the data-scarce, easily-overfitted regime. We propose PACT ({\bf P}roof {\bf A}rtifact {\bf C}o-{\bf T}raining), a general methodology for extracting abundant self-supervised data from kernel-level proof terms for co-training alongside the usual tactic prediction objective. We apply this methodology to Lean, an interactive proof assistant which hosts some of the most sophisticated formalized mathematics to date. We instrument Lean with a neural theorem prover driven by a Transformer language model and show that PACT improves theorem proving success rate on a held-out suite of test theorems from 32\% to 48\%.
3/10/2021Ido Kaminer, Technion – Israel Institute of Technology, Faculty of Electrical Engineering

Title: The Ramanujan Machine: Using Algorithms for the Discovery of Conjectures on Mathematical Constants

Abstract: In the past, new conjectures about fundamental constants were discovered sporadically by famous mathematicians such as Newton, Euler, Gauss, and Ramanujan. The talk will present a different approach – a systematic algorithmic approach that discovers new mathematical conjectures on fundamental constants. We call this approach “the Ramanujan Machine”. The algorithms found dozens of well-known formulas as well as previously unknown ones, such as continued fraction representations of π, e, Catalan’s constant, and values of the Riemann zeta function. Part of the conjectures were in retrospect simple to prove, whereas others remained so far unproved. We will discuss these puzzles and wider open questions that arose from this algorithmic investigation – specifically, a newly-discovered algebraic structure that seems to generalize all the known formulas and connect between fundamental constants. We will also discuss two algorithms that proved useful in finding conjectures: a variant of the meet-in-the-middle algorithm and a gradient descent algorithm tailored to the recurrent structure of continued fractions. Both algorithms are based on matching numerical values; consequently, they conjecture formulas without providing proofs or requiring prior knowledge of the underlying mathematical structure. This way, our approach reverses the conventional usage of sequential logic in formal proofs; instead, using numerical data to unveil mathematical structures and provide leads to further mathematical research.
3/24/2021Steve Skiena, Dept. of Computer Science and AI Insititute, Stony Brook University

Title: Word and Graph Embeddings for Machine Learning

Abstract: DeepWalk is an approach we have developed to construct vertex embeddings: vector representations of vertices which be applied to a very general class of problems in data mining and information retrieval.  DeepWalk exploits an appealing analogy between sentences as sequences of words and random walks as sequences of vertices to transfer deep learning (unsupervised feature learning) techniques from natural language processing to network analysis.  It has become extremely popular, having been cited by over 4600 research papers since its publication at KDD 2014.   In this talk, I will introduce the notion of graph embeddings, and demonstrate why they make such powerful features for machine learning applications.  I will focus on more recent efforts concerning (1) fast embedding methods for very large networks, (2) techniques for embedding dynamic graphs, and (3) embedding spaces as models for knowledge generation.
3/31/2021Lawrence Paulson, University of Cambridge Computer Laboratory


Title: Doing Mathematics with Simple Types: Infinitary Combinatorics in Isabelle/HOL

Abstract: Are proof assistants relevant to mathematics? One approach to this question is to explore the breadth of mathematical topics that can be formalised. The partition calculus was introduced by Erdös and R. Rado in 1956 as the study of “analogues and extensions of Ramsey’s theorem”. Highly technical results were obtained by Erdös-Milner, Specker and Larson (among many others) for the particular case of ordinal partition relations, which is concerned with countable ordinals and order types. Much of this material was formalised last year (with the assistance of Džamonja and Koutsoukou-Argyraki). Some highlights of this work will be presented along with general observations about the formalisation of mathematics, including ZFC, in simple type theory.
4/7/2021David McAllester, Toyota Technological Institute of Chicago

Title: Type Theory from the Perspective of Artificial Intelligence

Abstract: This talk will discuss dependent type theory from the perspective of artificial intelligence and cognitive science.  From an artificial intelligence perspective it will be argued that type theory is central to defining the “game” of mathematics — an action space and reward structure for pure mathematics. From a cognitive science perspective type theory provides a model of the grammar of the colloquial (natural) language of mathematics.  Of particular interest is the notion of a signature-axiom structure class and the three fundamental notions of equality in mathematics — set-theoretic equality between structure elements, isomorphism between structures, and Birkoff and Rota’s notion of cryptomorphism between structure classes.  This talk will present a version of type theory based on set-theoretic semantics and the 1930’s notion of structure and isomorphism given by the Bourbaki group of mathematicians.  It will be argued that this “Bourbaki type theory” (BTT) is more natural and accessible to classically trained mathematicians than Martin-Löf type theory (MLTT). BTT avoids the Curry-Howard isomorphism and axiom J of MLTT.  The talk will also discuss BTT as a model of MLTT.  The BTT model is similar to the groupoid model in that propositional equality is interpreted as isomorphism but different in various details.  The talk will also briefly mention initial thoughts in defining an action space and reward structure for a game of mathematics.
4/14/2021Miles Cranmer, Dept. of Astrophysical Sciences, Princeton University

TitleA Bayesian neural network predicts the dissolution of compact planetary systems

Abstract: Despite over three hundred years of effort, no solutions exist for predicting when a general planetary configuration will become unstable. I will discuss our deep learning architecture (arxiv:2101.04117) which pushes forward this problem for compact systems. While current machine learning algorithms in this area rely on scientist-derived instability metrics, our new technique learns its own metrics from scratch, enabled by a novel internal structure inspired from dynamics theory. The Bayesian neural network model can accurately predict not only if, but also when a compact planetary system with three or more planets will go unstable. Our model, trained directly from short N-body time series of raw orbital elements, is more than two orders of magnitude more accurate at predicting instability times than analytical estimators, while also reducing the bias of existing machine learning algorithms by nearly a factor of three. Despite being trained on three-planet configurations, the model demonstrates robust generalization to five-planet systems, even outperforming models designed for that specific set of integrations. I will also discuss some work on recovering symbolic representations of such models using arxiv:2006.11287.
4/21/2021Michael Shulman, Dept. of Mathematics, University of San Diego

Title: Homotopy type theory and the quest for extensionality

Abstract: Over the past decades, dependent type theory has proven to be a powerful framework for verified software and formalized mathematics. However, its treatment of equality has always been somewhat uncomfortable. Recently, homotopy type theory has made progress towards a more useful notion of equality, which natively implements both isomorphism-invariance in mathematics and representation-independence in programming. This progress is based on ideas from abstract homotopy theory and higher category theory, and with the development of cubical type theories it can be implemented as a true programming language. In this talk I will survey these developments and their potential applications, and suggest some directions for further improvement.

Fall 2020:

9/16/2020William Hamilton, McGill University and MILA

Title: Graph Representation Learning: Recent Advances and Open Challenges

Abstract: Graph-structured data is ubiquitous throughout the natural and social sciences, from telecommunication networks to quantum chemistry. Building relational inductive biases into deep learning architectures is crucial if we want systems that can learn, reason, and generalize from this kind of data. Recent years have seen a surge in research on graph representation learning, most prominently in the development of graph neural networks (GNNs). Advances in GNNs have led to state-of-the-art results in numerous domains, including chemical synthesis, 3D-vision, recommender systems, question answering, and social network analysis. In the first part of this talk I will provide an overview and summary of recent progress in this fast-growing area, highlighting foundational methods and theoretical motivations. In the second part of this talk I will discuss fundamental limitations of the current GNN paradigm and propose open challenges for the theoretical advancement of the field.
9/23/2020Andrea Montanari, Departments of Electrical Engineering and Statistics, Stanford 

Title: Self-induced regularization from linear regression to neural networks

Abstract: Modern machine learning methods –most noticeably multi-layer neural networks– require to fit highly non-linear models comprising tens of thousands to millions of parameters. Despite this, little attention is paid to the regularization mechanism to control model’s complexity. Indeed, the resulting models are often so complex as to achieve vanishing training error: they interpolate the data.  Despite this, these models generalize well to unseen data : they have small test error. I will discuss several examples of this phenomenon, beginning with a simple linear regression model, and ending with two-layers neural networks in the so-called lazy regime. For these examples precise asymptotics could be determined mathematically, using tools from random matrix theory. I will try to extract a unifying picture.
A common feature is the fact that a complex unregularized nonlinear model becomes essentially
equivalent to a simpler model, which is however regularized in a non-trivial way.
[Based on joint papers with: Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Feng Ruan, Youngtak Sohn, Jun Yan, Yiqiao Zhong]
10/7/2020Marinka Zitnik, Department of Biomedical Informatics, Harvard

Title: Subgraph Representation Learning

Abstract: Graph representation learning has emerged as a dominant paradigm for networked data. Still, prevailing methods require abundant label information and focus on representations of nodes, edges, or entire graphs. While graph-level representations provide overarching views of graphs, they do so at the loss of finer local structure. In contrast, node-level representations preserve local topological structures, potentially to the detriment of the big picture. In this talk, I will discuss how subgraph representations are critical to advance today’s methods. First, I will outline Sub-GNN, the first subgraph neural network to learn disentangled subgraph representations. Second, I will describe G-Meta, a novel meta-learning approach for graphs. G-Meta uses subgraphs to adapt to a new task using only a handful of nodes or edges. G-Meta is theoretically justified, and remarkably, can learn in most challenging, few-shot settings that require generalization to completely new graphs and never-before-seen labels. Finally, I will discuss applications in biology and medicine. The new methods have enabled the repurposing of drugs for new diseases, including COVID-19, where predictions were experimentally verified in the wet laboratory. Further, the methods identified drug combinations safer for patients than previous treatments and provided accurate predictions that can be interpreted meaningfully.
10/14/2020Jeffrey Pennington, Google Brain


Title: Triple Descent and a Fine-Grained Bias-Variance Decomposition

Abstract: Classical learning theory suggests that the optimal generalization performance of a machine learning model should occur at an intermediate model complexity, striking a balance between simpler models that exhibit high bias and more complex models that exhibit high variance of the predictive function. However, such a simple trade-off does not adequately describe the behavior of many modern deep learning models, which simultaneously attain low bias and low variance in the heavily overparameterized regime. Recent efforts to explain this phenomenon theoretically have focused on simple settings, such as linear regression or kernel regression with unstructured random features, which are too coarse to reveal important nuances of actual neural networks. In this talk, I will describe a precise high-dimensional asymptotic analysis of Neural Tangent Kernel regression that reveals some of these nuances, including non-monotonic behavior deep in the overparameterized regime. I will also present a novel bias-variance decomposition that unambiguously attributes these surprising observations to particular sources of randomness in the training procedure.
10/21/2020Marinka Zitnik, Department of Biomedical Informatics, HarvardTitleSubgraph Representation Learning, part 2

Continuation from Oct 7.
10/28/2020Boaz Barak and Yamini Bansal, Harvard University Dept. of Computer Science 

Title: Generalization bounds for rational self-supervised learning algorithms, or “Understanding generalizations requires rethinking deep learning”

Abstract: The generalization gap of a learning algorithm is the expected difference between its performance on the training data and its performance on fresh unseen test samples. Modern deep learning algorithms typically have large generalization gaps, as they use more parameters than the size of their training set. Moreover the best known rigorous bounds on their generalization gap are often vacuous.
In this talk we will see a new upper bound on the generalization gap of classifiers that are obtained by first using self-supervision to learn a complex representation of the (label free) training data, and then fitting a simple (e.g., linear) classifier to the labels. Such classifiers have become increasingly popular in recent years, as they offer several practical advantages and have been shown to approach state-of-art results.
We show that (under the assumptions described below) the generalization gap of such classifiers tends to zero as long as the complexity of the simple classifier is asymptotically smaller than the number of training samples. We stress that our bound is independent of the complexity of the representation that can use an arbitrarily large number of parameters.
Our bound assuming that the learning algorithm satisfies certain noise-robustness (adding small amount of label noise causes small degradation in performance) and rationality (getting the wrong label is not better than getting no label at all) conditions that widely (and sometimes provably) hold across many standard architectures.
We complement this result with an empirical study, demonstrating that our bound is non-vacuous for many popular representation-learning based classifiers on CIFAR-10 and ImageNet, including SimCLR, AMDIM and BigBiGAN.
The talk will not assume any specific background in machine learning, and should be accessible to a general mathematical audience. Joint work with Gal Kaplun.
11/4/2020Florent Krzakala, EPFL

Title: Some exactly solvable models for machine learning via Statistical physics

Abstract: The increasing dimensionality of data in the modern machine learning age presents new challenges and opportunities.  The high dimensional settings allow one to use powerful asymptotic methods from probability theory and statistical physics to obtain precise characterizations and develop new algorithmic approaches. Statistical mechanics approaches, in particular, are very well suited for such problems. Will give examples of recent works in our group that build on powerful methods of statistical physics of disordered systems to analyze some relevant questions in machine learning and neural networks, including overparameterization, kernel methods, and the behavior gradient descent algorithm in a high dimensional non-convex landscape.
11/11/2020Eric Mjolsness, Departments of Computer Science and Mathematics, UC Irvine

Title: Towards AI for mathematical modeling of complex biological systems: Machine-learned model reduction, spatial graph dynamics, and symbolic mathematics

Abstract: The complexity of biological systems (among others) makes demands on the complexity of the mathematical modeling enterprise that could be satisfied with mathematical artificially intelligence of both symbolic and numerical flavors. Technologies that I think will be fruitful in this regard include (1) the use of machine learning to bridge spatiotemporal scales, which I will illustrate with the “Dynamic Boltzmann Distribution” method for learning model reduction of stochastic spatial biochemical networks and the “Graph Prolongation Convolutional Network” approach to course-graining the biophysics of microtubules; (2) a meta-language for stochastic spatial graph dynamics, “Dynamical Graph Grammars”, that can represent structure-changing processes including microtubule dynamics and that has an underlying combinatorial theory related to operator algebras; and (3) an integrative conceptual architecture of typed symbolic modeling languages and structure-preserving maps between them, including model reduction and implementation maps.
11/18/2020Yang-Hui He, Oxford University, City University of London and Nankai University

Title: Universes as Big data, or Machine-Learning Mathematical Structures

Abstract: We review how historically the problem of string phenomenology lead theoretical physics first to algebraic/differetial geometry, and then to computational geometry, and now to data science and AI.

With the concrete playground of the Calabi-Yau landscape, accumulated by the collaboration of physicists, mathematicians and computer scientists over the last 4 decades, we show how the latest techniques in machine-learning can help explore problems of physical and mathematical interest, from geometry, to group theory, to combinatorics and number theory.
12/9/2020James Gray, Virginia Tech, Dept. of PhysicsTopic:  Machine learning and SU(3) structures on six manifolds

Abstract: In this talk we will discuss the application of Machine Learning techniques to obtain numerical approximations to various metrics of SU(3) structure on six manifolds. More precisely, we will be interested in SU(3) structures whose torsion classes make them suitable backgrounds for various string compactifications. A variety of aspects of this topic will be covered. These will include learning moduli dependent Ricci-Flat metrics on Calabi-Yau threefolds and obtaining numerical approximations to torsional SU(3) structures.

Related Posts