
Speaker: Cyril ZhangTitle: How do Transformers reason? First principles via automata, semigroups, and circuitsVenue: CMSA Room G10New Technologies in Mathematics Seminar Speaker: Cyril Zhang, Microsoft Research Title: How do Transformers reason? First principles via automata, semigroups, and circuits Abstract: The current “Transformer era” of deep learning is marked by the emergence of combinatorial and algorithmic reasoning capabilities in large sequence models, leading to dramatic advances in natural language understanding, program synthesis, and theorem proving. What is the nature of these models’ internal representations (i.e. how do they represent the states and computational steps of the algorithms they execute)? How can we understand and mitigate their weaknesses, given that they resist interpretation? In this work, we present some insights (and many further mysteries) through the lens of automata and their algebraic structure. Specifically, we investigate the apparent mismatch… 

Speaker: João Araújo, Mathematics Department, Universidade Nova de Lisboa and Michael Kinyon, Department of Mathematics, University of DenverTitle: From Engine to AutoVenue: CMSA Room G10New Technologies in Mathematics Seminar Speakers: João Araújo, Mathematics Department, Universidade Nova de Lisboa and Michael Kinyon, Department of Mathematics, University of Denver Title: From Engine to Auto Abstract: Bill McCune produced the program EQP that deals with first order logic formulas and in 1996 managed to solve Robbins’ Conjecture. This very powerful tool reduces to triviality any result that can be obtained by encoding the assumptions and the goals. The next step was to turn the program into a genuine assistant for the working mathematician: find ways to help the prover with proofs; reduce the lengths of the automatic proofs to better crack them; solve problems in higher order logic; devise tools that autonomously prove results of… 

Speaker: Antonia CreswellTitle: Towards Faithful Reasoning Using Language ModelsVenue: CMSA Room G10New Technologies in Mathematics Seminar Speaker: Antonia Creswell, DeepMind Title: Towards Faithful Reasoning Using Language Models Abstract: Language models are showing impressive performance on many natural language tasks, including questionanswering. However, language models – like most deep learning models – are black boxes. We cannot be sure how they obtain their answers. Do they reason over relevant knowledge to construct an answer or do they rely on prior knowledge – baked into their weights – which may be biased? An alternative approach is to develop models whose output is a human interpretable, faithful reasoning trace leading to an answer. In this talk we will characterise faithful reasoning in terms of logically valid reasoning and demonstrate where current reasoning… 

Speaker: Guy GurAriTitle: Minerva: Solving Quantitative Reasoning Problems with Language ModelsVenue: CMSA Room G10New Technologies in Mathematics Seminar Speaker: Guy GurAri, Google Research Title: Minerva: Solving Quantitative Reasoning Problems with Language Models Abstract: Quantitative reasoning tasks which can involve mathematics, science, and programming are often challenging for machine learning models in general and for language models in particular. We show that transformerbased language models obtain significantly better performance on math and science questions when trained in an unsupervised way on a large, mathfocused dataset. Performance can be further improved using prompting and sampling techniques including chainofthought and majority voting. Minerva, a model that combines these techniques, achieves SOTA on several math and science benchmarks. I will describe the model, its capabilities and limitations. 

Speaker: Surya GanguliTitle: Statistical mechanics of neural networks: From the geometry of high dimensional error landscapes to beating power law neural scalingVenue: CMSA Room G10New Technologies in Mathematics Speaker: Surya Ganguli, Stanford University Title: Statistical mechanics of neural networks: From the geometry of high dimensional error landscapes to beating power law neural scaling Abstract: Statistical mechanics and neural network theory have long enjoyed fruitful interactions. We will review some of our recent work in this area and then focus on two vignettes. First we will analyze the high dimensional geometry of neural network error landscapes that happen to arise as the classical limit of a dissipative manybody quantum optimizer. In particular, we will be able to use the KacRice formula and the replica method to calculate the number, location, energy levels, and Hessian eigenspectra of all critical points of any index. Second we will review… 

Speaker: Johan CommelinTitle: Breaking the onemindbarrier in mathematics using formal verificationVenue: CMSA Room G10New Technologies in Mathematics Seminar Speaker: Johan Commelin, Mathematisches Institut, AlbertLudwigsUniversität Freiburg Title: Breaking the onemindbarrier in mathematics using formal verification Abstract: In this talk I will argue that formal verification helps break the onemindbarrier in mathematics. Indeed, formal verification allows a team of mathematicians to collaborate on a project, without one person understanding all parts of the project. At the same time, it also allows a mathematician to rapidly free mental RAM in order to work on a different component of a project. It thus also expands the onemindbarrier. I will use the Liquid Tensor Experiment as an example, to illustrate the above two points. This project recently finished the formalization of the main theorem of liquid vector spaces, following up on a challenge by… 

Speaker: Yuhuai Wu, Stanford and GoogleTitle: Memorizing TransformersVenue: VirtualAbstract: Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this talk, I will discuss how we extend language models with the ability to memorize the internal representations of past inputs. We demonstrate that an approximate NN lookup into a nondifferentiable memory of recent (key, value) pairs improves language modeling across various benchmarks and tasks, including generic webtext (C4), math papers (arXiv), books (PG19), code (Github), as well as formal theorems (Isabelle). We show that the performance steadily improves when we increase the size of memory… 

Speaker: Stanislas Polu, OpenAITitle: Formal Mathematics Statement Curriculum LearningVenue: VirtualAbstract: We explore the use of expert iteration in the context of language modeling applied to formal mathematics. We show that at same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only. We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems, without the need for associated groundtruth proofs. Finally, by applying this expert iteration to a manually curated set of problem statements, we achieve stateoftheart on the miniF2F benchmark, automatically solving multiple challenging problems drawn from high school olympiads. 

Speaker: Iddo Drori, MIT EE&CS and Columbia School of EngineeringTitle: Machine Learning STEM Courses in DepartmentsVenue: VirtualAbstract: We automatically solve, explain, and generate universitylevel course problems from thirty STEM courses (at MIT, Harvard, and Columbia) for the first time. We curate a new dataset of course questions and answers across a dozen departments: Aeronautics and Astronautics, Chemical Engineering, Chemistry, Computer Science, Economics, Electrical Engineering, Materials Science, Mathematics, Mechanical Engineering, Nuclear Science, Physics, and Statistics. We generate new questions and use them in a Columbia University course, and perform A/B tests demonstrating that these machine generated questions are indistinguishable from humanwritten questions and that machine generated explanations are as useful as humanwritten explanations, again for the first time. Our approach consists of five steps: (i) Given course questions, turn them into programming tasks; (ii) Automatically generate… 

Speaker: Jared Kaplan, Johns Hopkins Dept. of Physics & AstronomyTitle: Scaling Laws and Their Implications for Coding AIVenue: VirtualAbstract: Scaling laws and associated downstream trends can be used as an organizing principle when thinking about current and future ML progress. I will briefly review scaling laws for generative models in a number of domains, emphasizing language modeling. Then I will discuss scaling results for transfer from natural language to code, and results on python programming performance from “codex” and other models. If there’s time I’ll discuss prospects for the future — limitations from dataset sizes, and prospects for RL and other techniques. 

Speaker: James Bonifacio, Cambridge DAMTPTitle: Bootstrapping hyperbolic manifoldsVenue: VirtualAbstract: Hyperbolic manifolds are a class of Riemannian manifolds that are important in mathematics and physics, playing a prominent role in topology, number theory, and string theory. Associated with a given hyperbolic metric is a sequence of numbers corresponding to the discrete eigenvalues of the LaplaceBeltrami operator. While these eigenvalues usually cannot be calculated exactly, they can be found numerically and must also satisfy various bounds. In this talk, I will discuss a new approach for finding numerical bounds on the eigenvalues of closed hyperbolic manifolds using general consistency conditions and semidefinite programming, inspired by the approach of the conformal bootstrap from physics. Although these bootstrap bounds follow from seemingly trivial consistency conditions, they are surprisingly strong and… 

Speaker: Ben Edelman, Harvard Computer ScienceTitle: Toward Demystifying Transformers and AttentionVenue: VirtualAbstract: Over the past several years, attention mechanisms (primarily in the form of the Transformer architecture) have revolutionized deep learning, leading to advances in natural language processing, computer vision, code synthesis, protein structure prediction, and beyond. Attention has a remarkable ability to enable the learning of longrange dependencies in diverse modalities of data. And yet, there is at present limited principled understanding of the reasons for its success. In this talk, I’ll explain how attention mechanisms and Transformers work, and then I’ll share the results of a preliminary investigation into why they work so well. In particular, I’ll discuss an inductive bias of attention that we call sparse variable creation: boundednorm Transformer layers are capable of representing sparse Boolean functions, with statistical generalization guarantees akin to… 

Speaker: Michael Bronstein, University of Oxford and TwitterTitle: Neural diffusion PDEs, differential geometry, and graph neural networksVenue: VirtualAbstract: In this talk, I will make connections between Graph Neural Networks (GNNs) and nonEuclidean diffusion equations. I will show that drawing on methods from the domain of differential geometry, it is possible to provide a principled view on such GNN architectural choices as positional encoding and graph rewiring as well as explain and remedy the phenomena of oversquashing and bottlenecks. 

Speaker: Alex Davies, DeepMindTitle: Machine learning with mathematiciansVenue: VirtualAbstract: Can machine learning be a useful tool for research mathematicians? There are many examples of mathematicians pioneering new technologies to aid our understanding of the mathematical world: using very early computers to help formulate the Birch and SwinnertonDyer conjecture and using computer aid to prove the four colour theorem are among the most notable. Up until now there hasn’t been significant use of machine learning in the field and it hasn’t been clear where it might be useful for the questions that mathematicians care about. In this talk we will discuss the results of our recent Nature paper, where we worked together with top mathematicians to use machine learning to achieve two new results – proving a… 

Speaker: Anurag Anshu, Department of EECS & Challenge Institute for Quantum Computation, UC BerkeleyTitle: Unreasonable effectiveness of the quantum complexity view on quantum manybody physicsVenue: VirtualAbstract: A central challenge in quantum manybody physics is to estimate the properties of natural quantum states, such as the quantum ground states and Gibbs states. Quantum Hamiltonian complexity offers a computational perspective on this challenge and classifies these natural quantum states using the language of quantum complexity classes. This talk will provide a gentle introduction to the field and highlight its success in pinning down the hardness of a wide variety of quantum states. In particular, we will consider the gapped ground states and Gibbs states on low dimensional lattices, which are believed to exhibit ‘low complexity’ due to the widely studied area law behaviour. Here, we will see the crucial role of complexitytheoretic methods in progress on… 

Speaker: Piotr Nawrot, University of WarsawTitle: Hierarchical Transformers are More Efficient Language ModelsVenue: VirtualAbstract: Transformer models yield impressive results on many NLP and sequence modeling tasks. Remarkably, Transformers can handle long sequences which allows them to produce long coherent outputs: full paragraphs produced by GPT3 or wellstructured images produced by DALLE. These large language models are impressive but also very inefficient and costly, which limits their applications and accessibility. We postulate that having an explicit hierarchical architecture is the key to Transformers that efficiently handle long sequences. To verify this claim, we first study different ways to upsample and downsample activations in Transformers so as to make them hierarchical. We use the best performing upsampling and downsampling layers to create Hourglass – a hierarchical Transformer language model. Hourglass improves upon the Transformer… 

Speaker: Dan Roberts, MIT & SalesforceTitle: The Principles of Deep Learning TheoryVenue: VirtualAbstract: Deep learning is an exciting approach to modern artificial intelligence based on artificial neural networks. The goal of this talk is to provide a blueprint — using tools from physics — for theoretically analyzing deep neural networks of practical relevance. This task will encompass both understanding the statistics of initialized deep networks and determining the training dynamics of such an ensemble when learning from data. In terms of their “microscopic” definition, deep neural networks are a flexible set of functions built out of many basic computational blocks called neurons, with many neurons in parallel organized into sequential layers. Borrowing from the effective theory framework, we will develop a perturbative 1/n expansion around the limit of an infinite number… 

Speaker: Curtis Bright, School of Computer Science, University of Windsor and Vijay Ganesh, Dept. of Electrical and Computer Engineering, University of WaterlooTitle: When Computer Algebra Meets Satisfiability: A New Approach to Combinatorial MathematicsVenue: VirtualAbstract: Solvers for the Boolean satisfiability (SAT) problem have been increasingly used to resolve problems in mathematics due to their excellent search algorithms. This talk will describe a new method for mathematical search that couples SAT solvers with computer algebra systems (CAS), thereby combining the expressiveness of CASs with the search power of SAT solvers. This paradigm has led to a number of results on longstanding mathematical questions such as the first computerverifiable resolution of Lam’s problem and the discovery of a new infinite class of Williamson matrices. 

Speaker: Patrick Massot, Laboratoire de Mathématiques d’Orsay and CNRSTitle: Why explain mathematics to computers?Venue: VirtualAbstract: A growing number of mathematicians are having fun explaining mathematics to computers using proof assistant softwares. This process is called formalization. In this talk I’ll describe what formalization looks like, what kind of things it teaches us, and how it could even turn out to be useful (in our usual sense of “useful”). This will not be a talk about foundations of mathematics, and I won’t assume any prior knowledge about formalization. 

Speaker: Marijn Heule, Carnegie Mellon UniversityTitle: ComputerAided Mathematics and SatisfiabilityVenue: VirtualAbstract: Progress in satisfiability (SAT) solving has made it possible to determine the correctness of complex systems and answer longstanding open questions in mathematics. The SAT solving approach is completely automatic and can produce clever though potentially gigantic proofs. We can have confidence in the correctness of the answers because highly trustworthy systems can validate the underlying proofs regardless of their size. We demonstrate the effectiveness of the SAT approach by presenting some recent successes, including the solution of the Boolean Pythagorean Triples problem, computing the fifth Schur number, and resolving the remaining case of Keller’s conjecture. Moreover, we constructed and validated a proof for each of these results. The second part of the talk focuses on notorious… 