|
Speaker: Jared Kaplan, Johns Hopkins Dept. of Physics & Astronomy Title: Scaling Laws and Their Implications for Coding AI Venue: Virtual Abstract: Scaling laws and associated downstream trends can be used as an organizing principle when thinking about current and future ML progress. I will briefly review scaling laws for generative models in a number of domains, emphasizing language modeling. Then I will discuss scaling results for transfer from natural language to code, and results on Python programming performance from “Codex” and other models. If there’s time, I’ll discuss prospects for the future: limitations from dataset sizes, and prospects for RL and other techniques. |
|
Speaker: James Bonifacio, Cambridge DAMTP Title: Bootstrapping hyperbolic manifolds Venue: Virtual Abstract: Hyperbolic manifolds are a class of Riemannian manifolds that are important in mathematics and physics, playing a prominent role in topology, number theory, and string theory. Associated with a given hyperbolic metric is a sequence of numbers corresponding to the discrete eigenvalues of the Laplace-Beltrami operator. While these eigenvalues usually cannot be calculated exactly, they can be found numerically and must also satisfy various bounds. In this talk, I will discuss a new approach for finding numerical bounds on the eigenvalues of closed hyperbolic manifolds using general consistency conditions and semidefinite programming, inspired by the approach of the conformal bootstrap from physics. Although these bootstrap bounds follow from seemingly trivial consistency conditions, they are surprisingly strong and… |
|
Speaker: Ben Edelman, Harvard Computer Science Title: Toward Demystifying Transformers and Attention Venue: Virtual Abstract: Over the past several years, attention mechanisms (primarily in the form of the Transformer architecture) have revolutionized deep learning, leading to advances in natural language processing, computer vision, code synthesis, protein structure prediction, and beyond. Attention has a remarkable ability to enable the learning of long-range dependencies in diverse modalities of data. And yet, there is at present limited principled understanding of the reasons for its success. In this talk, I’ll explain how attention mechanisms and Transformers work, and then I’ll share the results of a preliminary investigation into why they work so well. In particular, I’ll discuss an inductive bias of attention that we call sparse variable creation: bounded-norm Transformer layers are capable of representing sparse Boolean functions, with statistical generalization guarantees akin to… |
|
Speaker: Michael Bronstein, University of Oxford and Twitter Title: Neural diffusion PDEs, differential geometry, and graph neural networks Venue: Virtual Abstract: In this talk, I will make connections between Graph Neural Networks (GNNs) and non-Euclidean diffusion equations. I will show that, by drawing on methods from differential geometry, it is possible to provide a principled view on such GNN architectural choices as positional encoding and graph rewiring, as well as to explain and remedy the phenomena of oversquashing and bottlenecks. |
|
Speaker: Alex Davies, DeepMind Title: Machine learning with mathematicians Venue: Virtual Abstract: Can machine learning be a useful tool for research mathematicians? There are many examples of mathematicians pioneering new technologies to aid our understanding of the mathematical world: using very early computers to help formulate the Birch and Swinnerton-Dyer conjecture and using computer aid to prove the four colour theorem are among the most notable. Up until now there hasn’t been significant use of machine learning in the field and it hasn’t been clear where it might be useful for the questions that mathematicians care about. In this talk we will discuss the results of our recent Nature paper, where we worked together with top mathematicians to use machine learning to achieve two new results – proving a… |
|
Speaker: Anurag Anshu, Department of EECS & Challenge Institute for Quantum Computation, UC Berkeley Title: Unreasonable effectiveness of the quantum complexity view on quantum many-body physics Venue: Virtual Abstract: A central challenge in quantum many-body physics is to estimate the properties of natural quantum states, such as the quantum ground states and Gibbs states. Quantum Hamiltonian complexity offers a computational perspective on this challenge and classifies these natural quantum states using the language of quantum complexity classes. This talk will provide a gentle introduction to the field and highlight its success in pinning down the hardness of a wide variety of quantum states. In particular, we will consider the gapped ground states and Gibbs states on low dimensional lattices, which are believed to exhibit ‘low complexity’ due to the widely studied area law behaviour. Here, we will see the crucial role of complexity-theoretic methods in progress on… |
|
Speaker: Piotr Nawrot, University of Warsaw Title: Hierarchical Transformers are More Efficient Language Models Venue: Virtual Abstract: Transformer models yield impressive results on many NLP and sequence modeling tasks. Remarkably, Transformers can handle long sequences, which allows them to produce long coherent outputs: full paragraphs produced by GPT-3 or well-structured images produced by DALL-E. These large language models are impressive but also very inefficient and costly, which limits their applications and accessibility. We postulate that having an explicit hierarchical architecture is the key to Transformers that efficiently handle long sequences. To verify this claim, we first study different ways to upsample and downsample activations in Transformers so as to make them hierarchical. We use the best performing upsampling and downsampling layers to create Hourglass – a hierarchical Transformer language model. Hourglass improves upon the Transformer… |
|
Speaker: Dan Roberts, MIT & Salesforce Title: The Principles of Deep Learning Theory Venue: Virtual Abstract: Deep learning is an exciting approach to modern artificial intelligence based on artificial neural networks. The goal of this talk is to provide a blueprint, using tools from physics, for theoretically analyzing deep neural networks of practical relevance. This task will encompass both understanding the statistics of initialized deep networks and determining the training dynamics of such an ensemble when learning from data. In terms of their “microscopic” definition, deep neural networks are a flexible set of functions built out of many basic computational blocks called neurons, with many neurons in parallel organized into sequential layers. Borrowing from the effective theory framework, we will develop a perturbative 1/n expansion around the limit of an infinite number… |
|
Speaker: Curtis Bright, School of Computer Science, University of Windsor and Vijay Ganesh, Dept. of Electrical and Computer Engineering, University of Waterloo Title: When Computer Algebra Meets Satisfiability: A New Approach to Combinatorial Mathematics Venue: Virtual Abstract: Solvers for the Boolean satisfiability (SAT) problem have been increasingly used to resolve problems in mathematics due to their excellent search algorithms. This talk will describe a new method for mathematical search that couples SAT solvers with computer algebra systems (CAS), thereby combining the expressiveness of CASs with the search power of SAT solvers. This paradigm has led to a number of results on long-standing mathematical questions such as the first computer-verifiable resolution of Lam’s problem and the discovery of a new infinite class of Williamson matrices. |
|
Speaker: Patrick Massot, Laboratoire de Mathématiques d’Orsay and CNRS Title: Why explain mathematics to computers? Venue: Virtual Abstract: A growing number of mathematicians are having fun explaining mathematics to computers using proof assistant software. This process is called formalization. In this talk I’ll describe what formalization looks like, what kind of things it teaches us, and how it could even turn out to be useful (in our usual sense of “useful”). This will not be a talk about foundations of mathematics, and I won’t assume any prior knowledge about formalization. |
|
Speaker: Marijn Heule, Carnegie Mellon University Title: Computer-Aided Mathematics and Satisfiability Venue: Virtual Abstract: Progress in satisfiability (SAT) solving has made it possible to determine the correctness of complex systems and answer long-standing open questions in mathematics. The SAT solving approach is completely automatic and can produce clever though potentially gigantic proofs. We can have confidence in the correctness of the answers because highly trustworthy systems can validate the underlying proofs regardless of their size. We demonstrate the effectiveness of the SAT approach by presenting some recent successes, including the solution of the Boolean Pythagorean Triples problem, computing the fifth Schur number, and resolving the remaining case of Keller’s conjecture. Moreover, we constructed and validated a proof for each of these results. The second part of the talk focuses on notorious… |
|
Speaker: Thomas Fischbacher, Google Title: New results in Supergravity via ML Technology Venue: Virtual Abstract: The infrastructure built to power the Machine Learning revolution has many other uses beyond Deep Learning. Starting from a general architecture-level overview of the lower levels of Google’s TensorFlow machine learning library, we review how this has recently helped us to find all the stable vacua of SO(8) Supergravity in 3+1 dimensions and has allowed major progress on other related questions about M theory, and we briefly discuss other applications in field theory and beyond. |
|
Speaker: Adam Wagner, Tel Aviv University Title: Constructions in combinatorics via neural networks Venue: Virtual Abstract: Recently, significant progress has been made in the area of machine learning algorithms, and they have quickly become some of the most exciting tools in a scientist’s toolbox. In particular, recent advances in the field of reinforcement learning have led computers to reach superhuman-level play in Atari games and Go, purely through self-play. In this talk I will give a very basic introduction to neural networks and reinforcement learning algorithms. I will also indicate how these methods can be adapted to the “game” of trying to find a counterexample to a mathematical conjecture, and show some examples where this approach was successful. |
|
Speaker: Francois Chollet, Google Title: Why abstraction is the key to intelligence, and what we’re still missing Venue: Virtual Abstract: This talk provides a personal perspective on the way forward towards more human-like and more intelligent artificial systems. Traditionally, symbolic and probabilistic methods have dominated the domains of concept formation, abstraction, and automated reasoning. More recently, deep learning-based approaches have led to significant breakthroughs, including successes in games and combinatorial search tasks. However, the resulting systems are still limited in scope and capabilities: they remain brittle, data-hungry, and their generalization capabilities are limited. We will address a set of questions: why is conceptual abstraction essential for intelligence? What is the nature of abstraction, and its relationship to generalization? What kind of abstraction can deep learning models generate, and where do they fail? What are the methods… |
|
Speaker: JM Landsberg, Texas A&M Title: The complexity of matrix multiplication approached via algebraic geometry and representation theory. Venue: Virtual Abstract: In 1968 V. Strassen discovered that the way we usually multiply matrices is not the most efficient possible, and after considerable work by many authors, it is generally conjectured by computer scientists that as the size of matrices becomes large, it becomes almost as easy to multiply them as it is to add them. I will give a brief history of the problem, explain how this conjecture is naturally understood in the framework of classical algebraic geometry and representation theory, and conclude by describing recent advances using more sophisticated tools from algebraic geometry. For most of the talk, no knowledge of algebraic geometry or representation theory will be needed. |
|