
Speaker: Katherine Collins and Albert JiangTitle: Evaluating Language Models for Mathematics through InteractionsVenue: CMSA Room G10New Technologies in Mathematics Seminar Speakers: Katherine Collins and Albert Jiang, Department of Computer Science and Technology, University of Cambridge Title: Evaluating Language Models for Mathematics through Interactions Abstract: There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problemsolving assistants. However, the standard methodology of evaluating LLMs based on static pairs of inputs and outputs is insufficient to be able to make an informed decision about which LLMs, and under what assistive settings they can be sensibly utilised. Static assessment fails to take into account the essential interactive element in their deployment, and therefore limits how we understand language model capabilities. In this talk, we present our recent work introducing CheckMate, an… 

Speaker: Abhishek PanigrahiTitle: On the Power of Forward pass through Transformer ArchitecturesVenue: CMSA Room G10New Technologies in Mathematics Seminar Speaker: Abhishek Panigrahi, Dept. of Computer Science, Princeton University Title: On the Power of Forward pass through Transformer Architectures Abstract: Highly trained transformers are capable of interesting computations as they infer for an input. The exact mechanism that these models use during forward passes is an interesting area of study. This talk studies two interesting phenomena. In the first half, we explore how and why pretrained language models, specifically BERT of moderate sizes, can effectively learn linguistic structures like parse trees during pretraining. Specifically, using synthetic data through PCFGs, we show how moderatesized transformers can perform forwardbackward parsing, also known as the insideoutside algorithm, during inference. We further understand the role of… 

Speaker: Gabriel PoesiaTitle: Peano: Learning Formal Mathematical Reasoning Without Human DataVenue: CMSA Room G10New Technologies in Mathematics Seminar Speaker: Gabriel Poesia, Dept. of Computer Science, Stanford University Title: Peano: Learning Formal Mathematical Reasoning Without Human Data Abstract: Peano is a theorem proving environment in which a computational agent can start tabula rasa in a new domain, learn to solve problems through curiositydriven exploration, and create its own higher level actions. Gabriel will describe the system, present case studies on learning to solve simple algebra problems from the Khan Academy platform, and describe work on progress on learning the Natural Number Game, a popular introduction to theorem proving in Lean for mathematicians. 

Speaker: Sean WelleckTitle: Llemma: an open language model for mathematicsVenue: CMSA Room G10New Technologies in Mathematics Seminar Speaker: Sean Welleck, CMU, Language Technologies Institute Title: Llemma: an open language model for mathematics Abstract: We present Llemma: 7 billion and 34 billion parameter language models for mathematics. The Llemma models are initialized with Code Llama weights, then trained on the ProofPile II, a 55 billion token dataset of mathematical web data, code, and scientific papers. The resulting models show improved mathematical capabilities, and can be adapted to various tasks. For instance, Llemma outperforms the unreleased Minerva model suite on an equiparameter basis, and is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including the Llemma models, the ProofPile II, and code to… 

Speaker: Yuanzhi LiTitle: Physics of Language Models: Knowledge Storage, Extraction, and ManipulationVenue: CMSA Room G10New Technologies in Mathematics Seminar Speaker: Yuanzhi Li, CMU Dept. of Machine Learning and Microsoft Research Title: Physics of Language Models: Knowledge Storage, Extraction, and Manipulation Abstract: Large language models (LLMs) can memorize a massive amount of knowledge during pretraining, but can they effectively use this knowledge at inference time? In this work, we show several striking results about this question. Using a synthetic biography dataset, we first show that even if an LLM achieves zero training loss when pretraining on the biography dataset, it sometimes can not be finetuned to answer questions as simple as “What is the birthday of XXX” at all. We show that sufficient data augmentation during pretraining, such as rewriting the same biography multiple times or… 

Speaker: Alex GuTitle: LeanDojo: Theorem Proving with RetrievalAugmented Language ModelsVenue: CMSA Room G10New Technologies in Mathematics Seminar Speaker: Alex Gu, MIT Dept. of EE&CS Title: LeanDojo: Theorem Proving with RetrievalAugmented Language Models Abstract: Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However, existing methods are difficult to reproduce or build on, due to private code, data, and large compute requirements. This has created substantial barriers to research on machine learning methods for theorem proving. We introduce LeanDojo: an opensource Lean playground consisting of toolkits, data, models, and benchmarks. LeanDojo extracts data from Lean and enables interaction with the proof environment programmatically. It contains finegrained annotations of premises in proofs, providing valuable data for premise selection: a key bottleneck in theorem proving. Using… 

Speaker: François ChartonTitle: Transformers for maths, and maths for transformersVenue: CMSA Room G10New Technologies in Mathematics Seminar Speaker: François Charton, Meta AI Title: Transformers for maths, and maths for transformers Abstract: Transformers can be trained to solve problems of mathematics. I present two recent applications, in mathematics and physics: predicting integer sequences, and discovering the properties of scattering amplitudes in a close relative of Quantum ChromoDynamics. Problems of mathematics can also help understand transformers. Using two examples from linear algebra and integer arithmetic, I show that model predictions can be explained, that trained models do not confabulate, and that carefully choosing the training distributions can help achieve better, and more robust, performance. 

Speaker: Ronen EldanTitle: The TinyStories Dataset: How Small Can Language Models Be And Still Speak CoherentVenue: virtualNew Technologies in Mathematics Seminar Speaker: Ronen Eldan, Microsoft Research Title: The TinyStories Dataset: How Small Can Language Models Be And Still Speak Coherent Abstract: While generative language models exhibit powerful capabilities at large scale, when either the model or the number of training steps is too small, they struggle to produce coherent and fluent text: Existing models whose size is below a few billion parameters often do not generate coherent text beyond a few sentences. Hypothesizing that one of the main reasons for the strong reliance on size is the vast breadth and abundance of patterns in the datasets used to train those models, this motivates the following question: Can we design a dataset that preserves the… 

Speaker: Dmitry KrotovTitle: Modern Hopfield Networks for Novel Transformer ArchitecturesVenue: virtualNew Technologies in Mathematics Seminar Speaker: Dmitry Krotov, IBM Research – Cambridge Title: Modern Hopfield Networks for Novel Transformer Architectures Abstract: Modern Hopfield Networks or Dense Associative Memories are recurrent neural networks with fixed point attractor states that are described by an energy function. In contrast to conventional Hopfield Networks, which were popular in the 1980s, their modern versions have a very large memory storage capacity, which makes them appealing tools for many problems in machine learning and cognitive and neurosciences. In this talk, I will introduce an intuition and a mathematical formulation of this class of models and will give examples of problems in AI that can be tackled using these new ideas. Particularly, I will… 

Speaker: Timo SchickTitle: Toolformer: Language Models Can Teach Themselves to Use ToolsVenue: virtualNew Technologies in Mathematics Seminar Speaker: Timo Schick, Meta AI Title: Toolformer: Language Models Can Teach Themselves to Use Tools Abstract: Language models exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this talk, we show how these limitations can be overcome by letting language models teach themselves to use external tools via simple APIs. We discuss Toolformer, a model trained to independently decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. Through this, it achieves substantially… 

Speaker: Jimmy BaTitle: How to steer foundation models?Venue: CMSA Room G10New Technologies in Mathematics Seminar Speaker: Jimmy Ba, University of Toronto Title: How to steer foundation models? Abstract: By conditioning on natural language instructions, foundation models and large language models (LLMs) have displayed impressive capabilities as generalpurpose computers. However, task performance depends significantly on the quality of the prompt used to steer the model. Due to the lack of knowledge of how foundation models work, most effective prompts have been handcrafted by humans through a demanding trialanderror process. To reduce the human effort in this alignment process, I will discuss a few approaches to steer these powerful models to excel in various downstream language and image tasks. 

Speaker: Cyril ZhangTitle: How do Transformers reason? First principles via automata, semigroups, and circuitsVenue: CMSA Room G10New Technologies in Mathematics Seminar Speaker: Cyril Zhang, Microsoft Research Title: How do Transformers reason? First principles via automata, semigroups, and circuits Abstract: The current “Transformer era” of deep learning is marked by the emergence of combinatorial and algorithmic reasoning capabilities in large sequence models, leading to dramatic advances in natural language understanding, program synthesis, and theorem proving. What is the nature of these models’ internal representations (i.e. how do they represent the states and computational steps of the algorithms they execute)? How can we understand and mitigate their weaknesses, given that they resist interpretation? In this work, we present some insights (and many further mysteries) through the lens of automata and their algebraic structure. Specifically, we investigate the apparent… 

Speaker: João Araújo, Mathematics Department, Universidade Nova de Lisboa and Michael Kinyon, Department of Mathematics, University of DenverTitle: From Engine to AutoVenue: CMSA Room G10New Technologies in Mathematics Seminar Speakers: João Araújo, Mathematics Department, Universidade Nova de Lisboa and Michael Kinyon, Department of Mathematics, University of Denver Title: From Engine to Auto Abstract: Bill McCune produced the program EQP that deals with first order logic formulas and in 1996 managed to solve Robbins’ Conjecture. This very powerful tool reduces to triviality any result that can be obtained by encoding the assumptions and the goals. The next step was to turn the program into a genuine assistant for the working mathematician: find ways to help the prover with proofs; reduce the lengths of the automatic proofs to better crack them; solve problems in higher order logic; devise tools that autonomously prove results of… 

Speaker: Antonia CreswellTitle: Towards Faithful Reasoning Using Language ModelsVenue: CMSA Room G10New Technologies in Mathematics Seminar Speaker: Antonia Creswell, DeepMind Title: Towards Faithful Reasoning Using Language Models Abstract: Language models are showing impressive performance on many natural language tasks, including questionanswering. However, language models – like most deep learning models – are black boxes. We cannot be sure how they obtain their answers. Do they reason over relevant knowledge to construct an answer or do they rely on prior knowledge – baked into their weights – which may be biased? An alternative approach is to develop models whose output is a human interpretable, faithful reasoning trace leading to an answer. In this talk we will characterise faithful reasoning in terms of logically valid reasoning and demonstrate where current reasoning… 

Speaker: Guy GurAriTitle: Minerva: Solving Quantitative Reasoning Problems with Language ModelsVenue: CMSA Room G10New Technologies in Mathematics Seminar Speaker: Guy GurAri, Google Research Title: Minerva: Solving Quantitative Reasoning Problems with Language Models Abstract: Quantitative reasoning tasks which can involve mathematics, science, and programming are often challenging for machine learning models in general and for language models in particular. We show that transformerbased language models obtain significantly better performance on math and science questions when trained in an unsupervised way on a large, mathfocused dataset. Performance can be further improved using prompting and sampling techniques including chainofthought and majority voting. Minerva, a model that combines these techniques, achieves SOTA on several math and science benchmarks. I will describe the model, its capabilities and limitations. 

Speaker: Surya GanguliTitle: Statistical mechanics of neural networks: From the geometry of high dimensional error landscapes to beating power law neural scalingVenue: CMSA Room G10New Technologies in Mathematics Speaker: Surya Ganguli, Stanford University Title: Statistical mechanics of neural networks: From the geometry of high dimensional error landscapes to beating power law neural scaling Abstract: Statistical mechanics and neural network theory have long enjoyed fruitful interactions. We will review some of our recent work in this area and then focus on two vignettes. First we will analyze the high dimensional geometry of neural network error landscapes that happen to arise as the classical limit of a dissipative manybody quantum optimizer. In particular, we will be able to use the KacRice formula and the replica method to calculate the number, location, energy levels, and Hessian eigenspectra of all critical points of any index. Second we will review… 

Speaker: Johan CommelinTitle: Breaking the onemindbarrier in mathematics using formal verificationVenue: CMSA Room G10New Technologies in Mathematics Seminar Speaker: Johan Commelin, Mathematisches Institut, AlbertLudwigsUniversität Freiburg Title: Breaking the onemindbarrier in mathematics using formal verification Abstract: In this talk I will argue that formal verification helps break the onemindbarrier in mathematics. Indeed, formal verification allows a team of mathematicians to collaborate on a project, without one person understanding all parts of the project. At the same time, it also allows a mathematician to rapidly free mental RAM in order to work on a different component of a project. It thus also expands the onemindbarrier. I will use the Liquid Tensor Experiment as an example, to illustrate the above two points. This project recently finished the formalization of the main theorem of liquid vector spaces, following up on a challenge by… 

Speaker: Yuhuai Wu, Stanford and GoogleTitle: Memorizing TransformersVenue: VirtualAbstract: Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this talk, I will discuss how we extend language models with the ability to memorize the internal representations of past inputs. We demonstrate that an approximate NN lookup into a nondifferentiable memory of recent (key, value) pairs improves language modeling across various benchmarks and tasks, including generic webtext (C4), math papers (arXiv), books (PG19), code (Github), as well as formal theorems (Isabelle). We show that the performance steadily improves when we increase the size of memory… 

Speaker: Stanislas Polu, OpenAITitle: Formal Mathematics Statement Curriculum LearningVenue: VirtualAbstract: We explore the use of expert iteration in the context of language modeling applied to formal mathematics. We show that at same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only. We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems, without the need for associated groundtruth proofs. Finally, by applying this expert iteration to a manually curated set of problem statements, we achieve stateoftheart on the miniF2F benchmark, automatically solving multiple challenging problems drawn from high school olympiads. 

Speaker: Iddo Drori, MIT EE&CS and Columbia School of EngineeringTitle: Machine Learning 30 STEM Courses in 12 DepartmentsVenue: VirtualAbstract: We automatically solve, explain, and generate universitylevel course problems from thirty STEM courses (at MIT, Harvard, and Columbia) for the first time. We curate a new dataset of course questions and answers across a dozen departments: Aeronautics and Astronautics, Chemical Engineering, Chemistry, Computer Science, Economics, Electrical Engineering, Materials Science, Mathematics, Mechanical Engineering, Nuclear Science, Physics, and Statistics. We generate new questions and use them in a Columbia University course, and perform A/B tests demonstrating that these machine generated questions are indistinguishable from humanwritten questions and that machine generated explanations are as useful as humanwritten explanations, again for the first time. Our approach consists of five steps: (i) Given course questions, turn them into programming tasks; (ii) Automatically generate… 