New Technologies in Mathematics Seminar
Speaker: Sean Welleck, CMU, Language Technologies Institute
Title: Llemma: an open language model for mathematics
Abstract: We present Llemma: 7-billion- and 34-billion-parameter language models for mathematics. The Llemma models are initialized with Code Llama weights, then trained on the Proof-Pile II, a 55-billion-token dataset of mathematical web data, code, and scientific papers. The resulting models show improved mathematical capabilities and can be adapted to various tasks. For instance, Llemma outperforms the unreleased Minerva model suite on an equi-parameter basis and is capable of tool use and formal theorem proving without any further fine-tuning. We openly release all artifacts, including the Llemma models, the Proof-Pile II, and code to replicate our experiments. We hope that Llemma serves as a platform for new research and tools at the intersection of generative models and mathematics.