First Proof, Second Batch

The First Proof, Second Batch project is a set of ten math questions designed to evaluate how well AI systems can autonomously solve problems that arise naturally in the research process.
In early June, mathematicians from various institutions will gather at the Harvard CMSA to grade AI-generated solutions to a second batch of problems. Human mathematicians will referee the AI solutions blind, rating each as essentially flawless, publishable with minor revisions, requiring major revisions, or rejected. Results, including referee reports and editorial reasoning, will be published online.
The CMSA will present these results at two webinars.

June 3, 2026
Introduction to First Proof: A conversation

Watch the Webinar (link)

Time: 1:00–2:00 pm ET
Harvard CMSA Director Dan Freed will lead a dialogue with First Proof Editors Mohammed Abouzaid (Stanford), Nikhil Srivastava (UC Berkeley), Rachel Ward (UT Austin), and Lauren Williams (Harvard) to explore the origins and goals of First Proof, sample some of their current work, and discuss their future plans. This hour-long conversation will give a window into the fast-moving and deep interaction of Mathematics and AI.

June 10, 2026
First Proof, Second Batch: Results

Watch the Webinar (link)

Time: 1:00–2:00 pm ET
The First Proof Editors will present the results of their Second Batch benchmark tests of AI systems.