BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//CMSA - ECPv6.15.17//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:CMSA
X-ORIGINAL-URL:https://cmsa.fas.harvard.edu
X-WR-CALDESC:Events for CMSA
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20240310T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20241103T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20250309T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20251102T060000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20260308T070000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20261101T060000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20251006T090000
DTEND;TZID=America/New_York:20251010T170000
DTSTAMP:20260404T191110Z
CREATED:20250502T180256Z
LAST-MODIFIED:20251009T201426Z
UID:10003747-1759741200-1760115600@cmsa.fas.harvard.edu
SUMMARY:Mathematical foundations of AI
DESCRIPTION:Mathematical foundations of AI \nDate: October 6–10\, 2025 \nLocation: Harvard CMSA\, Room G10\, 20 Garden Street\, Cambridge MA & via Zoom \nArtificial intelligence (AI) has achieved unprecedented advances\, yet our theoretical understanding lags significantly behind. This gap is a major obstacle to improving AI’s safety and reliability. Since the classical tools of learning theory have proven insufficient for understanding AI\, researchers are now drawing insights from a vast array of fields—including functional analysis\, probability theory\, optimal transport\, optimization\, PDEs\, information theory\, geometry\, statistics\, electrical engineering\, and ergodic theory. These interdisciplinary efforts are gradually shedding light on the underlying principles governing modern AI. This workshop centers on these mathematical and interdisciplinary developments. It will feature a series of talks by researchers across these fields. Open problem and small-group sessions will help foster new connections and new research avenues. \n  \nRegistration required \nIn-person registration (This event is at capacity) \nZoom Webinar Registration \n  \nSpeakers \n\nJason Altschuler\, University of Pennsylvania\nGuy Bresler\, MIT\nSinho Chewi\, Yale University\nLénaïc Chizat\, EPFL\nNabarun Deb\, University of Chicago\nEdgar Dobriban\, University of Pennsylvania\nAhmed El Alaoui\, Cornell University\nZhou Fan\, Yale University\nBoris Hanin\, Princeton University\nJason Klusowski\, Princeton University\nTengyu Ma\, Stanford University\nAlexander Rakhlin\, MIT\nYuting Wei\, University of Pennsylvania\nTijana Zrnic\, Stanford University\n\nOrganizer: Morgane Austern\, Harvard Statistics \n  \nSchedule \nMonday\, Oct. 6\, 2025 \n\n\n\n8:30–9:00 am\nMorning refreshments\n\n\n9:00–10:00 am\nYuting Wei\, U Penn \nTo Intrinsic Dimension and Beyond: Efficient Sampling in Diffusion Models \nThe denoising diffusion probabilistic model (DDPM) has become a cornerstone of generative AI. While sharp convergence guarantees have been established for DDPM\, the iteration complexity typically scales with the ambient data dimension of target distributions\, leading to overly conservative theory that fails to explain its practical efficiency. This has sparked recent efforts to understand how DDPM can achieve sampling speed-ups through automatic exploitation of the intrinsic low dimensionality of data. This talk explores two key scenarios: (1) For a broad class of data distributions with intrinsic dimension k\, we prove that the iteration complexity of the DDPM scales nearly linearly with k\, which is optimal under the KL divergence metric; (2) For mixtures of Gaussian distributions with k components\, we show that DDPM learns the distribution with iteration complexity that grows only logarithmically in k. These results provide theoretical justification for the practical efficiency of diffusion models.\n\n\n10:00–10:30 am\nBreak\n\n\n10:30–11:30 am\nJason Klusowski\, Princeton \nThe Value of Side Information in Unlabeled Data \nPractitioners often work in settings with limited labeled data and abundant unlabeled data. During training\, they may even have access to extra side information (some labeled\, some not) that won’t be available once the model is deployed. When can this side information actually improve performance? 
I’ll present a simple framework where a rich-view model that sees the extra features generates pseudo-labels on the large unlabeled data\, and a deployment model that only sees the standard features is trained on both real and pseudo-labels. The two are trained iteratively: each deployment model update calibrates the next round of pseudo-labels\, and those refined pseudo-labels in turn guide the deployment model. Our theory shows that side information helps precisely when the rich-view and deployment models make different kinds of errors. We formalize this with a decorrelation score that quantifies how independent those errors are; the more independent\, the greater the performance gains.\n\n\n11:30 am–12:00 pm\nBreak\n\n\n12:00–1:00 pm\nGuy Bresler\, MIT \nGlobal Minimizers of Sigmoid Contrastive Loss \nThe meta-task of obtaining and aligning representations through contrastive pre-training has been steadily gaining importance since its introduction in CLIP and ALIGN. In this paper we theoretically explain the advantages of synchronizing with trainable inverse temperature and bias under the sigmoid loss\, as implemented in the recent SigLIP models of Google DeepMind. Temperature and bias can drive the loss function to zero for a rich class of configurations that we call (m\,b)-Constellations. (m\,b)-Constellations are a novel combinatorial object related to spherical codes and are parametrized by a margin m and relative bias b. We use our characterization of constellations to theoretically justify the success of SigLIP on retrieval\, to explain the modality gap present in SigLIP\, and to identify the necessary dimension for producing high-quality representations. We also propose a reparameterization of the sigmoid loss with explicit relative bias\, which appears to improve training dynamics. Joint work with Kiril Bangachev\, Iliyas Noman\, and Yury Polyanskiy.\n\n\n\n  \nTuesday\, Oct. 7\, 2025 \n\n\n\n8:30–9:00 am\nMorning refreshments\n\n\n9:00–10:00 am\nLénaïc Chizat\, EPFL \nThe Hidden Width of Deep ResNets \nWe present a mathematical framework to analyze the training dynamics of deep ResNets that rigorously captures practical architectures (including Transformers) trained from standard random initializations. Our approach combines stochastic approximation of ODEs with propagation-of-chaos arguments. It yields three main insights:\n– Depth begets width: infinite-depth ResNets of any hidden width behave throughout training as if they were infinitely wide;\n– Unified phase diagram: the phase diagram of Transformers mirrors that of two-layer perceptrons\, once the appropriate substitutions are made;\n– Optimal shape scaling: for a given parameter budget P\, a Transformer with optimal shape converges to its limiting dynamics at rate P^{-1/6}.\nThis is based on https://arxiv.org/abs/2509.10167\n\n\n10:00–10:30 am\nBreak\n\n\n10:30–11:30 am\nBoris Hanin\, Princeton \nKernel Learning on Manifolds \nThis talk concerns the L_2 risk of minimum norm interpolation with n samples in the RKHS of a kernel K. Unlike most prior work in this space\, our kernels are defined on any closed d-dimensional Riemannian manifold\, and we require only that the kernels are trace class and elliptic. With these assumptions\, we get nearly sharp L_2 risk bounds with high probability over the data. 
Like prior work on round spheres\, our results say that the number of samples n\, the dimension of the manifold\, and some details of the kernel determine a natural spectral cutoff λ(n\, d\, K)\, and that minimum norm interpolation learns essentially the projection of the data-generating process onto the eigenfunctions of the Laplacian with frequency at most λ(n\, d\, K). Joint work with Mengxuan Yang.\n\n\n11:30 am–12:00 pm\nBreak\n\n\n12:00–1:00 pm\nZhou Fan\, Yale \nDynamical mean-field analysis of adaptive Langevin diffusions \nIn many applications of statistical estimation via sampling\, one may wish to sample from a high-dimensional target distribution that adaptively evolves in response to the samples already seen. We study an example of such dynamics\, given by a Langevin diffusion for posterior sampling in a Bayesian linear regression model with i.i.d. regression design\, whose prior continuously adapts to the Langevin trajectory via a maximum marginal-likelihood scheme. Using techniques of dynamical mean-field theory (DMFT)\, we provide a precise characterization of a high-dimensional asymptotic limit for the joint evolution of the prior parameter and law of the Langevin sample. We then carry out an analysis of the equations that describe this DMFT limit\, under conditions of approximate time-translation invariance\, which include\, in particular\, settings where the posterior law satisfies a log-Sobolev inequality. In such settings\, we show that this adaptive Langevin trajectory converges on a dimension-independent time horizon to an equilibrium state that is characterized by a system of replica-symmetric fixed-point equations\, and the associated prior parameter converges to a critical point of a replica-symmetric limit for the model free energy. We explore the nature of the free energy landscape and its critical points in a few simple examples\, where such critical points may or may not be unique.\n\n\n\n  \nWednesday\, Oct. 8\, 2025 \n\n\n\n8:30–9:00 am\nMorning refreshments\n\n\n9:00–10:00 am\nJason Altschuler\, U Penn \nNegative Stepsizes Make Gradient-Descent-Ascent Converge \nSolving min-max problems is a central question in optimization\, games\, learning\, and controls. Arguably the most natural algorithm is Gradient-Descent-Ascent (GDA); however\, since the 1970s\, conventional wisdom has held that it fails to converge even on simple problems. This failure spurred the extensive literature on modifying GDA with extragradients\, optimism\, momentum\, anchoring\, etc. In contrast\, we show that GDA converges in its original form by simply using a judicious choice of stepsizes. The key innovation is the use of unconventional stepsize schedules that are time-varying\, asymmetric\, and (most surprisingly) periodically negative. We show that all three properties are necessary for convergence\, and that altogether this enables GDA to converge on the classical counterexamples (e.g.\, unconstrained convex-concave problems). The core intuition is that although negative stepsizes make backward progress\, they de-synchronize the min/max variables (overcoming the cycling issue of GDA) and lead to a slingshot phenomenon in which the forward progress in the other iterations is overwhelmingly larger. This results in fast overall convergence. 
Geometrically\, the slingshot dynamics leverage the non-reversibility of gradient flow: positive/negative steps cancel to first order\, yielding a second-order net movement in a direction that GDA otherwise cannot move in\, which leads to convergence. Joint work with Henry Shugart.\n\n\n10:00–10:30 am\nBreak\n\n\n10:30–11:30 am\nNabarun Deb\, U Chicago \nGenerative Modeling via Parabolic Monge-Ampère PDEs \nWe introduce a novel generative modeling framework based on a discretized parabolic Monge-Ampère PDE\, which emerges as a continuous limit of the Sinkhorn algorithm commonly used in optimal transport. Our method performs iterative refinement in the space of Brenier maps using a mirror gradient descent step. We establish theoretical guarantees for generative modeling through the lens of no-regret analysis\, demonstrating that the iterates converge to the optimal Brenier map under a variety of step-size schedules. As a technical contribution\, we derive a new Evolution Variational Inequality tailored to the parabolic Monge-Ampère PDE\, connecting geometry\, transportation cost\, and regret. Our framework accommodates non-log-concave target distributions\, constructs an optimal sampling process via the Brenier map\, and integrates favorable learning techniques from generative adversarial networks and score-based diffusion models.\n\n\n11:30 am–12:00 pm\nBreak\n\n\n12:00–1:00 pm\nSinho Chewi\, Yale \nDiscretization and distribution learning in diffusion models \nFirst\, I will review some literature on discretization of diffusion models\, focusing on the use of randomized midpoints for deterministic vs. stochastic samplers. Then\, I will argue that such sampling guarantees reduce distribution learning\, in the form of learning to generate a sample\, to score matching. To complement this result\, we reduce other forms of distribution learning (parameter estimation and density estimation) to score matching as well. This leads to new consequences for diffusion models\, such as asymptotic efficiency of a DDPM-based parameter estimator and algorithms for Gaussian mixture density estimation\, as well as to a general approach for establishing cryptographic hardness results for score estimation.\n\n\n\n  \nThursday\, Oct. 9\, 2025 \n\n\n\n8:30–9:00 am\nMorning refreshments\n\n\n9:00–10:00 am\nAhmed El Alaoui\, Cornell \nHow abundant are good interpolators? \nWe consider classifying labeled data in the interpolation regime where there exist linear classifiers (with possibly negative margin) correctly classifying all points in the dataset. Under the logistic model with Gaussian features\, we derive the large deviation rate function of the event that an interpolator chosen uniformly at random achieves a given generalization error. This describes the proportion of interpolators having any desired performance. We remark that in a wide regime of parameters\, the vast majority of interpolators perform worse than the one found via a simple linear programming procedure\, showing that the latter algorithm produces an atypically good classifier.\nThis is based on joint work with August Chen.\n\n\n10:00–10:30 am\nBreak\n\n\n10:30–11:30 am\nTengyu Ma\, Stanford \nSelf-play LLM Theorem Provers with Iterative Conjecturing and Proving \nI will discuss some works on using RL for theorem proving\, especially in the possible future regime where we run out of high-quality training data. 
To keep improving the models with limited data\, we draw inspiration from mathematicians\, who continuously develop new results\, partly by proposing novel conjectures or exercises (which are often variants of known results) and attempting to solve them. We design the Self-play Theorem Prover (STP)\, which simultaneously takes on two roles\, conjecturer and prover\, each providing training signals to the other. The model achieves state-of-the-art performance among whole-proof generation methods on miniF2F-test (65.0%\, pass@3200)\, ProofNet-test (23.9%\, pass@3200)\, and PutnamBench (8/644\, pass@3200).\n\n\n11:30 am–12:00 pm\nBreak\n\n\n12:00–1:00 pm\nEdgar Dobriban\, U Penn \nLeveraging synthetic data in statistical inference \nThe rapid proliferation of high-quality synthetic data — generated by advanced AI models or collected as auxiliary data from related tasks — presents both opportunities and challenges for statistical inference. This paper introduces a GEneral Synthetic-Powered Inference (GESPI) framework that wraps around any statistical inference procedure to safely enhance sample efficiency by combining synthetic and real data. Our framework leverages high-quality synthetic data to boost statistical power\, yet adaptively defaults to the standard inference method using only real data when synthetic data is of low quality. The error of our method remains below a user-specified bound without any distributional assumptions on the synthetic data\, and decreases as the quality of the synthetic data improves. This flexibility enables seamless integration with conformal prediction\, risk control\, hypothesis testing\, and multiple testing procedures\, all without modifying the base inference method. We demonstrate the benefits of our method on challenging tasks with limited labeled data\, including AlphaFold protein structure prediction and comparison of large reasoning models on complex math problems.\n\n\n\n  \nFriday\, Oct. 10\, 2025 \n\n\n\n8:30–9:00 am\nMorning refreshments\n\n\n9:00–10:00 am\nTijana Zrnic\, Stanford \nProbably Approximately Correct Labels \nObtaining high-quality labeled datasets is often costly\, requiring either extensive human annotation or expensive experiments. We propose a method that supplements such “expert” labels with AI predictions from pre-trained models to construct labeled datasets more cost-effectively. Our approach results in probably approximately correct labels: with high probability\, the overall labeling error is small. This solution enables rigorous yet efficient dataset curation using modern AI models. We demonstrate the benefits of the methodology through text annotation with large language models\, image labeling with pre-trained vision models\, and protein folding analysis with AlphaFold. This is joint work with Emmanuel Candès and Andrew Ilyas.\n\n\n10:00–10:30 am\nBreak\n\n\n10:30–11:30 am\nAlexander Rakhlin\, MIT \nElements of Interactive Decision Making \nMachine learning methods are increasingly deployed in interactive environments\, ranging from dynamic treatment strategies in medicine to fine-tuning of LLMs using reinforcement learning. In these settings\, the learning agent interacts with the environment to collect data and necessarily faces an exploration-exploitation dilemma. We present a general framework for interactive decision making that subsumes multi-armed bandits\, contextual bandits\, structured bandits\, and reinforcement learning. 
We focus both on the statistical aspect of learning\, aiming to develop a tight characterization of sample complexity in terms of properties of the class of models\, and on the basic algorithmic primitives.
URL:https://cmsa.fas.harvard.edu/event/mathai/
LOCATION:CMSA\, 20 Garden Street\, Cambridge\, Massachusetts 02138\, United States
CATEGORIES:Workshop
ATTACH;FMTTYPE=image/jpeg:https://cmsa.fas.harvard.edu/media/MathAI.5.jpg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20251007T161500
DTEND;TZID=America/New_York:20251007T183000
DTSTAMP:20260404T191110Z
CREATED:20251001T183038Z
LAST-MODIFIED:20251007T132737Z
UID:10003802-1759853700-1759861800@cmsa.fas.harvard.edu
SUMMARY:A Classifying Space for Phases of Matrix Product States
DESCRIPTION:Geometry and Quantum Theory Seminar \nSpeakers: Daniel Spiegel\, Harvard Math \nTitle: A Classifying Space for Phases of Matrix Product States \nAbstract: Alexei Kitaev has conjectured that there should be a loop spectrum consisting of spaces of gapped invertible quantum spin systems\, indexed by the spatial dimension d of the lattice. Motivated by Kitaev’s conjecture\, I will detail a concrete construction of a topological space B consisting of translation-invariant injective matrix product states (MPS) of all physical and bond dimensions\, which plays the role of Kitaev’s space in dimension d = 1. Having such a space is a useful tool in the discussion of parametrized phases of MPS; in fact\, it allows us to define a parametrized phase as a homotopy class of maps into B. The space B is constructed as the quotient of a contractible space E of MPS tensors modulo gauge transformations. The projection map from E to B is a quasifibration\, which allows us to compute the homotopy groups of the classifying space B via a long exact sequence. In particular\, B has the weak homotopy type of K(Z\, 2) × K(Z\, 3)\, shedding light on Kitaev’s conjecture in the context of MPS. \nDaniel Spiegel will speak for 60 minutes. \nSunghyuk Park (CMSA) will also speak for 15 minutes.
URL:https://cmsa.fas.harvard.edu/event/quantumgeo_10725/
LOCATION:Science Center 507\, 1 Oxford Street\, Cambridge\, MA 02138
CATEGORIES:Geometry and Quantum Theory Seminar
ATTACH;FMTTYPE=image/png:https://cmsa.fas.harvard.edu/media/CMSA-Geometry-Quantum-Theory-10.7.25-scaled.png
END:VEVENT
END:VCALENDAR