BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//CMSA - ECPv6.15.18//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-ORIGINAL-URL:https://cmsa.fas.harvard.edu
X-WR-CALDESC:Events for CMSA
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20240310T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20241103T020000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20250309T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20251102T020000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20260308T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20261101T020000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250911T090000
DTEND;TZID=America/New_York:20250912T170000
DTSTAMP:20260503T092123Z
CREATED:20250502T175902Z
LAST-MODIFIED:20251026T044243Z
UID:10003743-1757581200-1757696400@cmsa.fas.harvard.edu
SUMMARY:Big Data Conference 2025
DESCRIPTION:Big Data Conference 2025 \nDates: Sep. 11–12\, 2025 \nLocation: Harvard University CMSA\, 20 Garden Street\, Cambridge & via Zoom \nThe Big Data Conference features speakers from the Harvard community as well as scholars from across the globe\, with talks focusing on computer science\, statistics\, math and physics\, and economics. \nInvited Speakers \n\nMarkus J. Buehler\, MIT\nYiling Chen\, Harvard\nJordan Ellenberg\, UW Madison\nYue M. Lu\, Harvard\nPankaj Mehta\, BU\nNick Patterson\, Harvard\nGautam Reddy\, Princeton\nTrevor David Rhone\, Rensselaer Polytechnic Institute\nTess Smidt\, MIT\n\nOrganizers: \nMichael M. Desai\, Harvard OEB | Michael R. Douglas\, Harvard CMSA | Yannai A. Gonczarowski\, Harvard Economics | Efthimios Kaxiras\, Harvard Physics | Melanie Weber\, Harvard SEAS \n  \nBig Data YouTube Playlist \n  \nSchedule \nThursday\, Sep. 11\, 2025 \n  \n\n\n\n9:00 am\nRefreshments\n\n\n9:30 am\nIntroductions\n\n\n9:45–10:45 am\nGautam Reddy\, Princeton \nTitle: Global epistasis in genotype-phenotype maps\n\n\n10:45–11:00 am\nBreak\n\n\n11:00 am–12:00 pm\nNick Patterson\, Harvard \nTitle: The Origin of the Indo-Europeans \nAbstract: Indo-European is the largest family of human languages\, with very wide geographical distribution and more than 3 billion native speakers. How did this family arise and spread? This question has been discussed for nearly 250 years\, but with the availability of DNA from ancient fossils it is now largely understood\, at least in broad outlines. We will describe what we now know about these origins.\n\n\n12:00–1:30 pm\nLunch break\n\n\n1:30–2:30 pm\nMarkus Buehler\, MIT \nTitle: Superintelligence for scientific discovery \nAbstract: AI is moving beyond prediction to become a partner in invention. While today’s models excel at interpolating within known data\, true discovery requires stepping outside existing truths. This talk introduces superintelligent discovery engines built on multi-agent swarms: diverse AI agents that interact\, compete\, and cooperate to generate structured novelty. Guided by Gödel’s insight that no closed system is complete\, these swarms create gradients of difference – much like temperature gradients in thermodynamics – that sustain flow\, invention\, and surprise. Case studies in protein design and music composition show how swarms escape data biases\, invent novel structures\, and weave long-range coherence\, producing creativity that rivals human processes. By moving from “big data” to “big insight”\, these systems point toward a new era of AI that composes knowledge across science\, engineering\, and the arts.\n\n\n2:30–2:45 pm\nBreak\n\n\n2:45–3:45 pm\nJordan Ellenberg\, UW Madison \nTitle: What does machine learning have to offer mathematics?\n\n\n3:45–4:00 pm\nBreak\n\n\n4:00–5:00 pm\nPankaj Mehta\, Boston University \nTitle: Thinking about high-dimensional biological data in the age of AI \nAbstract: The molecular biology revolution has transformed our view of living systems. Scientific explanations of biological phenomena are now synonymous with the identification of the genes and proteins involved. The preeminence of the molecular paradigm has only become more pronounced as new technologies allow us to make measurements at scale. Combining this wealth of data with new artificial intelligence (AI) techniques is widely viewed as the future of biology. Here\, I will discuss the promise and perils of this approach. 
I will focus on our unpublished work with collaborators on two fronts: (i) transformer-based models for understanding genotype-to-phenotype maps\, and (ii) LLM-based ‘foundational models’ for cellular identity\, such as TranscriptFormer\, which is trained on single-cell RNA sequencing (scRNAseq) data. While LLMs excel at capturing complex evolutionary and demographic structure in DNA sequence data\, they are much less adept at elucidating the biology of cellular identity. We show that simple parameter-free models based on linear algebra outperform TranscriptFormer on downstream tasks related to cellular identity\, even though TranscriptFormer has nearly a billion parameters. If time permits\, I will conclude by showing how we can combine ideas from linear algebra\, bifurcation theory\, and statistical physics to classify cell fate transitions using scRNAseq data.\n\n\n\n  \nFriday\, Sep. 12\, 2025 \n\n\n\n9:00–9:45 am\nRefreshments\n\n\n9:45–10:45 am\nYiling Chen\, Harvard \nTitle: Data Reliability Scoring \nAbstract: Imagine you are trying to make a data-driven decision\, but the data at hand may be noisy\, biased\, or even strategically manipulated. Can you assess whether such a dataset is reliable—without access to ground truth?\nWe initiate the study of reliability scoring for datasets reported by potentially strategic data sources. While the true data remain unobservable\, we assume access to auxiliary observations generated by an unknown statistical process that depends on the truth. We introduce the Gram Determinant Score\, a reliability measure that evaluates how well the reported data align with the unobserved truth\, using only the reported data and the auxiliary observations. The score comes with provable guarantees: it preserves several natural reliability orderings. Experimentally\, it effectively captures data quality in settings with synthetic noise and contrastive learning embeddings.\nThis talk is based on joint work with Shi Feng\, Fang-Yi Yu\, and Paul Kattuman.\n\n\n10:45–11:00 am\nBreak\n\n\n11:00 am–12:00 pm\nYue M. Lu\, Harvard \nTitle: Nonlinear Random Matrices in High-Dimensional Estimation and Learning \nAbstract: In recent years\, new classes of structured random matrices have emerged in statistical estimation and machine learning. Understanding their spectral properties has become increasingly important\, as these matrices are closely linked to key quantities such as the training and generalization performance of large neural networks and the fundamental limits of high-dimensional signal recovery. Unlike classical random matrix ensembles\, these new matrices often involve nonlinear transformations\, introducing additional structural dependencies that pose challenges for traditional analysis techniques. \nIn this talk\, I will present a set of equivalence principles that establish asymptotic connections between various nonlinear random matrix ensembles and simpler linear models that are more tractable for analysis. 
I will then demonstrate how these principles can be applied to characterize the performance of kernel methods and random feature models across different scaling regimes and to provide insights into the in-context learning capabilities of attention-based Transformer networks.\n\n\n12:00–1:30 pm\nLunch break\n\n\n1:30–2:30 pm\nTrevor David Rhone\, Rensselaer Polytechnic Institute \nTitle: Accelerating the discovery of van der Waals quantum materials using AI \nAbstract: van der Waals (vdW) materials are exciting platforms for studying emergent quantum phenomena\, ranging from long-range magnetic order to topological order. A conservative estimate for the number of candidate vdW materials exceeds ~10^6 for monolayers and ~10^12 for heterostructures. How can we accelerate the exploration of this entire space of materials? Can we design quantum materials with desirable properties\, thereby advancing innovation in science and technology? A recent study showed that artificial intelligence (AI) can be harnessed to discover new vdW Heisenberg ferromagnets based on Cr2Ge2Te6 [1]\, [2] and magnetic vdW topological insulators based on MnBi2Te4 [3]. In this talk\, we will harness AI to efficiently explore the large chemical space of vdW materials and to guide the discovery of vdW materials with desirable spin and charge properties. We will focus on crystal structures based on monolayer Cr2I6 of the form A2X6\, which are studied using density functional theory (DFT) calculations and AI. Magnetic properties\, such as the magnetic moment\, are determined. The formation energy is also calculated and used as a proxy for chemical stability. We also investigate monolayers based on MnBi2Te4 of the form AB2X4 to identify novel topological materials. Further to this\, we study heterostructures based on MnBi2Te4/Sb2Te3 stacks. We show that AI\, combined with DFT\, can provide a computationally efficient means to predict the thermodynamic and magnetic properties of vdW materials [4]\, [5]. This study paves the way for the rapid discovery of chemically stable vdW quantum materials with applications in spintronics\, magnetic memory\, and novel quantum computing architectures.\n[1] T. D. Rhone et al.\, “Data-driven studies of magnetic two-dimensional materials\,” Sci. Rep.\, vol. 10\, no. 1\, p. 15795\, 2020.\n[2] Y. Xie\, G. Tritsaris\, O. Granas\, and T. Rhone\, “Data-Driven Studies of the Magnetic Anisotropy of Two-Dimensional Magnetic Materials\,” J. Phys. Chem. Lett.\, vol. 12\, no. 50\, pp. 12048–12054\, 2021.\n[3] R. Bhattarai\, P. Minch\, and T. D. Rhone\, “Investigating magnetic van der Waals materials using data-driven approaches\,” J. Mater. Chem. C\, vol. 11\, p. 5601\, 2023.\n[4] T. D. Rhone et al.\, “Artificial Intelligence Guided Studies of van der Waals Magnets\,” Adv. Theory Simulations\, vol. 6\, no. 6\, p. 2300019\, 2023.\n[5] P. Minch\, R. Bhattarai\, K. Choudhary\, and T. D. Rhone\, “Predicting magnetic properties of van der Waals magnets using graph neural networks\,” Phys. Rev. Mater.\, vol. 8\, no. 11\, p. 114002\, Nov. 2024.\nThis work used the Extreme Science and Engineering Discovery Environment (XSEDE)\, which is supported by National Science Foundation Grant No. ACI-1548562. This research used resources of the Argonne Leadership Computing Facility\, which is a DOE Office of Science User Facility supported under Contract No. DE-AC02-06CH11357. This material is based on work supported by the National Science Foundation CAREER award under Grant No. 
2044842.\n\n\n2:30–2:45 pm\nBreak\n\n\n2:45–3:45 pm\nTess Smidt\, MIT \nTitle: Applications of Euclidean neural networks to understand and design atomistic systems \nAbstract: Atomic systems (molecules\, crystals\, proteins\, etc.) are naturally represented by a set of coordinates in 3D space labeled by atom type. This poses a challenge for machine learning due to the sensitivity of coordinates to 3D rotations\, translations\, and inversions (the symmetries of 3D Euclidean space). Euclidean symmetry-equivariant Neural Networks (E(3)NNs) are specifically designed to address this issue. They faithfully capture the symmetries of physical systems\, handle 3D geometry\, and operate on the scalar\, vector\, and tensor fields that characterize these systems. \nE(3)NNs have achieved state-of-the-art results across atomistic benchmarks\, including small-molecule property prediction\, protein-ligand binding\, force prediction for crystals\, molecules\, and heterogeneous catalysis. By merging neural network design with group representation theory\, they provide a principled way to embed physical symmetries directly into learning. In this talk\, I will survey recent applications of E(3)NNs to materials design and highlight ongoing debates in the AI for atomistic sciences community: how to balance the incorporation of physical knowledge with the drive for engineering efficiency.\n\n\n\n 
URL:https://cmsa.fas.harvard.edu/event/bigdata_2025/
LOCATION:CMSA Room G10\, 20 Garden Street\, Cambridge\, MA\, 02138\, United States
CATEGORIES:Big Data Conference,Conference,Event
ATTACH;FMTTYPE=image/jpeg:https://cmsa.fas.harvard.edu/media/Big-Data-2025_11x17.9-scaled.jpg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20250915T090000
DTEND;TZID=America/New_York:20250918T170000
DTSTAMP:20260503T092123Z
CREATED:20250710T134311Z
LAST-MODIFIED:20250930T154307Z
UID:10003755-1757926800-1758214800@cmsa.fas.harvard.edu
SUMMARY:The Geometry of Machine Learning
DESCRIPTION:The Geometry of Machine Learning \nDates: September 15–18\, 2025 \nLocation: Harvard CMSA\, Room G10\, 20 Garden Street\, Cambridge MA 02138 \nDespite the extraordinary progress in large language models\, mathematicians suspect that other dimensions of intelligence must be defined and simulated to complete the picture. Geometric and symbolic reasoning are among these. In fact\, there seems to be much to learn about existing ML by considering it from a geometric perspective\, e.g.\, what is happening to the data manifold as it moves through a neural network? How can geometric and symbolic tools be interfaced with LLMs? A more distant goal\, one that seems only approachable through AIs\, would be to gain some insight into the large-scale structure of mathematics as a whole: the geometry of math\, rather than geometry as a subject within math. This conference is intended to begin a discussion on these topics. \nSpeakers \n\nMaissam Barkeshli\, University of Maryland\nEve Bodnia\, Logical Intelligence\nAdam Brown\, Stanford\nBennett Chow\, UCSD & IAS\nMichael Freedman\, Harvard CMSA\nElliot Glazer\, Epoch AI\nJames Halverson\, Northeastern\nJesse Han\, Math Inc.\nJunehyuk Jung\, Brown University\nAlex Kontorovich\, Rutgers University\nYann LeCun\, New York University & META*\nJared Duker Lichtman\, Stanford & Math Inc.\nBrice Ménard\, Johns Hopkins\nMichael Mulligan\, UCR & Logical Intelligence\nPatrick Shafto\, DARPA & Rutgers University\n\nOrganizers: Michael R. Douglas (CMSA) and Mike Freedman (CMSA) \n  \nGeometry of Machine Learning YouTube Playlist \n  \nSchedule \nMonday\, Sep. 15\, 2025 \n\n\n\n8:30–9:00 am\nMorning refreshments\n\n\n9:00–10:00 am\nJames Halverson\, Northeastern \nTitle: Sparsity and Symbols with Kolmogorov-Arnold Networks \nAbstract: In this talk I’ll review Kolmogorov-Arnold nets\, as well as new theory and applications related to sparsity and symbolic regression\, respectively. I’ll review essential results regarding KANs\, show how sparsity masks relate deep nets and KANs\, and how KANs can be utilized alongside multimodal language models for symbolic regression. Empirical results will necessitate a few slides\, but the bulk will be chalk.\n\n\n10:00–10:30 am\nBreak\n\n\n10:30–11:30 am\nMaissam Barkeshli\, University of Maryland \nTitle: Transformers and random walks: from language to random graphs \nAbstract: The stunning capabilities of large language models give rise to many questions about how they work and how much more capable they can possibly get. One way to gain additional insight is via synthetic models of data with tunable complexity\, which can capture the basic relevant structures of real data. In recent work we have focused on sequences obtained from random walks on graphs\, hypergraphs\, and hierarchical graphical structures. I will present some recent empirical results for work in progress regarding how transformers learn sequences arising from random walks on graphs. The focus will be on neural scaling laws\, unexpected temperature-dependent effects\, and sample complexity.\n\n\n11:30 am–12:00 pm\nBreak\n\n\n12:00–1:00 pm\nAdam Brown\, Stanford \nTitle: LLMs\, Reasoning\, and the Future of Mathematical Sciences \nAbstract: Over the last half decade\, the mathematical capabilities of large language models (LLMs) have leapt from preschooler to undergraduate and now beyond. This talk reviews recent progress and speculates as to what it will mean for the future of mathematical sciences if these trends continue.\n\n\n\n  \nTuesday\, Sep. 
16\, 2025 \n\n\n\n8:30–9:00 am\nMorning refreshments\n\n\n9:00–10:00 am\nJunehyuk Jung\, Brown University \nTitle: AlphaGeometry: a step toward automated math reasoning \nAbstract: Last summer\, Google DeepMind’s AI systems made headlines by achieving Silver Medal level performance on the notoriously challenging International Mathematical Olympiad (IMO) problems. For instance\, AlphaGeometry 2\, one of these remarkable systems\, solved the geometry problem in a mere 19 seconds! \nIn this talk\, we will delve into the inner workings of AlphaGeometry\, exploring the innovative techniques that enable it to tackle intricate geometric puzzles. We will uncover how this AI system combines the power of neural networks with symbolic reasoning to discover elegant solutions.\n\n\n10:00–10:30 am\nBreak\n\n\n10:30–11:30 am\nBennett Chow\, UCSD and IAS \nTitle: Ricci flow as a test for AI\n\n\n11:30 am–12:00 pm\nBreak\n\n\n12:00–1:00 pm\nJared Duker Lichtman\, Stanford & Math Inc. and Jesse Han\, Math Inc. \nTitle: Gauss – towards autoformalization for the working mathematician \nAbstract: In this talk we’ll highlight some recent formalization progress using a new agent – Gauss. We’ll outline a recent Lean proof of the Prime Number Theorem in strong form\, completing a challenge set in January 2024 by Alex Kontorovich and Terry Tao. We hope Gauss will assist working mathematicians\, especially those who do not write formal code themselves.\n\n\n5:00–6:00 pm\nSpecial Lecture: Yann LeCun\, Science Center Hall C\n\n\n\n  \nWednesday\, Sep. 17\, 2025 \n\n\n\n8:30–9:00 am\nRefreshments\n\n\n9:00–10:00 am\nMichael Mulligan\, UCR and Logical Intelligence \nTitle: Spontaneous Kolmogorov-Arnold Geometry in Vanilla Fully-Connected Neural Networks \nAbstract: The Kolmogorov-Arnold (KA) representation theorem constructs universal\, but highly non-smooth inner functions (the first-layer map) in a single (non-linear) hidden layer neural network. Such universal functions have a distinctive local geometry\, a “texture\,” which can be characterized by the inner function’s Jacobian\, $J(\mathbf{x})$\, as $\mathbf{x}$ varies over the data. It is natural to ask if this distinctive KA geometry emerges through conventional neural network optimization. We find that indeed KA geometry often does emerge through the process of training vanilla single hidden layer fully-connected neural networks (MLPs). We quantify KA geometry through the statistical properties of the exterior powers of $J(\mathbf{x})$: the number of zero rows and various observables for the minor statistics of $J(\mathbf{x})$\, which measure the scale and axis alignment of $J(\mathbf{x})$. This leads to a rough phase diagram in the space of function complexity and model hyperparameters where KA geometry occurs. The motivation is first to understand how neural networks organically learn to prepare input data for later downstream processing and\, second\, to learn enough about the emergence of KA geometry to accelerate learning through a timely intervention in network hyperparameters. This research is the “flip side” of KA-Networks (KANs). We do not engineer KA into the neural network\, but rather watch KA emerge in shallow MLPs.\n\n\n10:00–10:30 am\nBreak\n\n\n10:30–11:30 am\nEve Bodnia\, Logical Intelligence \nTitle: \nAbstract: We introduce a method of topological analysis on spiking correlation networks in neurological systems. 
This method explores the neural manifold as in the manifold hypothesis\, which posits that information is often represented by a lower-dimensional manifold embedded in a higher-dimensional space. After collecting neuron activity from human and mouse organoids using a micro-electrode array\, we extract connectivity using pairwise spike-timing correlations\, optimized for the time lags introduced by synaptic delays. We then look at network topology to identify emergent structures and compare the results to two randomized models – constrained randomization and bootstrapping across datasets. In histograms of the persistence of topological features\, we see that the features from the original dataset consistently exceed the variability of the null distributions\, suggesting that the observed topological features reflect significant correlation patterns in the data rather than random fluctuations. In a study of network resiliency\, we found that random removal of 10% of nodes still yielded a network with a smaller but still significant number of topological features in the homology group H1 (which counts loops in the dataset) above the variability of our constrained randomization model; however\, targeted removal of nodes in H1 features resulted in rapid topological collapse\, indicating that the H1 cycles in these brain organoid networks are fragile and highly sensitive to perturbations. By applying topological analysis to neural data\, we offer a new complementary framework to standard methods for understanding information processing across a variety of complex neural systems.\n\n\n11:30 am–12:00 pm\nBreak\n\n\n12:00–1:00 pm\nAlex Kontorovich\, Rutgers University \nTitle: The Shape of Math to Come \nAbstract: We will discuss some ongoing experiments that may have meaningful impact on what working in research mathematics might look like in a decade (if not sooner).\n\n\n5:00–6:00 pm\nMike Freedman Millennium Lecture: The Poincaré Conjecture and Mathematical Discovery (Science Center Hall D)\n\n\n\n  \nThursday\, Sep. 18\, 2025 \n\n\n\n8:30–9:00 am\nMorning refreshments\n\n\n9:00–10:00 am\nElliot Glazer\, Epoch AI \nTitle: FrontierMath to Infinity \nAbstract: I will discuss FrontierMath\, a mathematical problem-solving benchmark I developed over the past year\, including its design philosophy and what we’ve learned about AI’s trajectory from it. I will then look much further out\, speculate about what a “perfectly efficient” mathematical intelligence should be capable of\, and discuss how high-ceiling math capability metrics can illuminate the path towards that ideal.\n\n\n10:00–10:30 am\nBreak\n\n\n10:30–11:30 am\nBrice Ménard\, Johns Hopkins \nTitle: Demystifying the over-parametrization of neural networks \nAbstract: I will show how to estimate the dimensionality of neural encodings (learned weight structures) to assess how many parameters are effectively used by a neural network. I will then show how their scaling properties provide us with fundamental exponents characterizing the learning process for a given task. I will comment on connections to thermodynamics.\n\n\n11:30 am–12:00 pm\nBreak\n\n\n12:00–12:30 pm\nPatrick Shafto\, Rutgers \nTitle: Math for AI and AI for Math \nAbstract: I will briefly discuss two DARPA programs aiming to deepen connections between mathematics and AI\, specifically through geometric and symbolic perspectives. 
The first aims to establish mathematical foundations for understanding the behavior and performance of modern AI systems such as large language models and diffusion models. The second aims to develop AI for pure mathematics through an understanding of abstraction\, decomposition\, and formalization. I will close with some thoughts on the coming convergence between AI and math.\n\n\n12:30–12:45 pm\nBreak\n\n\n12:45–2:00 pm\nMike Freedman\, Harvard CMSA \nTitle: How to think about the shape of mathematics \nFollowed by group discussion \n  \nSupport provided by Logical Intelligence. 
URL:https://cmsa.fas.harvard.edu/event/mlgeometry/
LOCATION:CMSA\, 20 Garden Street\, Cambridge\, Massachusetts\, 02138\, United States
CATEGORIES:Conference,Event
ATTACH;FMTTYPE=image/jpeg:https://cmsa.fas.harvard.edu/media/GML_2025.7-scaled.jpg
END:VEVENT
END:VCALENDAR