Conference on Geometry and Statistics

Dates: November 17–19, 2025

Location: CMSA G10, 20 Garden Street, Cambridge, MA & via Zoom

 

Speakers

  • Charles Fefferman, Princeton University
  • Stephan Huckemann, Georg-August Universität Göttingen
  • Sungkyu Jung, Seoul National University
  • Kei Kobayashi, Keio University
  • Clément Levrard, Université de Rennes
  • Ker-Chau Li, University of California, Los Angeles
  • Rong Ma, Harvard University
  • Steve Marron, University of North Carolina
  • Ezra Miller, Duke University
  • Hans-Georg Müller, University of California, Davis
  • Wilderich Tuschmann, Karlsruhe Institute of Technology
  • Melanie Weber, Harvard University
  • Andrew Wood, Australian National University
  • Horng-Tzer Yau, Harvard University

Organizer: Zhigang Yao, National University of Singapore

 

SCHEDULE


Monday, Nov. 17, 2025

9:00–9:25 am
Morning refreshments

9:25–9:30 am
Introductions

9:30–10:30 am
Speaker: Stephan Huckemann, Georg-August Universität Göttingen
Title: The Probability of the Cut Locus of a Fréchet Mean
Abstract: We show that the cut locus of a Fréchet mean of a random variable on a connected and complete Riemannian manifold has zero probability, a result previously known in special cases (Le and Barden, 2014) and conjectured in general. The proof is based on first-order and second-order considerations, where the latter rest on a recent result by Générau (2020) on “Laplacians in the barrier sense”. This generalizes to Fréchet p-means for p > 2. The first-order considerations also allow us to rule out stickiness on Riemannian manifolds and to generalize to 1 ≤ p < 2, with a conjecture. We close by discussing and conjecturing extensions to noncomplete manifolds and more general metric spaces. This is joint work with Alexander Lytchak.

  • Générau, F. (2020). Laplacian of the distance function on the cut locus on a Riemannian manifold. Nonlinearity 33(8), 3928.
  • Le, H. and D. Barden (2014).  On the measure of the cut locus of a Fréchet mean. Bulletin of the London Mathematical Society 46(4), 698–708.
  • Lytchak, A. and S. F. Huckemann (2025). Zero mass at the cut locus of a Fréchet mean on a Riemannian manifold. arXiv preprint arXiv:2508.00747.
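For reference, the central object of this talk in its standard form (not specific to the speaker’s new results): for a random variable $X$ on a metric space $(M, d)$, the Fréchet mean minimizes the expected squared distance, and Fréchet p-means replace the square by a p-th power. On a Riemannian manifold, $d$ is the geodesic distance.

```latex
\[
  \mu \;\in\; \operatorname*{arg\,min}_{q \in M} \mathbb{E}\!\left[ d(q, X)^{2} \right],
  \qquad
  \mu_p \;\in\; \operatorname*{arg\,min}_{q \in M} \mathbb{E}\!\left[ d(q, X)^{p} \right].
\]
```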

10:30–10:45 am
Break

10:45 am–11:45 am
Speaker: Hans-Georg Müller, University of California, Davis
Title: Conformal Inference for Random Objects
Abstract: The underlying probability measure of random objects, i.e., metric-space-valued random variables, can be probed by distance profiles. These are one-dimensional distributions of probability mass falling into balls of increasing radius. In a regression setting with Euclidean covariates X and responses Y that are random objects, one can consider conditional Fréchet means, implemented via Fréchet regression, as well as conditional distance profiles, conditioning on X. Conditional distance profiles can then be leveraged to obtain conditional average transport costs, the expected cost for transporting a fixed conditional distance profile to a randomly selected conditional distance profile. The conditional average transport costs can then be utilized to obtain conditional conformity scores. In conjunction with the split conformal algorithm these scores lead to conditional prediction sets located in the object space with asymptotic conditional validity and attractive finite sample behavior. Based on joint work with Hang Zhou (UNC).
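A minimal sketch of two ingredients from this abstract, distance profiles and split conformal calibration, in a simplified unconditional form; the function names are illustrative, and the conditional version via Fréchet regression discussed in the talk is omitted.

```python
import numpy as np

def distance_profile(i, D, radii):
    """Empirical distance profile of object i: fraction of the sample
    within distance r of object i, for each r in radii.
    D is the pairwise distance matrix of the sample."""
    return np.array([(D[i] <= r).mean() for r in radii])

def transport_cost(F, G, radii):
    """A simple transport-type cost between two profiles, approximated
    by the L1 distance between the profile curves."""
    return np.trapz(np.abs(F - G), radii)

def split_conformal_set(scores_cal, scores_test, alpha=0.1):
    """Split conformal: keep test candidates whose conformity score does
    not exceed the (1 - alpha) empirical quantile of the calibration
    scores (with the usual finite-sample correction; capped at the max)."""
    n = len(scores_cal)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    qhat = np.sort(scores_cal)[min(k, n) - 1]
    return scores_test <= qhat
```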

11:45 am–1:15 pm
Lunch (Catered)

1:15–2:15 pm
Speaker: Horng-Tzer Yau, Harvard University
Title: Ramanujan property of random regular graphs and delocalization of random band matrices
Abstract: In this lecture, we review recent works on random matrices. The first result concerns the normalized adjacency matrix of a random $d$-regular graph on $N$ vertices with any fixed degree $d\geq 3$, whose eigenvalues we denote by $\lambda_1=d/\sqrt{d-1}\geq \lambda_2\geq\lambda_3\geq\cdots\geq \lambda_N$. We establish the edge universality for random $d$-regular graphs, namely, the distributions of $\lambda_2$ and $-\lambda_N$ converge to the Tracy-Widom$_1$ distribution associated with the Gaussian Orthogonal Ensemble. As a consequence, for sufficiently large $N$, approximately $69\%$ of $d$-regular graphs on $N$ vertices are Ramanujan, meaning $\max\{\lambda_2,|\lambda_N|\}\leq 2$. This resolves a conjecture by Sarnak and by Miller-Novikoff-Sabelli.
The second result concerns $N \times N$ Hermitian $d$-dimensional random band matrices with band width $W$. In the bulk of the spectrum and in the large $N$ limit, we prove that all $L^2$-normalized eigenvectors are delocalized in all dimensions under suitable conditions on $W$ and $N$. In addition, we prove that the eigenvalue statistics are given by those of the Gaussian Unitary Ensemble.
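A quick numerical illustration (not the proof technique) of the Ramanujan property, stated here for the unnormalized adjacency matrix, where the threshold $2\sqrt{d-1}$ corresponds to the threshold $2$ for the normalized matrix; `is_ramanujan` is an illustrative helper.

```python
import numpy as np
import networkx as nx

def is_ramanujan(d, n, seed=None):
    """Sample a random d-regular graph on n vertices and test the
    Ramanujan property: all nontrivial eigenvalues of the adjacency
    matrix have modulus at most 2*sqrt(d - 1)."""
    G = nx.random_regular_graph(d, n, seed=seed)
    eig = np.sort(np.linalg.eigvalsh(nx.to_numpy_array(G)))
    lam2, lamN = eig[-2], eig[0]   # second-largest and smallest eigenvalues
    return max(lam2, abs(lamN)) <= 2 * np.sqrt(d - 1)

# Monte Carlo estimate of the Ramanujan fraction (cf. the ~69% above)
trials = [is_ramanujan(3, 400, seed=s) for s in range(50)]
print(np.mean(trials))
```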

2:15–2:45 pm
Break with refreshments

2:45–3:45 pm
Speaker: Clément Levrard, Université de Rennes
Title: Optimal reach estimation
Abstract: The reach of an embedded submanifold, a notion that dates back to H. Federer’s famous work Curvature Measures, may be understood as a scale below which the submanifold is flat enough that traditional Euclidean techniques in statistics apply locally, up to some approximation. I will present several ways to estimate the reach from a sample (on the submanifold), some of them optimal from the point of view of minimax estimation theory. Along the way, intermediate estimation problems for local and global quantities will arise (curvature estimation, weak feature size estimation, distance estimation, etc.), for which various phenomena can occur from a statistical point of view (different convergence rates, inconsistency). This will be an opportunity to provide a selective overview of the state of the art on these issues.
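For context, Federer’s characterization of the reach, which underlies several of the estimators alluded to above (plug-in estimators replace the infimum over the submanifold by an infimum over the sample):

```latex
\[
  \tau_M \;=\; \inf_{\substack{p, q \,\in\, M \\ p \neq q}}
  \frac{\lVert q - p \rVert^{2}}{2\, d\!\left(q - p,\; T_p M\right)},
\]
```

where $T_p M$ denotes the tangent space of $M$ at $p$ and $d(\cdot, T_p M)$ the Euclidean distance to it.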

4:30–5:30 pm
CMSA Colloquium
Speaker: Zhigang Yao, National University of Singapore
Title: Interaction of Statistics and Geometry: A New Landscape for Data Science
Abstract: Classical statistics views data as real numbers or vectors in Euclidean space, but modern challenges increasingly involve data with intrinsic geometric structures. A central problem in this direction is manifold fitting, with origins in H. Whitney’s work of the 1930s. The Geometric Whitney Problems ask: given a set, when can we construct a smooth $d$-dimensional manifold that approximates it, and how accurately can we estimate it?
In this talk, I will discuss recent progress on manifold fitting and its role in bridging geometry and data science. While many existing methods rely on restrictive assumptions, the manifold hypothesis—that data often lie near non-Euclidean structures—remains fundamental in modern statistical learning. I will highlight both theoretical insights and algorithmic challenges, drawing on recent joint works as well as ongoing research.

 

Tuesday, Nov. 18, 2025

9:00–9:30 am
Morning refreshments

9:30–10:30 am
Speaker: Charles Fefferman, Princeton University (via Zoom)
Title: Extrinsic and intrinsic manifold learning, old and new
Abstract: The talk will include an exposition of the old paper “Testing the manifold hypothesis”, joint work with S. Mitter and H. Narayanan, on extrinsic manifold learning (the manifold to be learned is assumed to be embedded in a high-dimensional Euclidean space). The talk will also include a new result on intrinsic manifold learning (the manifold to be learned is not assumed to be embedded, and the data consist of intrinsic distances corrupted by noise), provided the result is proven by the time of the conference.

10:30–10:45 am
Break

10:45 am–11:45 am
Speaker: Steve Marron, University of North Carolina
Title: Data Integration Via Analysis of Manifolds (DIVAM)
Abstract: A major challenge in the age of Big Data is the integration of disparate data types into a single data analysis. That was tackled by Data Integration Via Analysis of Subspaces (DIVAS) in the context of data blocks measured on a common set of experimental cases. Joint variation was defined in terms of modes of variation having identical scores across data blocks. DIVAS allowed mathematically rigorous formulation of individual variation within each data block in terms of individual modes. The goal of DIVAM is to intrinsically extend the DIVAS approach to data objects lying in manifolds, such as shape data.
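A toy illustration (under strong simplifying assumptions, and not the DIVAS/DIVAM algorithm itself) of the idea that joint variation means shared score directions across blocks: a score direction common to two blocks measured on the same cases shows up as a near-zero principal angle between their leading left-singular subspaces.

```python
import numpy as np
from scipy.linalg import subspace_angles

# Two data blocks on the same n cases (rows), driven in part by a
# shared score vector; block-specific loadings and noise differ.
rng = np.random.default_rng(1)
n = 100
joint = rng.normal(size=(n, 1))                  # shared scores
X1 = joint @ rng.normal(size=(1, 20)) + 0.1 * rng.normal(size=(n, 20))
X2 = joint @ rng.normal(size=(1, 30)) + 0.1 * rng.normal(size=(n, 30))

U1, _, _ = np.linalg.svd(X1, full_matrices=False)
U2, _, _ = np.linalg.svd(X2, full_matrices=False)
angles = subspace_angles(U1[:, :3], U2[:, :3])   # descending order
print(np.degrees(angles))  # smallest (last) angle near zero flags joint variation
```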

11:45 am–1:15 pm
Lunch Break

1:15–2:15 pm
Speaker: Ker-Chau Li, University of California, Los Angeles
Title: Investigation of Data Clouds: From Galton’s Ellipses to Explainable AI (XAI), Modeling or Molding?
Abstract: Francis Galton’s seminal 1886 visualization of regression toward the mean in trait inheritance is arguably the first and most influential example of geometric thinking applied to statistical modeling. The pioneering geometric insight driving Galton’s use of elliptical contours to discover the bivariate normal distribution laid the foundation for classical multivariate analysis (e.g., PCA, canonical correlation) and continues to shape modern methods like diffusion models.
Statistical models, particularly those based on parsimony, are effective for characterizing data distribution and facilitating scientific rule induction. However, the rise of unstructured big data (like images) has challenged these parsimonious approaches, necessitating the use of deep learning models. These models, containing billions of parameters, sacrifice transparency to excel in prediction. Seeking solutions to this “black-box” dilemma is now the heart of Explainable AI (XAI).
Leveraging the simplicity of elementary geometric concepts, this talk will present a new path toward interpretable and parsimonious XAI. Unstructured big data is highly plastic. Our approach moves beyond the standard data modeling perspective—which answers what the data is—and introduces a novel data molding perspective. This shift is key to unlocking the full potential of data’s plasticity, allowing us to effectively answer the crucial question: what the data can be used for.
I will first discuss a connection between manifold learning and my earlier works, helical confounding and liquid association. I will then turn to the data molding perspective and present two novel notions: mold-compliance and artificial-trait configurative-generation (ATCG). These notions guide our recent efforts in formulating novel algorithms for image data investigation, addressing issues like prediction validity and within-class heterogeneity. Data molding entails a dramatically different feature space extraction, which consequently shifts the subsequent investigation on the data clouds from out-of-distribution (OOD) to mold-violation, and from UMAP clustering to ATCG-induced hierarchical clustering.

2:15–2:45 pm
Break with refreshments

2:45–3:45 pm
Speaker: Andrew Wood, Australian National University
Title: Empirical likelihood methods for Fréchet means on open books
Abstract: The open book is a simple example of a stratified space that captures some (but not all) of the properties of general stratified spaces. Central limit theory for open books, plus relevant background, is given by Hotz et al. (2013, Annals of Applied Probability). In this talk I will describe some basic inference procedures for Fréchet means in open books based on empirical likelihood (Owen, 2001). Empirical likelihood (EL) is a type of nonparametric likelihood that can be useful for many types of data, including manifold-valued data and data from stratified spaces. An EL approach to basic inference for Fréchet means will be described. In particular, it will be shown how the non-regularity in the geometry of open books can result in non-regular behaviour in Wilks’s theorem (i.e., the large-sample likelihood ratio test). The talk will also discuss difficulties in extending the EL inference theory from open books to more general stratified spaces, where the difference in dimension of adjacent strata can be 2 or more. For discussion of more general stratified spaces than open books, see the orthant spaces discussed in Barden and Le (2018, Proceedings of the London Mathematical Society) and the general stratified-space setting considered by Mattingly et al. (2023, arXiv).
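For readers new to empirical likelihood, the Euclidean prototype that the talk adapts to open books: the EL ratio for a mean, and the regular Wilks limit whose breakdown under non-regular geometry is the subject of the talk.

```latex
\[
  R(\mu) \;=\; \max\Big\{ \prod_{i=1}^{n} n\,w_i \;:\; w_i \geq 0,\;
  \sum_{i=1}^{n} w_i = 1,\; \sum_{i=1}^{n} w_i X_i = \mu \Big\},
  \qquad
  -2 \log R(\mu_0) \;\xrightarrow{\,d\,}\; \chi^{2}_{d}
\]
```

under standard regularity conditions, where $d$ is the dimension; on an open book, the limiting behaviour at the spine can differ.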

3:45–4:00 pm
Break

4:00–5:00 pm
Speaker: Wilderich Tuschmann, Karlsruhe Institute of Technology
Title: A Spectator’s Perspective on the Manifold Hypothesis
Abstract: At its core, the Manifold Hypothesis asserts that real-world, high-dimensional data is not uniformly or randomly distributed throughout its high-dimensional “ambient” space, but concentrated on or near a low-dimensional manifold (or a collection of manifolds) embedded within that high-dimensional ambient space.
In my talk, I will discuss reasons and facts that speak for as well as against this hypothesis and also address geometric alternatives.

 

Wednesday, Nov. 19, 2025

9:00–9:30 am
Morning refreshments

9:30–10:30 am
Speaker: Melanie Weber, Harvard University
Title: Ricci Curvature, Ricci Flow, and the Geometry of Learning
Abstract: Geometric structure in data plays a crucial role in machine learning. In this talk, we study this observation through the lens of Ricci curvature and its associated Ricci flow. We start by reviewing a discrete notion of Ricci curvature introduced by Ollivier and the geometric flow that it induces. We further discuss the relationship between discrete Ricci curvature and its continuous counterpart via discrete-to-continuum consistency results, which imply that discrete Ricci curvature can provably characterize the geometry of a data manifold based on a finite sample. This provides a theoretical foundation for several applications of discrete Ricci curvature in machine learning, two of which we discuss in the remainder of this talk. First, we analyze learned feature representations in deep neural networks and show that they transform during training in ways that closely resemble a discrete Ricci flow. Our analysis reveals that nonlinear activations shape class separability and suggests geometry-informed training principles such as early stopping and depth selection. Second, we turn to deep learning on graphs, where we address representational limitations of state-of-the-art graph neural networks through curvature-based data augmentations. We show that augmenting input graphs with geometric information provably increases the representational power of such models and yields performance gains in practice.
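A self-contained sketch of Ollivier’s discrete Ricci curvature for a graph edge: one minus the Wasserstein-1 distance between lazy random-walk measures, divided by the edge’s graph distance, with the transport distance solved as a small linear program. This is a generic textbook-style implementation, not the speaker’s code.

```python
import numpy as np
import networkx as nx
from scipy.optimize import linprog

def ollivier_ricci(G, x, y, alpha=0.5):
    """Ollivier-Ricci curvature of edge (x, y):
    kappa = 1 - W1(m_x, m_y) / d(x, y), where m_v is the lazy
    random-walk measure: mass alpha at v, (1 - alpha) spread
    uniformly over the neighbors of v."""
    def measure(v):
        nbrs = list(G.neighbors(v))
        supp = [v] + nbrs
        mass = [alpha] + [(1 - alpha) / len(nbrs)] * len(nbrs)
        return supp, np.array(mass)

    sx, mx = measure(x)
    sy, my = measure(y)
    d = dict(nx.all_pairs_shortest_path_length(G))
    cost = np.array([[d[u][v] for v in sy] for u in sx], float)

    # W1 as a transportation LP: minimize <cost, plan> s.t. marginals match
    n, m = len(sx), len(sy)
    A_eq = []
    for i in range(n):            # row sums of the plan equal mx
        row = np.zeros(n * m); row[i * m:(i + 1) * m] = 1; A_eq.append(row)
    for j in range(m):            # column sums of the plan equal my
        col = np.zeros(n * m); col[j::m] = 1; A_eq.append(col)
    res = linprog(cost.ravel(), A_eq=np.array(A_eq),
                  b_eq=np.concatenate([mx, my]), bounds=(0, None))
    return 1 - res.fun / nx.shortest_path_length(G, x, y)

G = nx.karate_club_graph()
print(ollivier_ricci(G, 0, 1))
```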

10:30–10:45 am
Break

10:45 am–11:45 am
Speaker: Ezra Miller, Duke University
Title: Extracting bar lengths from multiparameter persistent homology
Abstract: Persistent homology in one parameter can be summarized using bar codes or persistence diagrams, which are elementary gadgets with many features amenable to vectorization and hence statistical analysis. For example, early work with Bendich, Marron, Pieloch, and Skwerer showed how to extract meaningful statistics from the top 100 bar lengths in persistent homology summaries of brain arteries. The story for persistent homology with multiple parameters, on the other hand, is still developing. Although it has the potential to be much more flexible and informative, multipersistence has structural issues that present fundamental mathematical challenges. There is no consensus on what might be meant by a “bar”, let alone “the top 100 bar lengths”. This talk recalls the basics of single and multiparameter persistent homology and discusses some of the mathematical issues, including obstacles and potential routes forward.
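To make the “top 100 bar lengths” summary concrete, a minimal vectorization sketch for one-parameter diagrams (illustrative only, not the exact pipeline of the paper with Bendich, Marron, Pieloch, and Skwerer):

```python
import numpy as np

def top_k_bar_lengths(diagram, k=100):
    """Vectorize a one-parameter persistence diagram (an array of
    (birth, death) pairs) by its k longest bar lengths, padding with
    zeros when there are fewer than k finite bars."""
    diagram = np.asarray(diagram, float)
    lengths = diagram[:, 1] - diagram[:, 0]
    lengths = np.sort(lengths[np.isfinite(lengths)])[::-1]
    out = np.zeros(k)
    out[:min(k, len(lengths))] = lengths[:k]
    return out

# e.g. three bars with lengths 2.0, 0.4, 0.1, padded to length 5
print(top_k_bar_lengths([(0.0, 2.0), (0.1, 0.5), (0.3, 0.4)], k=5))
```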

11:45 am–1:15 pm
Lunch Break

1:15–2:15 pm
Speaker: Kei Kobayashi, Keio University
Title: Metric Transformations of Data Spaces: Curvature Control and Related Developments
Abstract: We present our proposed method for increasing the accuracy of data analysis by means of two transformations of the metric of the data space. The first transformation is based on the curve length defined by the integral of a power of the density function, which can be computed approximately using an empirical graph; the second transformation can be interpreted as the extrinsic distance when the data space is embedded in a metric cone. The advantage of both distance transformations is that their hyperparameters allow the curvature to be monotonically transformed in a specific sense. Some statistical applications of these transformations and theoretical justifications are presented. Detailed analyses of the geodesics obtained by this method for several simple probability distributions will also be presented. The main part of this work is based on joint works with Henry P. Wynn.
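A rough sketch in the spirit of the first transformation: power-weighted edge lengths on an empirical neighborhood graph, followed by shortest paths. The exponent and the k-NN construction here are generic illustrative choices, not necessarily the speaker’s exact estimator.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform

def power_weighted_distances(X, p=2.0, k=10):
    """Approximate a density-based metric transformation: raise the
    Euclidean edge lengths of a k-NN graph to the power p and take
    shortest-path distances. Larger p penalizes travel through
    low-density regions."""
    D = squareform(pdist(X))
    n = len(X)
    W = np.full((n, n), np.inf)
    idx = np.argsort(D, axis=1)[:, 1:k + 1]   # k nearest neighbors
    for i in range(n):
        W[i, idx[i]] = D[i, idx[i]] ** p
    W = np.minimum(W, W.T)                    # symmetrize the graph
    W[np.isinf(W)] = 0                        # 0 marks "no edge" for csgraph
    return shortest_path(W, method='D', directed=False)
```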

2:15–2:45 pm
Break with refreshments

2:45–3:45 pm
Speaker: Sungkyu Jung, Seoul National University
Title: Generalized Fréchet means with random minimizing domains and their strong consistency
Abstract: In this talk, I will discuss a novel extension of Fréchet means, referred to as generalized Fréchet means, as a comprehensive framework for describing the characteristics of random elements. The generalized Fréchet mean is defined as the minimizer of a cost function, and the framework encompasses various extensions of Fréchet means that have appeared in the literature. The most distinctive feature of the proposed framework is that it allows the domain of minimization for the empirical generalized Fréchet means to be random and different from that of its population counterpart. This flexibility broadens the applicability of the Fréchet mean framework to various statistical scenarios, including sequential dimension reduction for non-Euclidean data. We establish a strong consistency theorem for generalized Fréchet means. Applications such as verifying the consistency of principal geodesic analysis on the hypersphere, compositional principal component analysis on the composition space, and k-medoids clustering for data on a metric space will be discussed.
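As background, a minimal sketch of the plain (non-generalized) empirical Fréchet mean on the unit sphere, computed by intrinsic gradient descent; the generalized framework of the talk additionally lets the minimizing domain be random.

```python
import numpy as np

def frechet_mean_sphere(X, iters=100, step=1.0):
    """Empirical Fréchet mean on the unit sphere: minimize the sum of
    squared geodesic (arc-length) distances by intrinsic gradient
    descent (log map average, then exponential map)."""
    mu = X.mean(axis=0)
    mu /= np.linalg.norm(mu)
    for _ in range(iters):
        dots = np.clip(X @ mu, -1.0, 1.0)
        theta = np.arccos(dots)                      # geodesic distances
        perp = X - dots[:, None] * mu                # tangential components
        norms = np.linalg.norm(perp, axis=1)
        scale = np.where(norms > 1e-12, theta / np.maximum(norms, 1e-12), 0.0)
        grad = (scale[:, None] * perp).mean(axis=0)  # mean log map at mu
        g = np.linalg.norm(grad)
        if g < 1e-10:
            break
        mu = np.cos(step * g) * mu + np.sin(step * g) * grad / g  # exp map
        mu /= np.linalg.norm(mu)
    return mu

# Points clustered around the north pole of the 2-sphere
rng = np.random.default_rng(0)
X = rng.normal([0, 0, 1], 0.1, size=(200, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
print(frechet_mean_sphere(X))
```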

3:45–4:00 pm
Break

4:00–5:00 pm
Speaker: Rong Ma, Harvard University
Title: Modern Nonlinear Embedding Methods Unpacked
Abstract: Learning and representing low-dimensional structures from noisy, high-dimensional data is a cornerstone of modern data science. Stochastic neighbor embedding algorithms, a family of nonlinear dimensionality reduction and data visualization methods, with t-SNE and UMAP as two leading examples, have become very popular in recent years. Yet despite their wide applications, these methods remain subject to points of debate, including limited theoretical understanding, ambiguous interpretations, and sensitivity to tuning parameters. In this talk, I will present our recent efforts to decipher and improve these nonlinear embedding approaches. Our key results include a rigorous theoretical framework that uncovers the intrinsic mechanisms, large-sample limits, and fundamental principles underlying these algorithms; a set of theory-informed practical guidelines for their principled use in trustworthy biological discovery; and a collection of new algorithms that address current limitations and improve performance in areas such as bias reduction and stability. Throughout the talk, I will highlight how these advances not only deepen our theoretical understanding but also open new avenues for scientific discovery.
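The tuning-parameter sensitivity mentioned above is easy to see with standard scikit-learn t-SNE: embedding the same data at several perplexities can produce qualitatively different layouts (illustrative usage only, not the speaker’s new algorithms).

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Embed the digits data at several perplexities and compare layouts.
X, y = load_digits(return_X_y=True)
for perplexity in (5, 30, 100):
    emb = TSNE(n_components=2, perplexity=perplexity,
               random_state=0).fit_transform(X)
    print(perplexity, emb.shape)
```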
