Hierarchical data structures through the lenses of diffusion models
October 2, 2024 @ 2:00 pm - 3:00 pm
New Technologies in Mathematics Seminar
Speaker: Antonio Sclocchi, EPFL
Title: Hierarchical data structures through the lenses of diffusion models
Abstract: The success of deep learning with high-dimensional data relies on the fact that natural data are highly structured. A key aspect of this structure is hierarchical compositionality, yet quantifying it remains a challenge.
In this talk, we explore how diffusion models can serve as a tool to probe the hierarchical structure of data. We consider a context-free generative model of hierarchical data and show the distinct behaviors of high- and low-level features during a noising-denoising process. Specifically, we find that high-level features undergo a sharp transition in reconstruction probability at a specific noise level, while low-level features recombine into new data from different classes. This behavior of latent features leads to correlated changes in real-space variables, resulting in a diverging correlation length at the transition.
We validate these predictions in experiments with real data, using state-of-the-art diffusion models for both images and text. Remarkably, both modalities exhibit a growing correlation length in changing features at the transition of the noising-denoising process.
Overall, these results highlight the potential of hierarchical models in capturing non-trivial data structures and offer new theoretical insights for understanding generative AI.
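For readers unfamiliar with context-free generative models of hierarchical data, the following is a minimal toy sketch of the general idea, not the speaker's actual model. It assumes a shared vocabulary at every level, a fixed branching factor, and production rules drawn uniformly at random; the names `make_rules` and `sample` and all parameter values are hypothetical and chosen only for illustration. The noising-denoising process itself is not shown.

```python
import random

def make_rules(vocab_size, branching, num_rules, depth, rng):
    """For each level and each symbol, draw `num_rules` random tuples of child symbols.

    Assumption: rules are sampled uniformly at random; this is an illustrative choice.
    """
    rules = []
    for _ in range(depth):
        level = {
            sym: [
                tuple(rng.randrange(vocab_size) for _ in range(branching))
                for _ in range(num_rules)
            ]
            for sym in range(vocab_size)
        }
        rules.append(level)
    return rules

def sample(root, rules, rng):
    """Expand the root symbol (the class label) level by level; return the leaf sequence."""
    seq = [root]
    for level in rules:
        # Each symbol is replaced by one of its allowed tuples of children.
        seq = [child for sym in seq for child in rng.choice(level[sym])]
    return seq

rng = random.Random(0)
rules = make_rules(vocab_size=4, branching=2, num_rules=2, depth=3, rng=rng)
leaves = sample(root=0, rules=rules, rng=rng)
print(len(leaves))  # branching ** depth = 8 leaf symbols
```

In this sketch the root symbol plays the role of a high-level feature (the class), while symbols closer to the leaves play the role of low-level features that can be shared across classes.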