Abstract: Deep learning is an exciting approach to modern artificial intelligence based on artificial neural networks. The goal of this talk is to provide a blueprint — using tools from physics — for theoretically analyzing deep neural networks of practical relevance. This task will encompass both understanding the statistics of initialized deep networks and determining the training dynamics of such an ensemble when learning from data.
In terms of their “microscopic” definition, deep neural networks are a flexible set of functions built out of many basic computational blocks called neurons, with many neurons in parallel organized into sequential layers. Borrowing from the effective theory framework, we will develop a perturbative 1/n expansion around the limit of an infinite number of neurons per layer and systematically integrate out the parameters of the network. We will explain how the network simplifies at large width and how the propagation of signals from layer to layer can be understood in terms of a Wilsonian renormalization group flow. This will make manifest that deep networks have a tuning problem, analogous to criticality, that needs to be solved in order to make them useful. Ultimately we will find a “macroscopic” description for wide and deep networks in terms of weakly-interacting statistical models, with the strength of the interactions between the neurons growing with depth-to-width aspect ratio of the network. Time permitting, we will explain how the interactions induce representation learning.
This talk is based on a book, The Principles of Deep Learning Theory, co-authored with Sho Yaida and based on research also in collaboration with Boris Hanin. It will be published next year by Cambridge University Press.