Abstract: While over-parameterization is widely believed to be crucial for the success of optimization in neural networks, most existing theories of over-parameterization do not fully explain why: they either work in the Neural Tangent Kernel regime, where neurons don’t move much, or require an enormous number of neurons. In this talk I will describe our recent work toward understanding training dynamics that go beyond the kernel regime with only polynomially many neurons (mild over-parameterization). In particular, we first give a local convergence result for mildly over-parameterized two-layer networks. We then analyze the global training dynamics of a related over-parameterized tensor model. Both works rely on the key intuition that neurons in over-parameterized models work in groups, and that it is important to understand the behavior of an average neuron in each group. Based on two papers: https://arxiv.org/abs/2102.02410 and https://arxiv.org/abs/2106.06573.
Bio: Rong Ge is an Associate Professor of Computer Science at Duke University. He received his Ph.D. from the Computer Science Department of Princeton University, supervised by Sanjeev Arora, and was a postdoc at Microsoft Research, New England. In 2019, he received both a Faculty Early Career Development Award from the National Science Foundation and the prestigious Sloan Research Fellowship. His research interests focus on theoretical computer science and machine learning. Modern machine learning algorithms such as deep learning try to automatically learn useful hidden representations of the data. He is interested in formalizing hidden structures in the data and designing efficient algorithms to find them. He pursues these goals by studying problems that arise in analyzing text, images, and other forms of data, using techniques such as non-convex optimization and tensor decompositions.