Towards Understanding Training Dynamics for Mildly Overparametrized Models
Abstract: While over-parameterization is widely believed to be crucial for the success of optimization for neural networks, most existing theories of over-parameterization do not fully explain why: they either work in the Neural Tangent Kernel regime, where neurons don’t move much, or require an enormous number of neurons. In this talk I will […]