The Machine Learning for Multiscale Model Reduction Workshop will take place on March 27–29, 2019. This is the second of two workshops organized by Michael Brenner, Shmuel Rubinstein, and Tom Hou. The first, Fluid turbulence and Singularities of the Euler/Navier-Stokes equations, will take place on March 13–15, 2019. Both workshops will be held in room G10 of the CMSA, located at 20 Garden Street, Cambridge, MA.
For a list of lodging options convenient to the Center, please visit our recommended lodgings page.
Time 
Speaker 
Title/Abstract 
9:00 – 9:05am 
Opening Remarks by Professor S. T. Yau 

9:05 – 9:50am 
Stanley Osher, UCLA 
Title: Partial Differential Equations, Nonconvex Optimization and Deep Neural Nets
Abstract: Recently, links between partial differential equations (PDEs) and DNNs have been established in several interesting directions. We used ideas from Hamilton-Jacobi (HJ) equations and control and differential games to improve training time and to modify and improve the training algorithm. We propose a very simple modification of gradient descent and stochastic gradient descent. We show that when applied to a variety of machine learning models including softmax regression, convolutional neural nets, generative adversarial nets, and deep reinforcement learning, this very simple surrogate can dramatically reduce the variance and improve the accuracy of the generalization. The new algorithm (which depends on one nonnegative parameter), when applied to nonconvex minimization, tends to avoid local minima. We also present a simple connection between transport equations and deep residual nets, based on stochastic control. This connection enabled us to improve neural nets’ adversarial robustness and generalization accuracy. Again, the programming changes needed to make these improvements are minimal in cost, complexity and effort.
Joint work with many people, especially Bao Wang, Zuoqiang Shi and Adam Oberman.
9:55 – 10:40am 
George Karniadakis, Brown University 
Title: Physics-Informed Neural Networks (PINNs) for solving stochastic and fractional PDEs
Abstract: In this talk, we will present a new approach to develop a data-driven, learning-based framework for predicting outcomes of physical systems and for discovering hidden physics from noisy data. We will introduce a deep learning approach based on neural networks (NNs) and generative adversarial networks (GANs). Unlike other approaches that rely on big data, here we “learn” from small data by exploiting the information provided by the physical conservation laws, which are used to obtain informative priors or regularize the neural networks. We will also make connections between GPR and NNs and discuss the new powerful concept of meta-learning. We will demonstrate the power of PINNs for several inverse problems in fluid mechanics, including wake flows and shock tube problems, where traditional methods fail due to lack of boundary and initial conditions. 
10:45 – 11:15am 
Coffee Break 

11:15 – 12:00pm 
Stephane Mallat, College de France 
Title: Multiscale Model Reduction in Deep Convolutional Networks
Abstract: Learning without suffering from the curse of dimensionality requires finding high-dimensional regularity properties that make it possible to reduce dimensionality. This talk shows that deep convolutional neural network architectures take advantage of scale separation to learn models of reduced dimensionality. We introduce a mathematical framework where scale separations are performed with wavelet transforms. Dimensionality reduction takes advantage of prior knowledge on symmetries and is based on learning of sparse representations. Applications will be shown on image classification, regression of quantum molecular energies, modeling of non-Gaussian stochastic processes such as turbulence, and signal generation similar to GANs. 
12:00 – 2:00pm 
Lunch Break 

2:00 – 2:45pm 
Jinchao Xu, Penn State University 
Title: Deep Neural Network and Multigrid
Abstract: In this talk, I will discuss the relationship between some deep learning models and traditional algorithms such as finite element and multigrid methods. Such relationships can be used to study, explain and improve the model structures, mathematical properties and relevant training algorithms for deep neural networks. I will report a class of new training algorithms that can be used to improve the efficiency of convolutional neural networks (CNNs) by significantly reducing the redundancy of the model without losing accuracy. By combining multigrid and deep learning methodologies, I will present a unified model, known as MgNet, that simultaneously recovers some CNNs for image classification and multigrid methods for solving discretized PDEs. MgNet can also be used to derive a new class of CNNs that mathematically unify many existing CNN models and are computationally competitive, and it can also be used to design new multigrid methods for solving discretized partial differential equations. 
2:50 – 3:35pm 
Joan Bruna Estrach, the Courant Institute 
Title: Global Convergence of Neuron Birth-Death Dynamics
Abstract: Neural networks with a large number of parameters admit a mean-field description, which has recently served as a theoretical explanation for the favorable training properties of “overparameterized” models. In this regime, gradient descent obeys a deterministic partial differential equation (PDE) that converges to a globally optimal solution for networks with a single hidden layer under appropriate assumptions. In this talk, we propose a nonlocal mass transport dynamics that leads to a modified PDE with the same minimizer. We implement this nonlocal dynamics as a stochastic neuronal birth-death process and we prove that it accelerates the rate of convergence in the mean-field limit. We subsequently realize this PDE with two classes of numerical schemes that converge to the mean-field equation, each of which can easily be implemented for neural networks with finite numbers of parameters. We illustrate our algorithms with two models to provide intuition for the mechanism through which convergence is accelerated. Joint work with G. Rotskoff (NYU), S. Jelassi (Princeton) and E. Vanden-Eijnden (NYU). 
3:40 – 4:10pm 
Coffee Break 

4:10 – 4:55pm 
Jack Xin, UC Irvine 
Title: Discrete Optimization in Gradient Based Deep Learning
Abstract: Discrete functions and operations often appear in deep neural networks for complexity reduction and information propagation. We consider two such scenarios. One is quantization, which restricts the activation functions to piecewise constant and weights to a discrete set. The other is automated channel shuffling by learning permutation matrices. Though classical gradients may vanish or may not even exist, we show the success of coarse (large scale) gradient and an exact Lipschitz continuous penalty in guiding the learning process on deep networks via image data and solvable analytical models. 
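The quantization scenario in the abstract above can be illustrated with a toy example. The sketch below is not the speaker's method: it assumes a uniform staircase activation and uses the slope of a clipped ReLU as a stand-in "coarse" derivative, since the true derivative of a piecewise-constant function is zero almost everywhere. All function names, the step size, and the number of levels are hypothetical, and weight quantization is not covered.

```python
import math

def quantized_relu(x, levels=4, step=0.5):
    """Piecewise-constant activation with values in {0, step, ..., (levels-1)*step}."""
    q = math.floor(x / step)
    return step * min(max(q, 0), levels - 1)

def coarse_grad(x, levels=4, step=0.5):
    """Surrogate derivative (assumption): slope of the clipped ReLU the staircase follows."""
    return 1.0 if 0.0 < x < (levels - 1) * step else 0.0

def train_scalar(w, x, target, lr=0.1, steps=200):
    """Fit w so that quantized_relu(w * x) matches target, using the coarse gradient."""
    for _ in range(steps):
        pre = w * x
        err = quantized_relu(pre) - target
        # With the true derivative (zero almost everywhere) this update would never move w.
        w -= lr * err * coarse_grad(pre) * x
    return w

w = train_scalar(w=0.1, x=1.0, target=1.0)
output = quantized_relu(w * 1.0)
```

In this toy problem the surrogate derivative lets the weight climb from one plateau of the staircase to the next until the quantized output reaches the target, after which the error and the update are both zero.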
Time 
Speaker 
Title/Abstract 
9:00 – 9:45am 
Predrag Cvitanovic, Georgia Tech 
Title: Turbulence.zip
Abstract: Turbulence is THE unsolved problem of classical physics. For two centuries we have had the equations that describe the motion of fluids, but we cannot solve them where we need them. For pipe, channel and plane flows for long time intervals, on large spatial domains, turbulent instabilities preclude any accurate numerical time integration. However, recent progress in ‘compressing’ turbulence data by equation-assisted thinking, in terms of so-called ‘exact coherent structures’, suggests a radically different approach. The way we perceive turbulence (the mere fact that one can identify a cloud in a snapshot) suggests these terabytes should be zipped into small labelled files, a label for each pattern explored by turbulence, and a graph of transitions among them. This pattern recognition problem is exceptionally constrained by the exact differential equations that the data must respect. Here the Navier-Stokes equations are recast as a spacetime theory, with both space and time taken to infinity; the traditional Direct Numerical Simulation codes have to be abandoned. In this theory there is no time, there is only a repertoire of admissible spatiotemporal patterns. To determine these, radically different kinds of codes will have to be written, with space and time treated on equal footing. 
9:50 – 10:35am 
Stephan Hoyer, Google Research 
Title: Data Driven Discretization for Partial Differential Equations
Abstract: Machine learning and differentiable programming offer a new paradigm for scientific computing, with algorithms tuned by machines instead of by people. I’ll show how these approaches can be used to improve the heuristics underlying numerical methods, particularly for discretizing partial differential equations. We use high resolution simulations to create training data, which we train convolutional neural nets to emulate on much coarser grids. By building on top of traditional approaches such as finite volume schemes, we can incorporate physical constraints, ensure stability and allow for extracting physical insights. Our approach allows us to integrate in time a collection of nonlinear equations in one spatial dimension at resolutions 4-8x coarser than is possible with standard finite difference methods. Joint work with Yohai Bar-Sinai (Harvard), Jason Hickey (Google) and Michael P. Brenner (Harvard). 
10:40 – 11:10am 
Coffee Break 

11:10 – 11:55am 
Jacob Page, Cambridge University 
Title: Augmenting the search for unstable periodic orbits in turbulence with autoencoders
Abstract: Unstable periodic orbits (UPOs) are the building blocks of chaotic attractors and possibly turbulent attractors too. The current state of the art for finding UPOs in a turbulent flow begins with a search for ‘near recurrences’ in a DNS time series, measured as local minima in an $l_2$ norm between snapshots of the flow. The approach is crude and struggles to identify UPOs which are visited only fleetingly or which may be spatially localised. In this work we explore the use of convolutional neural networks (CNNs) as a means of performing a dimensionality reduction that respects the existence of UPOs and which can then be applied as a tool for efficiently identifying these coherent structures in turbulent data streams. We train a CNN in the form of an autoencoder to reconstruct snapshots of turbulent Kolmogorov flow (body-forced Navier-Stokes equations on a 2-torus) at $Re=40$. The autoencoder reduces the dimensionality of the flow by orders of magnitude while its output is largely indistinguishable from the true turbulence. The network naturally develops an embedding of the continuous translational symmetry in the system, and we exploit this fact to define translation-independent observables of encoded vorticity fields. These observables can be used as a visualisation tool for comparing encoded UPOs, which cluster into distinct families of coherent structures with different dynamic features. The suggestion that the network has learnt a dimensionality reduction that is related to the exact coherent structures is confirmed by performing a recurrent flow analysis on encoded time series using the translation-independent observable. The approach results in the identification of an order of magnitude more UPOs as compared to a standard recurrent flow analysis over the same time interval. We will go on to assess the network’s performance at higher Reynolds numbers, where only a handful of exact coherent structures have been previously identified.
Joint work with Rich Kerswell and Michael Brenner. 
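The baseline ‘near recurrence’ search mentioned in the abstract above (local minima of an $l_2$ distance between snapshots) can be sketched in a few lines. This is only an illustration under stated assumptions: the relative normalisation, the threshold, the function names, and the synthetic periodic signal standing in for a DNS time series are all hypothetical and not taken from the talk.

```python
import math

def l2_norm(u):
    return math.sqrt(sum(v * v for v in u))

def recurrence(snapshots, i, shift):
    """Relative l2 distance between snapshot i and snapshot i + shift."""
    a, b = snapshots[i], snapshots[i + shift]
    diff = [x - y for x, y in zip(a, b)]
    return l2_norm(diff) / l2_norm(a)

def near_recurrences(snapshots, i, max_shift, threshold):
    """Shifts at which the distance is a local minimum below the threshold."""
    r = [recurrence(snapshots, i, s) for s in range(1, max_shift + 1)]
    hits = []
    for k in range(1, len(r) - 1):
        if r[k] < threshold and r[k] <= r[k - 1] and r[k] <= r[k + 1]:
            hits.append(k + 1)  # r[k] corresponds to shift k + 1
    return hits

# Synthetic stand-in for a DNS time series: a signal with a period of 20 steps.
period = 20
snapshots = [
    [math.cos(2 * math.pi * t / period), math.sin(2 * math.pi * t / period), 1.0]
    for t in range(100)
]
candidates = near_recurrences(snapshots, i=0, max_shift=50, threshold=0.1)
```

On this synthetic signal the candidate shifts are the multiples of the period within the search window. In a recurrent flow analysis such candidates would typically serve as initial guesses for a solver that converges them to exact periodic orbits, a step not shown here.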
12:00 – 2:00pm 
Lunch Break 

2:00 – 2:45pm 
Houman Owhadi, Caltech 
Title: On the interface between Numerical Approximation, Inference and Learning
Abstract: Although numerical approximation, statistical inference and learning are traditionally seen as entirely separate subjects, they are intimately connected through the common purpose of making estimations with partial information. This talk is an invitation to explore these connections from the consolidating perspective of game/decision theory, and it is motivated by the suggestion that these confluences might not just be objects of curiosity but can constitute a pathway to simple solutions to fundamental problems in all three areas. We will illustrate this point through problems related to numerical homogenization, operator-adapted wavelets, computation with dense kernel matrices and the kernel selection/design problem in Machine Learning. In these interplays, accurate reduced/multiscale models (for PDEs) can be identified as optimal bets for adversarial games describing the process of computing with partial information. Moreover, efficient kernels (for ML) can be selected by using relative energy content at fine scales (with a notion of scale corresponding to the number of data points) as an ordering criterion, leading to the identification of (data-driven) flows in kernel spaces (Kernel Flows) that (1) enable the design of bottomless networks amenable to some degree of analysis and (2) appear to converge towards kernels with good generalization properties. This talk will cover joint work with F. Schäfer, C. Scovel, T. Sullivan, G. R. Yoo and L. Zhang. 
2:50 – 3:35pm 
Zuowei Shen, National University of Singapore 
Title: Deep Learning: Approximation of functions by composition
Abstract: The primary task in supervised learning is approximating/estimating a function f through samples drawn from a probability distribution on the input space. Learning is the process of approximating f by a function g with tunable parameters, which can be adjusted so that g becomes close to f in some averaged sense with respect to the input distribution. Usually, we pick a nice g to work with. For regression problems, the simplest g one can consider is an affine function, whose parameters can be fitted. For classification problems, one can consider g an affine function followed by a sigmoid transformation and a maximization across output coordinates. This is known as logistic regression. Although such simple g’s are easy to analyze and optimize in practice, when the underlying f is complex, they tend to have low approximation quality. The key idea in deep learning is to expand a simple approximator g by composing it with a series of nested feature extractors, i.e. one finds T1, T2, ..., TN such that g composed with T1, T2, ..., TN approximates f. In this talk, we discuss the mathematical theory behind such approximations, how the theory can be used to understand and design deep learning networks, and how it differs from classical approximation theory.
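As a minimal illustration of the composition idea in the abstract above, the sketch below builds a simple classifier g (an affine map followed by a sigmoid and an argmax over output coordinates) and composes it with a chain of feature maps T1, ..., TN. The weights, the tanh feature maps, and all function names are hypothetical choices for illustration and are not taken from the talk.

```python
import math

def affine(W, b, x):
    """Return W x + b for a list-of-rows matrix W and vectors b, x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

def g(W, b, x):
    """Simple classifier: affine map, sigmoid, then argmax over output coordinates."""
    scores = sigmoid(affine(W, b, x))
    return max(range(len(scores)), key=scores.__getitem__)

def make_feature_map(W, b):
    """Hypothetical feature extractor T(x) = tanh(W x + b)."""
    return lambda x: [math.tanh(v) for v in affine(W, b, x)]

def compose(classifier, feature_maps):
    """Return the map x -> classifier(T_N(... T_1(x) ...))."""
    def composed(x):
        for T in feature_maps:
            x = T(x)
        return classifier(x)
    return composed

# Illustrative (made-up) parameters.
T1 = make_feature_map([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
T2 = make_feature_map([[2.0, 0.0], [0.0, 2.0]], [0.1, -0.1])
W_out, b_out = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]

deep_g = compose(lambda x: g(W_out, b_out, x), [T1, T2])
label = deep_g([0.3, -0.7])
```

With an empty list of feature maps, `compose` reduces to the plain classifier g, which makes the "simple g expanded by composition" structure explicit. In practice all parameters would be fitted to data rather than fixed by hand.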

3:40 – 4:10pm 
Coffee Break 

4:10 – 4:55pm 
Lexing Ying, Stanford University & Facebook Research 
Title: Neural networks and inverse problems
Abstract: This talk will discuss some recent progress on solving inverse problems using deep neural networks. Compared to standard vision and language applications, applications from inverse problems are often limited by the size of the training data set. We will show how to overcome this issue by incorporating physical insights and mathematical analysis into the design of neural network architectures. We will discuss a few applications, ranging from seismic inversion to electrical impedance tomography. In each case, we propose a novel neural network structure that allows for efficient training and compact representation of the forward and inverse operators. 
Time 
Speaker 
Title/Abstract 
9:00 – 9:45am 
Pengchuan Zhang, Microsoft Research 
Title: Multiscale deep generative networks for Bayesian inverse problems
Abstract: Deep generative networks have achieved great success in high dimensional density approximation, especially for approximating the distributions of natural images and languages. In this talk, I will first talk about my recent work on generating natural high-resolution images from text descriptions, using multiscale deep generative networks and adversarial training. Then, we propose to train deep generative networks to approximate posterior distributions in Bayesian Inverse Problems (BIPs). To train deep generative networks in the BIP setting, we propose a class of methods that can be combined with any sample-based Bayesian inference algorithm and learn from incremental improvement between two consecutive steps of these sample-based methods. In our experiments, we compare the performance of our training methods when combined with different sample-based algorithms, including various MCMC algorithms, ensemble Kalman filter and Stein variational gradient descent. Our experimental results are promising for applying deep generative networks to high-dimensional BIPs. 
9:50 – 10:35am 
Thomas Y. Hou, Caltech 
Title: Solving multiscale problems and data classification with subsampled data by integrating PDE analysis with data science
Abstract: In many practical applications, we often need to provide solutions for quantities of interest of a large-scale problem but with only subsampled data and partial information of the physical model. Traditional PDE solvers cannot be used directly for this purpose. On the other hand, many powerful techniques have been developed in data science to represent and compress data for useful information with extreme efficiency and low computational complexity. A crucial factor for the success of these methods is to exploit some low rank or sparsity structures in these high-dimensional data. In this talk, we will describe our recent effort in developing effective numerical methods to solve large-scale physical or data science problems using only a small percentage of subsampled data and partial knowledge of the physical model. The PDE analysis will help reveal certain important solution structures so that we can use techniques from data science to give accurate approximations for those quantities of interest. In addition to solving multiscale physical problems using subsampled data, we will also describe some novel optimization methods to solve data classification problems. 
10:40 – 11:10am 
Coffee Break 

11:10 – 11:55am 
De Huang, Caltech 
Title: Some new convexity results for a family of random matrices with applications to data science
Abstract: Motivated by the data classification problem and other data science applications, we study the convexity properties of a class of random matrices. More specifically, we use operator interpolation techniques to generalize Lieb’s concavity theorems and a series of trace inequalities. These new convexity results can be used to perform spectrum estimates for random matrices and provide new expectation estimates and tail bounds on partial spectrum sums of random matrices. This talk will mainly focus on the use of operator interpolation for proving a class of random matrix estimates. 