Machine Learning for Multiscale Model Reduction Workshop

The Machine Learning for Multiscale Model Reduction Workshop will take place on March 27-29, 2019. This is the second of two workshops organized by Michael Brenner, Shmuel Rubinstein, and Tom Hou. The first, Fluid turbulence and Singularities of the Euler/Navier-Stokes equations, will take place on March 13-15, 2019. Both workshops will be held in room G10 of the CMSA, located at 20 Garden Street, Cambridge, MA.

For a list of lodging options convenient to the Center, please visit our recommended lodgings page.

Please register here.

Speakers:

Joan Bruna Estrach (Courant Institute), Predrag Cvitanovic (Georgia Tech), Thomas Y. Hou (Caltech), Stephan Hoyer (Google Research), De Huang (Caltech), George Karniadakis (Brown University), Stephane Mallat (College de France), Stanley Osher (UCLA), Houman Owhadi (Caltech), Jacob Page (Cambridge University), Zuowei Shen (National University of Singapore), Jack Xin (UC Irvine), Jinchao Xu (Penn State University), Lexing Ying (Stanford University & Facebook Research), Pengchuan Zhang (Microsoft Research)

Schedule:

Wednesday, March 27

Time

Speaker

Title/Abstract

9:00 – 9:05am

Opening Remarks by Professor S. T. Yau

 

9:05 – 9:50am

Stanley Osher, UCLA

Title: Partial Differential Equations, Nonconvex Optimization and Deep Neural Nets

Abstract: Recently, links between partial differential equations (PDEs) and DNNs have been established in several interesting directions. We used ideas from Hamilton-Jacobi (HJ) equations, control, and differential games to improve training time and to modify and improve the training algorithm. We propose a very simple modification of gradient descent and stochastic gradient descent. We show that when applied to a variety of machine learning models, including softmax regression, convolutional neural nets, generative adversarial nets, and deep reinforcement learning, this very simple surrogate can dramatically reduce the variance and improve the accuracy of the generalization. The new algorithm, which depends on one nonnegative parameter, tends to avoid local minima when applied to nonconvex minimization. We also present a simple connection between transport equations and deep residual nets, based on stochastic control. This connection enabled us to improve neural nets’ adversarial robustness and generalization accuracy. Again, the programming changes needed for these improvements are minimal in cost, complexity and effort.

Joint work with many people, especially Bao Wang, Zuoqiang Shi and Adam Oberman.
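
The abstract leaves the precise modification unspecified, so the sketch below is only an illustration of the general idea of a one-parameter smoothed-gradient surrogate: the flattened gradient g is replaced by the solution of (I - sigma * Lap) g_s = g, with a periodic discrete Laplacian applied cheaply via FFT, and sigma = 0 recovering plain gradient descent. The test function and all numerical values are hypothetical, not taken from the talk.

```python
# Illustration only: a one-parameter smoothed-gradient surrogate for gradient descent.
import numpy as np

def smoothed_gradient(grad, sigma=1.0):
    n = grad.size
    k = np.arange(n)
    # eigenvalues of I - sigma * Lap for the periodic 1D discrete Laplacian
    eig = 1.0 + sigma * (2.0 - 2.0 * np.cos(2.0 * np.pi * k / n))
    return np.real(np.fft.ifft(np.fft.fft(grad) / eig))

def rastrigin(x):                       # a standard nonconvex test function
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

def rastrigin_grad(x):
    return 2 * x + 20 * np.pi * np.sin(2 * np.pi * x)

x = np.full(50, 2.5)                    # start away from the global minimum at 0
for _ in range(5000):
    x -= 1e-3 * smoothed_gradient(rastrigin_grad(x), sigma=5.0)
print("final value:", rastrigin(x))
```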

9:55 – 10:40am

George Karniadakis, Brown University

Title: Physics-Informed Neural Networks (PINNs) for solving stochastic and fractional PDEs

Abstract: In this talk, we will present a new approach to develop a data-driven, learning-based framework for predicting outcomes of physical systems and for discovering hidden physics from noisy data. We will introduce a deep learning approach based on neural networks (NNs) and generative adversarial networks (GANs). Unlike other approaches that rely on big data, here we “learn” from small data by exploiting the information provided by the physical conservation laws, which are used to obtain informative priors or to regularize the neural networks. We will also make connections between Gaussian process regression (GPR) and NNs and discuss the new powerful concept of meta-learning. We will demonstrate the power of PINNs for several inverse problems in fluid mechanics, including wake flows and shock tube problems, where traditional methods fail due to lack of boundary and initial conditions.
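
The core PINN construction can be sketched in a few lines. The example below is a minimal, hypothetical illustration (PyTorch, a manufactured 1D Poisson problem u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0, arbitrary hyperparameters) of how the PDE residual acts as the regularizer that lets a network learn from very little data; the stochastic, fractional, and GAN-based extensions discussed in the talk are not shown.

```python
# Minimal PINN sketch: PDE residual + boundary loss on random collocation points.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def f(x):                                               # manufactured source term,
    return -(torch.pi ** 2) * torch.sin(torch.pi * x)   # so the exact solution is sin(pi x)

for step in range(5000):
    x = torch.rand(64, 1, requires_grad=True)           # interior collocation points
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    pde_loss = ((d2u - f(x)) ** 2).mean()                # physics residual
    xb = torch.tensor([[0.0], [1.0]])
    bc_loss = (net(xb) ** 2).mean()                      # boundary conditions
    loss = pde_loss + bc_loss
    opt.zero_grad(); loss.backward(); opt.step()

print(net(torch.tensor([[0.5]])).item(), "vs exact", 1.0)   # u(0.5) = sin(pi/2) = 1
```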

10:45 – 11:15am

Coffee Break

 

11:15 – 12:00pm

Stephane Mallat, College de France

Title: Multiscale Model Reduction in Deep Convolutional Networks

Abstract: Learning without suffering from the curse of dimensionality requires finding high-dimensional regularity properties that enable dimensionality reduction. This talk shows that deep convolutional neural network architectures take advantage of scale separation to learn models of reduced dimensionality. We introduce a mathematical framework where scale separations are performed with wavelet transforms. Dimensionality reduction takes advantage of prior knowledge on symmetries and is based on learning sparse representations. Applications will be shown on image classification, regression of quantum molecular energies, modeling of non-Gaussian stochastic processes such as turbulence, and signal generation similar to GANs.
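
As a rough illustration of the scale-separation idea (not Mallat's scattering transform itself, which uses carefully designed Morlet wavelets and subsampling), the sketch below cascades band-pass filtering with a modulus nonlinearity and low-pass averaging to turn a long signal into a short, translation-stable descriptor. The filter design here is entirely hypothetical.

```python
# Sketch of a first/second-order wavelet-modulus cascade in 1D using Gaussian
# band-pass filters in the Fourier domain: (filter -> modulus) repeated, then averaged.
import numpy as np

def bandpass_bank(n, num_scales):
    """Gaussian band-pass filters centred at dyadic frequencies (hypothetical design)."""
    freqs = np.fft.fftfreq(n)
    bank = []
    for j in range(num_scales):
        centre = 0.25 / 2 ** j
        width = centre / 2
        bank.append(np.exp(-((np.abs(freqs) - centre) ** 2) / (2 * width ** 2)))
    return bank

def lowpass(n, width=0.02):
    freqs = np.fft.fftfreq(n)
    return np.exp(-(freqs ** 2) / (2 * width ** 2))

def scattering(x, num_scales=4):
    n = x.size
    psi, phi = bandpass_bank(n, num_scales), lowpass(n)
    X = np.fft.fft(x)
    coeffs = [np.real(np.fft.ifft(X * phi)).mean()]           # order 0: local average
    for j in range(num_scales):
        u1 = np.abs(np.fft.ifft(X * psi[j]))                   # order 1: |x * psi_j|
        coeffs.append(np.real(np.fft.ifft(np.fft.fft(u1) * phi)).mean())
        for k in range(j + 1, num_scales):                     # order 2: ||x * psi_j| * psi_k|
            u2 = np.abs(np.fft.ifft(np.fft.fft(u1) * psi[k]))
            coeffs.append(np.real(np.fft.ifft(np.fft.fft(u2) * phi)).mean())
    return np.array(coeffs)

signal = np.random.default_rng(0).standard_normal(1024)
print(scattering(signal).shape)   # a short, translation-stable descriptor of a long signal
```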

12:00 – 2:00pm

Lunch Break

 

2:00 – 2:45pm

Jinchao Xu, Penn State University

Title:  Deep Neural Network and Multigrid

Abstract: In this talk, I will discuss the relationship between some deep learning models and traditional algorithms such as finite element and multigrid methods. Such relationships can be used to study, explain and improve the model structures, mathematical properties and relevant training algorithms for deep neural networks. I will report a class of new training algorithms that can be used to improve the efficiency of convolutional neural networks (CNN) by significantly reducing the redundancy of the model without losing accuracy. By combining multigrid and deep learning methodologies, I will present a unified model, known as MgNet, that simultaneously recovers some CNNs for image classification and multigrid methods for solving discretized PDEs. MgNet can also be used to derive a new class of CNNs that mathematically unify many existing CNN models and are computationally competitive, and it can also be used to design new multigrid methods for solving discretized partial differential equations.
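
A toy illustration of the multigrid/CNN analogy mentioned above (not the actual MgNet architecture): a residual-correction "smoothing" step u <- u + B(f - A u), with A and B realized as learnable convolutions, followed by a strided-convolution restriction to a coarser grid. All sizes are hypothetical.

```python
# Toy sketch of the multigrid <-> CNN analogy: learnable smoothing and restriction.
import torch

class SmoothingBlock(torch.nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.A = torch.nn.Conv2d(channels, channels, 3, padding=1)  # "operator"
        self.B = torch.nn.Conv2d(channels, channels, 3, padding=1)  # "smoother"

    def forward(self, u, f):
        return u + self.B(torch.relu(f - self.A(u)))                # residual correction

class Restriction(torch.nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.R = torch.nn.Conv2d(channels, channels, 3, stride=2, padding=1)

    def forward(self, u, f):
        return self.R(u), self.R(f)                                  # move to a coarser grid

u = torch.zeros(1, 8, 32, 32)
f = torch.randn(1, 8, 32, 32)
smooth, restrict = SmoothingBlock(8), Restriction(8)
for _ in range(2):                      # two smoothing steps on the fine level
    u = smooth(u, f)
u, f = restrict(u, f)
print(u.shape)                          # torch.Size([1, 8, 16, 16])
```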

2:50 – 3:35pm

Joan Bruna Estrach, the Courant Institute

Title: Global Convergence of Neuron Birth-Death Dynamics

Abstract:  Neural networks with a large number of parameters admit a mean-field description, which has recently served as a theoretical explanation for the favorable training properties of “overparameterized” models. In this regime, gradient descent obeys a deterministic partial differential equation (PDE) that converges to a globally optimal solution for networks with a single hidden layer under appropriate assumptions.

In this talk, we propose a non-local mass transport dynamics that leads to a modified PDE with the same minimizer. We implement this non-local dynamics as a stochastic neuronal birth-death process and we prove that it accelerates the rate of convergence in the mean-field limit. We subsequently realize this PDE with two classes of numerical schemes that converge to the mean-field equation, each of which can easily be implemented for neural networks with finite numbers of parameters. We illustrate our algorithms with two models to provide intuition for the mechanism through which convergence is accelerated.

Joint work with G. Rotskoff (NYU), S. Jelassi (Princeton) and E. Vanden-Eijnden (NYU).
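
The following toy sketch conveys only the flavor of birth-death dynamics for a single-hidden-layer network: alongside ordinary gradient descent, hidden units with the smallest (heuristic) contribution are periodically killed and replaced by perturbed copies of the strongest units. The actual process analyzed in the talk is a stochastic jump process derived from the mean-field potential; the fitness proxy and all constants below are hypothetical.

```python
# Caricature of neuron birth-death resampling alongside SGD on a two-layer network.
import torch

torch.manual_seed(0)
m, d = 128, 2                                   # hidden width, input dimension
W = torch.nn.Parameter(torch.randn(m, d))       # first-layer weights
a = torch.nn.Parameter(torch.randn(m) / m)      # output weights (mean-field scaling)

def model(x):
    return torch.relu(x @ W.t()) @ a

def target(x):
    return torch.sin(3 * x[:, 0]) * torch.cos(2 * x[:, 1])

opt = torch.optim.SGD([W, a], lr=0.05)
for step in range(2000):
    x = torch.randn(256, d)
    loss = ((model(x) - target(x)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

    if step % 200 == 199:                        # birth-death resampling step
        with torch.no_grad():
            k = m // 10
            fitness = a.abs() * W.norm(dim=1)    # crude proxy for a neuron's contribution
            order = fitness.argsort()
            dead, born = order[:k], order[-k:]   # weakest units die, strongest duplicate
            W[dead] = W[born] + 0.01 * torch.randn(k, d)
            a[dead] = a[born]

print("final loss:", loss.item())
```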

3:40 – 4:10pm

Coffee Break

 

4:10 – 4:55pm

Jack Xin, UC Irvine

Title: Discrete Optimization in Gradient Based Deep Learning

Abstract: Discrete functions and operations often appear in deep neural networks for complexity reduction and information propagation. We consider two such scenarios. One is quantization, which restricts the activation functions to piecewise constant and the weights to a discrete set. The other is automated channel shuffling by learning permutation matrices. Though classical gradients may vanish or may not even exist, we show the success of coarse (large-scale) gradients and an exact Lipschitz continuous penalty in guiding the learning process on deep networks, via image data and solvable analytical models.
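
A minimal sketch of the "coarse gradient" idea for a piecewise-constant activation: the true derivative of sign(x) is zero almost everywhere, so the backward pass substitutes a clipped-identity proxy (a straight-through-type estimator). The specific proxy analyzed in the talk may differ; this is only illustrative.

```python
# Straight-through-style coarse gradient for a quantized (sign) activation.
import torch

class SignSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)                      # piecewise-constant forward pass

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()  # coarse gradient: clipped identity

x = torch.randn(8, requires_grad=True)
y = SignSTE.apply(x).sum()
y.backward()
print(x.grad)    # nonzero where |x| <= 1, even though d(sign)/dx = 0 a.e.
```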

 

Thursday, March 28

Time

Speaker

Title/Abstract

9:00 – 9:45am

Predrag Cvitanovic, Georgia Tech

Title: Turbulence.zip

Abstract: Turbulence is THE unsolved problem of classical physics. For two centuries we have had the equations that describe the motion of fluids, but we cannot solve them where we need them. For pipe, channel and plane flows, over long time intervals and on large spatial domains, turbulent instabilities preclude any accurate numerical time integration.

However, recent progress in ‘compressing’ turbulence data by equation-assisted thinking, in terms of so-called ‘exact coherent structures’, suggests a radically different approach. The way we perceive turbulence (the mere fact that one can identify a cloud in a snapshot) suggests these terabytes should be zipped into small labelled files: a label for each pattern explored by turbulence, and a graph of transitions among them. This pattern-recognition problem is exceptionally constrained by the exact differential equations that the data must respect.

Here the Navier-Stokes equations are recast as a space-time theory, with both space and time taken to infinity; the traditional Direct Numerical Simulation codes have to be abandoned. In this theory there is no time: there is only a repertoire of admissible spatiotemporal patterns. To determine these, radically different kinds of codes will have to be written, with space and time treated on equal footing.

9:50 – 10:35am

Stephan Hoyer, Google Research

Title: Data Driven Discretization for Partial Differential Equations

Abstract: Machine learning and differentiable programming offer a new paradigm for scientific computing, with algorithms tuned by machines instead of by people. I’ll show how these approaches can be used to improve the heuristics underlying numerical methods, particularly for discretizing partial differential equations. We use high resolution simulations to create training data, which we train convolutional neural nets to emulate on much coarser grids. By building on top of traditional approaches such as finite volume schemes, we can incorporate physical constraints, ensure stability and allow for extracting physical insights. Our approach allows us to integrate in time a collection of nonlinear equations in one spatial dimension at resolutions 4-8x coarser than is possible with standard finite difference methods.

Joint work with Yohai Bar-Sinai (Harvard), Jason Hickey (Google) and Michael P. Brenner (Harvard).
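
A much-simplified sketch of the constrained-coefficient idea: a small network predicts, at each point of a coarse periodic grid, a 3-point stencil for du/dx written as base + t * null, which automatically satisfies the consistency constraints sum(c) = 0 and sum(c * offset) = 1, so the learned scheme always remains a valid finite-difference approximation. The full method described above builds this into finite-volume schemes and trains against high-resolution simulations; all names and sizes below are hypothetical.

```python
# Simplified "learned discretization" sketch: constrained, per-point stencils for du/dx.
import torch

dx = 0.1
base = torch.tensor([-1.0, 0.0, 1.0]) / (2 * dx)   # centered difference (consistent)
null = torch.tensor([1.0, -2.0, 1.0])              # perturbations that preserve consistency

class StencilNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # one free parameter t per grid point, predicted from the local solution
        self.net = torch.nn.Conv1d(1, 1, kernel_size=5, padding=2, padding_mode="circular")

    def forward(self, u):                            # u: (batch, 1, n) on a periodic grid
        t = self.net(u)                              # (batch, 1, n)
        coeffs = base.view(1, 3, 1) + t * null.view(1, 3, 1)          # (batch, 3, n)
        neighbours = torch.stack(                    # u_{i-1}, u_i, u_{i+1} at every point
            [torch.roll(u[:, 0], s, dims=-1) for s in (1, 0, -1)], dim=1)
        return (coeffs * neighbours).sum(dim=1)      # learned estimate of du/dx

u = torch.sin(torch.linspace(0, 2 * torch.pi, 64)).view(1, 1, 64)
print(StencilNet()(u).shape)                         # torch.Size([1, 64])
```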

10:40 – 11:10am

Coffee Break

 

11:10 – 11:55am

Jacob Page, Cambridge University

Title: Augmenting the search for unstable periodic orbits in turbulence with autoencoders

Abstract: Unstable periodic orbits (UPOs) are the building blocks of chaotic attractors and possibly turbulent attractors too. The current state-of-the-art for finding UPOs in a turbulent flow begins with a search for ‘near recurrences’ in a DNS time series, measured as local minima in an $l_2$-norm between snapshots of the flow. The approach is crude and struggles to identify UPOs which are visited only fleetingly or which may be spatially localised. In this work we explore the use of convolutional neural networks (CNNs) as a means of performing a dimensionality reduction that respects the existence of UPOs and which can then be applied as a tool for efficiently identifying these coherent structures in turbulent data streams. We train a CNN in the form of an autoencoder to reconstruct snapshots of turbulent Kolmogorov flow (body-forced Navier-Stokes equations on a 2-torus) at $Re=40$. The autoencoder reduces the dimensionality of the flow by orders of magnitude while its output is largely indistinguishable from the true turbulence. The network naturally develops an embedding of the continuous translational symmetry in the system, and we exploit this fact to define translation-independent observables of encoded vorticity fields. These observables can be used as a visualisation tool for comparing encoded UPOs, which cluster into distinct families of coherent structures with different dynamic features. The suggestion that the network has learnt a dimensionality reduction that is related to the exact coherent structures is confirmed by performing a recurrent flow analysis on encoded time series using the translation-independent observable. The approach results in the identification of an order of magnitude more UPOs as compared to a standard recurrent flow analysis over the same time interval. We will go on to assess the network’s performance at higher Reynolds numbers, where only a handful of exact coherent structures have been previously identified.

Joint work with Rich Kerswell and Michael Brenner.
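
The two ingredients described above can be caricatured as follows: (i) a convolutional autoencoder that compresses vorticity snapshots, and (ii) a recurrence search that flags times where the encoded state nearly repeats after a lag. The architecture, thresholds, and the random stand-in data are hypothetical, and the translation-independent observables discussed in the abstract are omitted.

```python
# Autoencoder compression of snapshots + near-recurrence search in latent space.
import torch

class Autoencoder(torch.nn.Module):
    def __init__(self, latent=32):
        super().__init__()
        self.enc = torch.nn.Sequential(
            torch.nn.Conv2d(1, 16, 4, stride=2, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(16, 32, 4, stride=2, padding=1), torch.nn.ReLU(),
            torch.nn.Flatten(), torch.nn.Linear(32 * 16 * 16, latent),
        )
        self.dec = torch.nn.Sequential(
            torch.nn.Linear(latent, 32 * 16 * 16), torch.nn.Unflatten(1, (32, 16, 16)),
            torch.nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

def near_recurrences(encoded, lag, threshold):
    """Times t where |E(w(t)) - E(w(t - lag))| is small: candidate UPO guesses."""
    dists = (encoded[lag:] - encoded[:-lag]).norm(dim=1)
    return torch.nonzero(dists < threshold).squeeze(-1) + lag

snapshots = torch.randn(200, 1, 64, 64)       # stand-in for 64x64 vorticity fields
model = Autoencoder()
with torch.no_grad():
    z = model.enc(snapshots)
print(near_recurrences(z, lag=10, threshold=5.0))
```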

12:00 – 2:00pm

Lunch Break

 

2:00 – 2:45pm

Houman Owhadi, Caltech

Title: On the interface between Numerical Approximation, Inference and Learning

Abstract: Although numerical approximation, statistical inference and learning are traditionally seen as entirely separate subjects, they are intimately connected through the common purpose of making estimations with partial information. This talk is an invitation to explore these connections from the consolidating perspective of game/decision theory, and it is motivated by the suggestion that these confluences might not just be objects of curiosity but can constitute a pathway to simple solutions to fundamental problems in all three areas. We will illustrate this point through problems related to numerical homogenization, operator-adapted wavelets, computation with dense kernel matrices, and the kernel selection/design problem in Machine Learning. In these interplays, accurate reduced/multiscale models (for PDEs) can be identified as optimal bets for adversarial games describing the process of computing with partial information. Moreover, efficient kernels (for ML) can be selected by using relative energy content at fine scales (with a notion of scale corresponding to the number of data points) as an ordering criterion, leading to the identification of (data-driven) flows in kernel spaces (Kernel Flows) that (1) enable the design of bottomless networks amenable to some degree of analysis and (2) appear to converge towards kernels with good generalization properties.

This talk will cover joint work with F. Schäfer, C. Scovel, T. Sullivan, G. R. Yoo and L. Zhang.
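
A minimal sketch of the Kernel Flows criterion alluded to above, under the assumption that it takes the standard form rho(theta) = 1 - (y_c^T K_c^{-1} y_c) / (y^T K^{-1} y), i.e. the relative RKHS energy lost when half of the data points are dropped; the kernel parameters are then adjusted by gradient descent to make rho small. The Gaussian kernel, data set and step sizes below are hypothetical.

```python
# Kernel Flows-style criterion: learn a kernel lengthscale by penalizing the energy
# lost when interpolating with only a random half of the data.
import torch

def gaussian_kernel(x1, x2, lengthscale):
    d2 = (x1[:, None, :] - x2[None, :, :]).pow(2).sum(-1)
    return torch.exp(-d2 / (2 * lengthscale ** 2))

def rho(x, y, lengthscale, jitter=1e-6):
    n = x.shape[0]
    idx = torch.randperm(n)[: n // 2]                    # random half of the points
    K = gaussian_kernel(x, x, lengthscale) + jitter * torch.eye(n)
    Kc = gaussian_kernel(x[idx], x[idx], lengthscale) + jitter * torch.eye(n // 2)
    num = y[idx] @ torch.linalg.solve(Kc, y[idx])
    den = y @ torch.linalg.solve(K, y)
    return 1 - num / den

torch.manual_seed(0)
x = torch.rand(64, 1)
y = torch.sin(8 * x[:, 0]) + 0.05 * torch.randn(64)

log_ls = torch.tensor(0.0, requires_grad=True)           # learn the lengthscale
opt = torch.optim.Adam([log_ls], lr=0.05)
for _ in range(200):
    loss = rho(x, y, log_ls.exp())
    opt.zero_grad(); loss.backward(); opt.step()
print("learned lengthscale:", log_ls.exp().item())
```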

2:50 – 3:35pm

Zuowei Shen, National University of Singapore

Title: Deep Learning: Approximation of functions by composition

Abstract: The primary task in supervised learning is approximating/estimating a function f through samples drawn from a probability distribution on the input space. Learning is the process of approximating f by a function g with tunable parameters, which can be adjusted so that g becomes close to f in some averaged sense with respect to the input distribution. Usually, we pick a nice g to work with. For regression problems, the simplest g one can consider is an affine function, whose parameters can be fitted. For classification problems, one can consider g an affine function followed by a sigmoid transformation and a maximization across output coordinates. This is known as logistic regression. Although such simple g’s are easy to analyze and optimize in practice, when the underlying f is complex, they tend to have low approximation quality. The key idea in deep learning is to expand a simple approximator g by composing it with a series of nested feature extractors, i.e., one finds T1, T2, . . . , TN such that the composition of g with T1, T2, . . . , TN approximates f. In this talk, we discuss the mathematical theory behind such approximations, how the theory can be used to understand and design deep learning networks, and how it differs from classical approximation theory.
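
Written out concretely, the compositional structure described above looks as follows: a simple head g (an affine map followed by softmax) applied to nested feature extractors T1, ..., TN. Everything in this snippet is generic and hypothetical; the talk's contribution is the approximation theory for such compositions, not this code.

```python
# g composed with T_N o ... o T_1: a simple approximator expanded by nested features.
import torch

N, d, width, classes = 3, 16, 32, 5
T = torch.nn.ModuleList(
    [torch.nn.Sequential(torch.nn.Linear(d if i == 0 else width, width), torch.nn.ReLU())
     for i in range(N)]
)
g = torch.nn.Linear(width, classes)       # the "simple approximator"

def model(x):
    for Ti in T:                          # apply T1, ..., TN in order
        x = Ti(x)
    return torch.softmax(g(x), dim=-1)    # logistic-regression-style head

print(model(torch.randn(4, d)).shape)     # torch.Size([4, 5])
```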

3:40 – 4:10pm

Coffee Break

 

4:10 – 4:55pm

Lexing Ying, Stanford University & Facebook Research

Title: Neural networks and inverse problems

Abstract: This talk will discuss some recent progress on solving inverse problems using deep neural networks. Compared to standard vision and language applications, applications from inverse problems are often limited by the size of the training data set. We will show how to overcome this issue by incorporating physical insights and mathematical analysis into the design of neural network architectures. We will discuss a few applications, ranging from seismic inversion to electrical impedance tomography. In each case, we propose a novel neural network structure that allows for efficient training and compact representation of the forward and inverse operators.

Friday, March 29

Time

Speaker

Title/Abstract

9:00 – 9:45am

Pengchuan Zhang, Microsoft Research

Title: Multi-scale deep generative networks for Bayesian inverse problems

Abstract: Deep generative networks have achieved great success in high-dimensional density approximation, especially for approximating the distributions of natural images and languages. In this talk, I will first talk about my recent work on generating natural high-resolution images from text descriptions, using multi-scale deep generative networks and adversarial training. Then, we propose to train deep generative networks to approximate posterior distributions in Bayesian Inverse Problems (BIPs). To train deep generative networks in the BIP setting, we propose a class of methods that can be combined with any sample-based Bayesian inference algorithm and learn from the incremental improvement between two consecutive steps of these sample-based methods. In our experiments, we compare the performance of our training methods when combined with different sample-based algorithms, including various MCMC algorithms, the ensemble Kalman filter and Stein variational gradient descent. Our experiments show promising results for applying deep generative networks to high-dimensional BIPs.

9:50 – 10:35am

Thomas Y. Hou, Caltech

Title:  Solving multiscale problems and data classification with subsampled data by integrating PDE analysis with data science

Abstract: In many practical applications, we often need to provide solutions for quantities of interest of a large-scale problem with only subsampled data and partial information about the physical model. Traditional PDE solvers cannot be used directly for this purpose. On the other hand, many powerful techniques have been developed in data science to represent and compress data and extract useful information with extreme efficiency and low computational complexity. A crucial factor for the success of these methods is exploiting low-rank or sparsity structures in these high-dimensional data. In this talk, we will describe our recent effort in developing effective numerical methods to solve large-scale physical or data science problems using only a small percentage of subsampled data and partial knowledge of the physical model. The PDE analysis helps reveal certain important solution structures so that we can use techniques from data science to give accurate approximations for those quantities of interest. In addition to solving multiscale physical problems using subsampled data, we will also describe some novel optimization methods to solve data classification problems.

10:40 – 11:10am

Coffee Break

 

11:10 – 11:55am

De Huang, Caltech

Title: Some new convexity results for a family of random matrices with applications to data science

Abstract: Motivated by the data classification problem and other data science applications, we study the convexity properties of a class of random matrices. More specifically, we use operator interpolation techniques to generalize Lieb’s concavity theorem and a series of trace inequalities. These new convexity results can be used to perform spectrum estimates for random matrices and provide new expectation estimates and tail bounds on partial spectrum sums of random matrices. This talk will mainly focus on the use of operator interpolation for proving a class of random matrix estimates.
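
For the reader's convenience (a standard fact, stated here as background rather than taken from the talk), the classical result being generalized is Lieb's 1973 concavity theorem:

```latex
\text{For fixed } K \text{ and exponents } p, q \ge 0 \text{ with } p + q \le 1,\quad
(A, B) \;\longmapsto\; \operatorname{Tr}\!\bigl[ K^{*} A^{p} K B^{q} \bigr]
\text{ is jointly concave on pairs of positive definite matrices } (A, B).
```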

 
