Please click here to register for this event. We have space for up to 30 registrants on a first-come, first-served basis.
We may be able to provide some financial support for grad students and postdocs interested in this event. If you are interested in funding, please send a letter of support from your mentor to Hansol Hong at hansol84@gmail.com.
Confirmed Speakers:
The schedule is as follows:
Thursday 4/4/2018
Time  Speaker  Title/Abstract 
12:00-1:30pm  Lunch  
1:30-2:30pm  Tristan Collins  
2:30-2:45pm  Break  
2:45-3:45pm  Dimitry Vaintrob  
3:45-4:15pm  Break  
4:15-5:15pm  Mandy Cheung 
Friday 4/5/2018
Time  Speaker  Title/Abstract 
9:00 – 9:30am  Breakfast  
9:30-10:30am  Zack Sylvan  
10:30-11:00am  Break  
11:00-12:00pm  Yu Pan  
12:00-1:30pm  Lunch  
1:30-2:30pm  Yu-Shen Lin  
2:30-2:45pm  Break  
2:45-3:45pm  Cheuk-Yu Mak  
3:45-4:15pm  Break  
4:15-5:15pm  Yoosik Kim 
Saturday 4/6/2018
Time  Speaker  Title/Abstract 
9:00-9:30am  Breakfast  
9:30-10:30am  Zack Sylvan  
10:30-10:45am  Break  
10:45-11:45am  Shu-Heng Shao  
11:45-12:00pm  Break  
12:00-1:00pm  Mauricio Romo 
The organizing committee consists of Yang Wang (HKUST), Ronald Lui (CUHK), David Gu (Stony Brook), and Shing-Tung Yau (Harvard).
Please click here to register for the event.
Confirmed Speakers:
This event is supported by the CMSA and the NSF.
Schedule:
Saturday, March 24
Time  Speaker  Title/Abstract 
9:00-9:30am  Breakfast & opening speech  
9:30-10:30am  Stephen Wong  
10:30-11:00am  Break  
11:00-12:00pm  Lakshminarayanan Mahadevan  
12:00-1:30pm  Lunch  
1:30-2:30pm  Monica Hurdal  
2:30-2:45pm  Break  
2:45-3:45pm  Allen Tannenbaum  
3:45-4:15pm  Break  
4:15-5:15pm  David Gu 
Sunday, March 25
Time  Speaker  Title/Abstract 
9:00-9:30am  Breakfast  
9:30-10:30am  Hongkai Zhao  
10:30-11:00am  Break  
11:00-12:00pm  Guowei Wei  
12:00-1:30pm  Lunch  
1:30-2:30pm  Monica Hurdal  
2:30-2:45pm  Break  
2:45-3:45pm  Yue Lu  
3:45-4:15pm  Break  
4:15-5:15pm  Jianfeng Cai 
Monday, March 26
Time  Speaker  Title/Abstract 
9:00-9:30am  Breakfast  
9:30-10:30am  Jun Zhang  
10:30-11:00am  Break  
11:00-12:00pm  Eric Miller  
12:00-1:30pm  Lunch  
1:30-2:30pm  Jerome Darbon  
2:30-2:45pm  Break  
2:45-3:45pm  Rongjie Lai  
3:45-4:15pm  Break  
4:15-5:15pm  Ronald Lui 
The schedule below will be updated as speakers are confirmed.
Date  Speaker  Title 
02-09-2018 *Friday  Fan Chung
(UCSD) 
Sequences: random, structured or something in between
There are many fundamental problems concerning sequences that arise in many areas of mathematics and computation. Typical problems include finding or avoiding patterns; testing or validating various “random-like” behavior; analyzing or comparing different statistics, etc. In this talk, we will examine various notions of regularity or irregularity for sequences and mention numerous open problems. 
02-14-2018  Zhengwei Liu
(Harvard Physics) 
A new program on quantum subgroups
Abstract: Quantum subgroups have been studied since the 1980s. The A, D, E classification of subgroups of quantum SU(2) is a quantum analogue of the McKay correspondence. It turns out to be related to various areas in mathematics and physics. Inspired by the quantum McKay correspondence, we introduce a new program that our group at Harvard is developing. 
02-21-2018  Don Rubin
(Harvard) 
Essential concepts of causal inference — a remarkable history
Abstract: I believe that a deep understanding of cause and effect, and how to estimate causal effects from data, complete with the associated mathematical notation and expressions, only evolved in the twentieth century. The crucial idea of randomized experiments was apparently first proposed in 1925 in the context of agricultural field trials but quickly moved to be applied also in studies of animal breeding and then in industrial manufacturing. The conceptual understanding seemed to be tied to ideas that were developing in quantum mechanics. The key ideas of randomized experiments evidently were not applied to studies of human beings until the 1950s, when such experiments began to be used in controlled medical trials, and then in social science, in education and economics. Humans are more complex than plants and animals, however, and with such trials came the attendant complexities of noncompliance with assigned treatment and the occurrence of “Hawthorne” and placebo effects. The formal application of the insights from earlier simpler experimental settings to more complex ones dealing with people started in the 1970s and continues to this day, and includes the bridging of classical mathematical ideas of experimentation, including fractional replication and geometrical formulations from the early twentieth century, with modern ideas that rely on powerful computing to implement aspects of design and analysis. 
02-26-2018 *Monday  Tom Hou
(Caltech) 
Computer-assisted analysis of singularity formation of a regularized 3D Euler equation
Abstract: Whether the 3D incompressible Euler equation can develop a singularity in finite time from smooth initial data is one of the most challenging problems in mathematical fluid dynamics. This question is closely related to the Clay Millennium Problem on 3D Navier-Stokes Equations. In a recent joint work with Dr. Guo Luo, we provided convincing numerical evidence that the 3D Euler equation develops finite time singularities. Inspired by this finding, we have recently developed an integrated analysis and computation strategy to analyze the finite time singularity of a regularized 3D Euler equation. We first transform the regularized 3D Euler equation into an equivalent dynamic rescaling formulation. We then study the stability of an approximate self-similar solution. By designing an appropriate functional space and decomposing the solution into a low frequency part and a high frequency part, we prove nonlinear stability of the dynamic rescaling equation around the approximate self-similar solution, which implies the existence of the finite time blowup of the regularized 3D Euler equation. This is a joint work with Jiajie Chen, De Huang, and Dr. Pengfei Liu. 
03-07-2018  Richard Kenyon
(Brown) 

03-14-2018  
03-21-2018  
03-28-2018  Andrea Montanari (Stanford)  
03-30-2018
*Friday* 3:00-4:15pm 

04-04-2018  
04-11-2018  
04-18-2018  Washington Taylor
(MIT) 

04-25-2018  
05-02-2018  
05-09-2018 
For information on previous CMSA colloquia, click here.
The following speakers are confirmed:
Jointly organized by Harvard University, Massachusetts Institute of Technology, and Microsoft Research New England, the Charles River Lectures on Probability and Related Topics is a one-day event for the benefit of the greater Boston area mathematics community.
The 2017 lectures will take place 9:15am – 5:30pm on Monday, October 2 at Harvard University in the Harvard Science Center.
Please note that registration has closed.
In Harvard Science Center Hall C:
8:45 am – 9:15 am: Coffee/light breakfast
9:15 am – 10:15 am: Ofer Zeitouni
Title:
Abstract:
10:20 am – 11:20 am: Andrea Montanari
Title:
Abstract:
11:20 am – 11:45 am: Break
11:45 am – 12:45 pm: Paul Bourgade
Title:
Abstract:
1:00 pm – 2:30 pm: Lunch
In Harvard Science Center Hall E:
2:45 pm – 3:45 pm: Roman Vershynin
Title: Deviations of random matrices and applications
Abstract: Uniform laws of large numbers provide theoretical foundations for statistical learning theory. This lecture will focus on quantitative uniform laws of large numbers for random matrices. A range of illustrations will be given in high dimensional geometry and data science.
3:45 pm – 4:15 pm: Break
4:15 pm – 5:15 pm: Massimiliano Gubinelli
Title:
Abstract:
Alexei Borodin, Henry Cohn, Vadim Gorin, Elchanan Mossel, Philippe Rigollet, Scott Sheffield, and H.T. Yau
Please click here to register for this event. We have space for up to 30 registrants on a first-come, first-served basis.
Confirmed Participants:
Wednesday, January 10
Time  Speaker  Title/Abstract 
9:30-10:30am  Tony Pantev  Homological Mirror Symmetry and the mirror map for del Pezzo surfaces
Abstract: I will discuss the general mirror symmetry question for 
10:30 – 11:00am  Break  
11:00 – 12:00pm  Young-Hoon Kiem  Knorrer periodicity in curve counting
Abstract: The derived Knorrer periodicity compares the derived category of coherent sheaves on a projective hypersurface with that of matrix factorizations of its defining equation. I’d like to talk about a parallel development in curve counting, including Chang-Li’s p-field invariant, Chang-Li-Li’s algebraic theory of (narrow) FJRW invariant and Polishchuk-Vaintrob’s cohomological field theory, from the viewpoint of cosection localization. 
12:00 – 1:45pm  Lunch  
1:45 – 2:45pm  Kaoru Ono  Antisymplectic involutions and twisted sectors in Lagrangian Floer theory
Abstract: After explaining some results in Lagrangian Floer theory in the presence of an antisymplectic involution, I will present a definition of twisted sectors, which is suitable for Lagrangian Floer theory in the orbifold setting. The first part is based on joint works with K. Fukaya, Y.-G. Oh and H. Ohta and the second is based on a joint work (in progress) with B. Chen and B.-L. Wang. 
2:45 – 3:15pm  Tea  
3:15 – 4:15pm  Radu Laza  Some remarks on degenerations of K-trivial varieties
Abstract: A fundamental result for K3 surfaces is the Kulikov-Persson-Pinkham theorem on degenerations of K3 surfaces. In this talk, I will explore higher dimensional analogues of it and potential applications. Specifically, as a consequence of the minimal model program, Fujino has obtained a weak analogue of the KPP Theorem for K-trivial varieties. I will then discuss some relationships between the dual complex of the central fiber and the monodromy of the degenerations. I will then explain some consequences of this for Hyperkaehler manifolds and Calabi-Yau 3-folds. 
Thursday, January 11
Time  Speaker  Title/Abstract 
9:30-10:30am  Yan Soibelman  Riemann-Hilbert correspondence in dimension one, Fukaya categories and periodic monopoles
Abstract: By RH-correspondence in dimension one I understand not only the classical one for holonomic D-modules on curves, but also its versions for q-difference and elliptic difference equations. The unifying geometry for all versions is the one of partially compactified symplectic surfaces. Then the RH-correspondence relates the category of holonomic coherent sheaves on the quantized symplectic surface with an appropriate partially wrapped Fukaya category of that surface. The nonabelian Hodge theory in dimension one deals with twistor families of the parabolic versions of the above categories. In the case of q-difference equations the role of harmonic objects is played by doubly periodic monopoles, while in the case of elliptic difference equations it is played by triply periodic monopoles. The talk is based on the joint project with Maxim Kontsevich.

10:30 – 11:00am  Break  
11:00 – 12:00pm  Cheol-Hyun Cho  Gluing localized mirror functors
Abstract: Given a Lagrangian submanifold L, we can consider a formal deformation theory of L, which is developed by Fukaya-Oh-Ohta-Ono. This provides a local mirror (with respect to L), given by the Lagrangian Floer potential function on the formal Maurer-Cartan space of L. Then, we can canonically construct a localized mirror functor from the Fukaya category to the matrix factorization category. Given two different Lagrangian submanifolds, we explain how to glue these local mirrors to obtain a global mirror model, and also how to glue their localized mirror functors to obtain a global version of the homological mirror functor. This is a joint work in progress with Hansol Hong and Siu-Cheong Lau. 
12:00 – 1:45pm  Lunch  
1:45 – 2:45pm  Mohammed Abouzaid  
2:45 – 3:15pm  Tea  
3:15 – 4:15pm  Siu-Cheong Lau  Immersed Lagrangians and wall-crossing
Abstract: We find the Floer-theoretical gluing between local moduli of Lagrangian immersions, and use it to study wall-crossing for local Calabi-Yau manifolds. It is a joint work with Cho and Hong. In a joint work with Hong and Kim, we apply the technique to recover the Lie-theoretical mirror of Gr(2,n). 
Friday, January 12
Time  Speaker  Title/Abstract 
9:30-10:30am  Eric Zaslow  Framing Duality
Abstract: A symmetric quiver with g nodes is described by a symmetric adjacency matrix of size g. The same data defines a “framing” of a certain genus-g Legendrian surface in the five-sphere, and the invariants of the quiver conjecturally relate to the open Gromov-Witten (GW) invariants of a non-exact Lagrangian filling of the surface. (Physically, both data count the same BPS states but from different perspectives.) Further, cluster theory can be exploited to conjecturally obtain all open GW invariants of Lagrangian fillings of a wider class of Legendrian surfaces described by cubic planar graphs.
In this talk, I will describe these observations, which build on prior work of others and are explored in joint works with David Treumann and Linhui Shen. 
10:30 – 11:00am  Break  
11:00 – 12:00pm  Si Li  Calabi-Yau geometry, Kodaira-Spencer gravity and integrable hierarchy
Abstract: We discuss some physical and geometric aspects of Kodaira-Spencer gravity (BCOV theory) on Calabi-Yau geometry and explain how the quantum master equation leads to integrable hierarchies. 
12:00 – 1:45pm  Lunch  
1:45 – 2:45pm  Sergueï Barannikov  Quantum master equation on cyclic cochains and categorical higher genus Gromov-Witten invariants
Abstract: The construction of cohomology classes in the compactified moduli spaces of curves based on the quantum master equation on cyclic cochains will be reviewed. For the simplest category consisting of one object with only the identity morphism it produces the generating function for products of the psi-classes. The talk is based on the speaker’s works “Modular operads and Batalin-Vilkovisky geometry” (MPIM Bonn preprint 2006-48 (04/2006)) and “Noncommutative Batalin–Vilkovisky geometry and matrix integrals” (preprint Hal-00102085 (09/2006)). 
2:45 – 3:15pm  Tea  
3:15 – 4:15pm  Thomas Lam  Mirror symmetry for flag varieties via the Langlands program
Abstract: I will talk about a mirror theorem for minuscule flag 
4:15 – 4:30pm  Break  
4:30 – 5:30pm  Colleen Robles  Generalizing the Satake-Baily-Borel compactification
Abstract: The Satake-Baily-Borel (SBB) compactification is a projective algebraic completion of a locally Hermitian symmetric space. This construction, along with Borel’s Extension Theorem, provides the conduit to apply Hodge theory to study the moduli spaces (and their compactifications) of principally polarized abelian varieties and K3 surfaces. Most period domains are not Hermitian, and so one would like to generalize SBB in the hopes of similarly applying Hodge theory to study the moduli spaces (and their compactifications) of more general classes of algebraic varieties. In this talk I will present one such generalization. This is joint work with M. Green, P. Griffiths and R. Laza. 
Saturday, January 13
Time  Speaker  Title/Abstract 
9:30-10:30am  Chenglong Yu  Higher Hasse-Witt matrices and period integrals
Abstract: I shall explain a program to relate the arithmetic of Calabi-Yau hypersurfaces in toric varieties or flag varieties, to their period integrals at the large complex structure limit. In particular, we prove a recent conjecture of Vlasenko regarding higher Hasse-Witt matrices. This work follows Katz’s description of Frobenius action in terms of local expansions. It is joint work with Huang, Lian and Yau.

10:30 – 11:00am  Break  
11:00 – 12:00pm  Kazushi Ueda  Moduli of K3 surfaces as moduli of A-infinity structures
Abstract: We give a description of the moduli space of K3 surfaces polarized 
The lectures will take place from 4:30-5:30pm in Science Center, Hall D.
The Center of Mathematical Sciences and Applications will host a conference on From Algebraic Geometry to Vision and AI: A Symposium Celebrating the Mathematical Work of David Mumford. The event will be held in the Harvard Science Center, Hall D.
For a list of lodging options convenient to the Center, please visit our recommended lodgings page.
Confirmed speakers and panelists:
*This event is jointly supported by the CMSA, National Science Foundation, and the International Science Foundation of Cambridge.
The Big Data Conference features many speakers from the Harvard community as well as scholars from across the globe, with talks focusing on computer science, statistics, math and physics, and economics. This is the third conference on Big Data the Center will host as part of our annual events, and is co-organized by Richard Freeman, Scott Kominers, Jun Liu, Horng-Tzer Yau and Shing-Tung Yau.
For a list of lodging options convenient to the Center, please visit our recommended lodgings page.
Please note that lunch will not be provided during the conference, but a map of Harvard Square with a list of local restaurants can be found by clicking Map & Restaurants.
Confirmed Speakers:
Following the conference, there will be a two-day workshop from August 20-21. The workshop is organized by Scott Kominers, and will feature:
Conference Schedule
A PDF version of the schedule below can also be downloaded here.
Time  Speaker  Topic 
8:30 am – 9:00 am  Breakfast  
9:00 am – 9:40 am  Mohammad Akbarpour  Title: Information aggregation in overlapping generations and the emergence of experts
Abstract: We study a model of social learning with “overlapping generations”, where agents meet others and share data about an underlying state over time. We examine under what conditions the society will produce individuals with precise knowledge about the state of the world. There are two information sharing regimes in our model: Under the full information sharing technology, individuals exchange the information about their point estimates of an underlying state, as well as their sources (or the precision of their signals) and update their beliefs by taking a weighted average. Under the limited information sharing technology, agents only observe the information about the point estimates of those they meet, and update their beliefs by taking a weighted average, where weights can depend on the sequence of meetings, as well as the labels. Our main result shows that, unlike in most social learning settings, using such linear learning rules does not guide the society (or even a fraction of its members) to learn the truth, and that having access to, and exploiting knowledge of, the precision of a source signal is essential for efficient social learning (joint with Amin Saberi & Ali Shameli). 
9:40 am – 10:20 am  Lucas Janson  Title: Model-Free Knockoffs For High-Dimensional Controlled Variable Selection
Abstract: Many contemporary large-scale applications involve building interpretable models linking a large set of potential covariates to a response in a nonlinear fashion, such as when the response is binary. Although this modeling problem has been extensively studied, it remains unclear how to effectively control the fraction of false discoveries even in high-dimensional logistic regression, not to mention general high-dimensional nonlinear models. To address such a practical problem, we propose a new framework of model-free knockoffs, which reads from a different perspective the knockoff procedure (Barber and Candès, 2015) originally designed for controlling the false discovery rate in linear models. The key innovation of our method is to construct knockoff variables probabilistically instead of geometrically. This enables model-free knockoffs to deal with arbitrary (and unknown) conditional models and any dimensions, including when the dimensionality p exceeds the sample size n, while the original knockoffs procedure is constrained to homoscedastic linear models with n greater than or equal to p. Our approach requires the design matrix be random (independent and identically distributed rows) with a covariate distribution that is known, although we show our procedure to be robust to unknown/estimated distributions. As we require no knowledge/assumptions about the conditional distribution of the response, we effectively shift the burden of knowledge from the response to the covariates, in contrast to the canonical model-based approach which assumes a parametric model for the response but very little about the covariates. To our knowledge, no other procedure solves the controlled variable selection problem in such generality, but in the restricted settings where competitors exist, we demonstrate the superior power of knockoffs through simulations. 
Finally, we apply our procedure to data from a case-control study of Crohn’s disease in the United Kingdom, making twice as many discoveries as the original analysis of the same data. 
10:20 am – 10:50 am  Break  
10:50 am – 11:30 am  Noureddine El Karoui  Title: Random matrices and high-dimensional statistics: beyond covariance matrices
Abstract: Random matrices have played a central role in understanding very important statistical methods linked to covariance matrices (such as Principal Components Analysis, Canonical Correlation Analysis, etc.) for several decades. In this talk, I’ll show that one can adopt a random-matrix-inspired point of view to understand the performance of other widely used tools in statistics, such as M-estimators, and very common methods such as the bootstrap. I will focus on the high-dimensional case, which captures well the situation of “moderately” difficult statistical problems, arguably one of the most relevant in practice. In this setting, I will show that random matrix ideas help upend conventional theoretical thinking (for instance about maximum likelihood methods) and highlight very serious practical problems with resampling methods. 
11:30 am – 12:10 pm  Nikhil Naik  Title: Understanding Urban Change with Computer Vision and Street-level Imagery
Abstract: Which neighborhoods experience physical improvements? In this work, we introduce a computer vision method to measure changes in the physical appearances of neighborhoods from time-series street-level imagery. We connect changes in the physical appearance of five US cities with economic and demographic data and find three factors that predict neighborhood improvement. First, neighborhoods that are densely populated by college-educated adults are more likely to experience physical improvements. Second, neighborhoods with better initial appearances experience, on average, larger positive improvements. Third, neighborhood improvement correlates positively with physical proximity to the central business district and to other physically attractive neighborhoods. Together, our results illustrate the value of using computer vision methods and street-level imagery to understand the physical dynamics of cities. (Joint work with Edward L. Glaeser, Cesar A. Hidalgo, Scott Duke Kominers, and Ramesh Raskar.) 
12:10 pm – 12:25 pm  Video #1  Data Science Lightning Talks 
12:25 pm – 1:30 pm  Lunch  
1:30 pm – 2:10 pm  Tracy Ke  Title: A new SVD approach to optimal topic estimation
Abstract: In probabilistic topic models, the quantity of interest (a low-rank matrix consisting of topic vectors) is hidden in the text corpus matrix, masked by noise, and Singular Value Decomposition (SVD) is a potentially useful tool for learning such a low-rank matrix. However, the connection between this low-rank matrix and the singular vectors of the text corpus matrix is usually complicated and hard to spell out, so using SVD for learning topic models faces challenges. We overcome the challenge by revealing a surprising insight: there is a low-dimensional simplex structure which can be viewed as a bridge between the low-rank matrix of interest and the SVD of the text corpus matrix, and which allows us to conveniently reconstruct the former using the latter. Such an insight motivates a new SVD-based approach to learning topic models. For asymptotic analysis, we show that under a popular topic model (Hofmann, 1999), the convergence rate of the l1-error of our method matches that of the minimax lower bound, up to a multi-logarithmic term. In showing these results, we have derived new element-wise bounds on the singular vectors and several large deviation bounds for weakly dependent multinomial data. Our results on the convergence rate and asymptotical minimaxity are new. We have applied our method to two data sets, Associated Press (AP) and Statistics Literature Abstract (SLA), with encouraging results. In particular, there is a clear simplex structure associated with the SVD of the data matrices, which largely validates our discovery. 
2:10 pm – 2:50 pm  Albert-László Barabási  Title: Taming Complexity: From Network Science to Controlling Networks
Abstract: The ultimate proof of our understanding of biological or technological systems is reflected in our ability to control them. While control theory offers mathematical tools to steer engineered and natural systems towards a desired state, we lack a framework to control complex self-organized systems. Here we explore the controllability of an arbitrary complex network, identifying the set of driver nodes whose time-dependent control can guide the system’s entire dynamics. We apply these tools to several real networks, unveiling how the network topology determines its controllability. Virtually all technological and biological networks must be able to control their internal processes. Given that, issues related to control deeply shape the topology and the vulnerability of real systems. Consequently, unveiling the control principles of real networks, the goal of our research, forces us to address a series of fundamental questions pertaining to our understanding of complex systems.

2:50 pm – 3:20 pm  Break  
3:20 pm – 4:00 pm  Marena Lin  Title: Optimizing climate variables for human impact studies
Abstract: Estimates of the relationship between climate variability and socio-economic outcomes are often limited by the spatial resolution of the data. As studies aim to generalize the connection between climate and socio-economic outcomes across countries, the best available socio-economic data is at the national level (e.g. food production quantities, the incidence of warfare, averages of crime incidence, gender birth ratios). While these statistics may be trusted from government censuses, the appropriate metric for the corresponding climate or weather for a given year in a country is less obvious. For example, how do we estimate the temperatures in a country relevant to national food production and therefore food security? We demonstrate that high-resolution spatiotemporal satellite data for vegetation can be used to estimate the weather variables that may be most relevant to food security and related socio-economic outcomes. In particular, satellite proxies for vegetation over the African continent reflect the seasonal movement of the Intertropical Convergence Zone, a band of intense convection and rainfall. We also show that agricultural sensitivity to climate variability differs significantly between countries. This work is an example of the ways in which in-situ and satellite-based observations are invaluable to both estimates of future climate variability and to continued monitoring of the earth-human system. We discuss the current state of these records and potential challenges to their continuity. 
4:00 pm – 4:40 pm  Alex Peysakhovich  Title: Building a cooperator
Abstract: A major goal of modern AI is to construct agents that can perform complex tasks. Much of this work deals with single agent decision problems. However, agents are rarely alone in the world. In this talk I will discuss how to combine ideas from deep reinforcement learning and game theory to construct artificial agents that can communicate, collaborate and cooperate in productive positive sum interactions. 
4:40 pm – 5:20 pm  Tze Leung Lai  Title: Gradient boosting: Its role in big data analytics, underlying mathematical theory, and recent refinements
Abstract: We begin with a review of the history of gradient boosting, dating back to the LMS algorithm of Widrow and Hoff in 1960 and culminating in Freund and Schapire’s AdaBoost and Friedman’s gradient boosting and stochastic gradient boosting algorithms in the period 1999-2002 that heralded the big data era. The role played by gradient boosting in big data analytics, particularly with respect to deep learning, is then discussed. We also present some recent work on the mathematical theory of gradient boosting, which has led to some refinements that greatly improve the convergence properties and prediction performance of the methodology. 
Time  Speaker  Topic 
8:30 am – 9:00 am  Breakfast  
9:00 am – 9:40 am  Natesh Pillai  Title: Accelerating MCMC algorithms for Computationally Intensive Models via Local Approximations
Abstract: We construct a new framework for accelerating Markov chain Monte Carlo in posterior sampling problems where standard methods are limited by the computational cost of the likelihood, or of numerical models embedded therein. Our approach introduces local approximations of these models into the Metropolis–Hastings kernel, borrowing ideas from deterministic approximation theory, optimization, and experimental design. Previous efforts at integrating approximate models into inference typically sacrifice either the sampler’s exactness or efficiency; our work seeks to address these limitations by exploiting useful convergence characteristics of local approximations. We prove the ergodicity of our approximate Markov chain, showing that it samples asymptotically from the exact posterior distribution of interest. We describe variations of the algorithm that employ either local polynomial approximations or local Gaussian process regressors. Our theoretical results reinforce the key observation underlying this article: when the likelihood has some local regularity, the number of model evaluations per Markov chain Monte Carlo (MCMC) step can be greatly reduced without biasing the Monte Carlo average. Numerical experiments demonstrate multiple order-of-magnitude reductions in the number of forward model evaluations used in representative ordinary differential equation (ODE) and partial differential equation (PDE) inference problems, with both synthetic and real data. 
9:40 am – 10:20 am  Ravi Jagadeesan  Title: Designs for estimating the treatment effect in networks with interference
Abstract: In this paper we introduce new, easily implementable designs for drawing causal inference from randomized experiments on networks with interference. Inspired by the idea of matching in observational studies, we introduce the notion of considering a treatment assignment as a “quasi-coloring” on a graph. Our idea of a perfect quasi-coloring strives to match every treated unit on a given network with a distinct control unit that has an identical number of treated and control neighbors. For a wide range of interference functions encountered in applications, we show both by theory and simulations that the classical Neymanian estimator for the direct effect has desirable properties for our designs. This further extends to settings where homophily is present in addition to interference. 
10:20 am – 10:50 am  Break  
10:50 am – 11:30 am  Annie Liang  Title: The Theory is Predictive, but is it Complete? An Application to Human Generation of Randomness
Abstract: When we test a theory using data, it is common to focus on correctness: do the predictions of the theory match what we see in the data? But we also care about completeness: how much of the predictable variation in the data is captured by the theory? This question is difficult to answer, because in general we do not know how much “predictable variation” there is in the problem. In this paper, we consider approaches motivated by machine learning algorithms as a means of constructing a benchmark for the best attainable level of prediction. We illustrate our methods on the task of predicting human-generated random sequences. Relative to a theoretical machine learning algorithm benchmark, we find that existing behavioral models explain roughly 15 percent of the predictable variation in this problem. This fraction is robust across several variations on the problem. We also consider a version of this approach for analyzing field data from domains in which human perception and generation of randomness has been used as a conceptual framework; these include sequential decision-making and repeated zero-sum games. In these domains, our framework for testing the completeness of theories provides a way of assessing their effectiveness over different contexts; we find that despite some differences, the existing theories are fairly stable across our field domains in their performance relative to the benchmark. Overall, our results indicate that (i) there is a significant amount of structure in this problem that existing models have yet to capture and (ii) there are rich domains in which machine learning may provide a viable approach to testing completeness (joint with Jon Kleinberg and Sendhil Mullainathan). 
11:30 am – 12:10 pm  Zak Stone  Title: TensorFlow: Machine Learning for Everyone
Abstract: We’ve witnessed extraordinary breakthroughs in machine learning over the past several years. What kinds of things are possible now that weren’t possible before? How are open-source platforms like TensorFlow and hardware platforms like GPUs and Cloud TPUs accelerating machine learning progress? If these tools are new to you, how should you get started? In this session, you’ll hear about all of this and more from Zak Stone, the Product Manager for TensorFlow on the Google Brain team. 
12:10 pm – 1:30 pm  Lunch  
1:30 pm – 2:10 pm  Jann Spiess  Title: (Machine) Learning to Control in Experiments
Abstract: Machine learning focuses on high-quality prediction rather than on (unbiased) parameter estimation, limiting its direct use in typical program evaluation applications. Still, many estimation tasks have implicit prediction components. In this talk, I discuss accounting for controls in treatment effect estimation as a prediction problem. In a canonical linear regression framework with high-dimensional controls, I argue that OLS is dominated by a natural shrinkage estimator even for unbiased estimation when treatment is random; suggest a generalization that relaxes some parametric assumptions; and contrast my results with those for another implicit prediction problem, namely the first stage of an instrumental variables regression. 
2:10 pm – 2:50 pm  Bradly Stadie  Title: Learning to Learn Quickly: One-Shot Imitation and Meta Learning
Abstract: Many reinforcement learning algorithms are bottlenecked by data collection costs and the brittleness of their solutions when faced with novel scenarios. 
2:50 pm – 3:20 pm  Break  
3:20 pm – 4:00 pm  Hau-Tieng Wu  Title: When Medical Challenges Meet Modern Data Science
Abstract: Adaptive acquisition of correct features from massive datasets is at the core of modern data analysis. One particular interest in medicine is the extraction of hidden dynamics from a single observed time series composed of multiple oscillatory signals, which could be viewed as a single-channel blind source separation problem. The mathematical and statistical problems are made challenging by the structure of the signal, which consists of non-sinusoidal oscillations with time-varying amplitude/frequency, and by the heteroscedastic nature of the noise. In this talk, I will discuss recent progress in solving this kind of problem by combining the cepstrum-based nonlinear time-frequency analysis and manifold learning technique. A particular solution will be given along with its theoretical properties. I will also discuss the application of this method to two medical problems: (1) the extraction of a fetal ECG signal from a single-lead maternal abdominal ECG signal; (2) the simultaneous extraction of the instantaneous heart/respiratory rate from a PPG signal during exercise; (3) (optional depending on time) an application to atrial fibrillation signals. If time permits, the clinical trial results will be discussed. 
4:00 pm – 4:40 pm  Sifan Zhou  Title: Citing People Like Me: Homophily, Knowledge Spillovers, and Continuing a Career in Science
Abstract: Forward citation is widely used to measure the scientific merits of articles. This research studies millions of journal article citation records in life sciences from MEDLINE and finds that authors of the same gender, the same ethnicity, sharing common collaborators, working in the same institution, or being geographically close are more likely to cite each other, and to do so more quickly, than predicted by their proportion among authors working on the same research topics. This phenomenon reveals how social and geographic distances influence the quantity and speed of knowledge spillovers. Given the importance of forward citations in the academic evaluation system, citation homophily potentially puts authors from minority groups at a disadvantage. I then show how it influences scientists’ chances to survive in academia and continue publishing. Based on joint work with Richard Freeman. 
To view photos and video interviews from the conference, please visit the CMSA blog.