Big Data Conference 2021

Name: Big Data Conference 2021
Start: 2021-08-24T00:00:00-05:00
End: 2021-08-24T23:59:59-05:00
Location: Virtual

August 24, 2021

On August 24, 2021, the CMSA hosted its seventh annual Conference on Big Data. The Conference featured speakers from the Harvard community as well as scholars from across the globe, with talks focusing on computer science, statistics, mathematics and physics, and economics.

The 2021 Big Data Conference took place virtually on Zoom.

Organizers:

Shing-Tung Yau, William Caspar Graustein Professor of Mathematics, Harvard University
Scott Duke Kominers, MBA Class of 1960 Associate Professor, Harvard Business School
Horng-Tzer Yau, Professor of Mathematics, Harvard University
Sergiy Verstyuk, CMSA, Harvard University

Speakers:

Andrew Blumberg, University of Texas at Austin
Moran Koren, Harvard CMSA
Hima Lakkaraju, Harvard University
Katrina Ligett, The Hebrew University of Jerusalem

Time (ET; Boston time)	Speaker	Title/Abstract
9:00AM	Conference Organizers	Introduction and Welcome
9:10AM – 9:55AM	Andrew Blumberg, University of Texas at Austin	Title: Robustness and stability for multidimensional persistent homology Abstract: A basic principle in topological data analysis is to study the shape of data by looking at multiscale homological invariants. The idea is to filter the data using a scale parameter that reflects feature size. However, for many data sets, it is very natural to consider multiple filtrations, for example coming from feature scale and density. A key question that arises is how such invariants behave with respect to noise and outliers. This talk will describe a framework for understanding those questions and explore open problems in the area.
10:00AM – 10:45AM	Katrina Ligett, The Hebrew University of Jerusalem	Title: Privacy as Stability, for Generalization Abstract: Many data analysis pipelines are adaptive: the choice of which analysis to run next depends on the outcome of previous analyses. Common examples include variable selection for regression problems and hyper-parameter optimization in large-scale machine learning problems: in both cases, common practice involves repeatedly evaluating a series of models on the same dataset. Unfortunately, this kind of adaptive re-use of data invalidates many traditional methods of avoiding overfitting and false discovery, and has been blamed in part for the recent flood of non-reproducible findings in the empirical sciences. An exciting line of work beginning with Dwork et al. in 2015 establishes the first formal model and first algorithmic results providing a general approach to mitigating the harms of adaptivity, via a connection to the notion of differential privacy. In this talk, we’ll explore the notion of differential privacy and gain some understanding of how and why it provides protection against adaptivity-driven overfitting. Many interesting questions in this space remain open. Joint work with: Christopher Jung (UPenn), Seth Neel (Harvard), Aaron Roth (UPenn), Saeed Sharifi-Malvajerdi (UPenn), and Moshe Shenfeld (HUJI). This talk will draw on work that appeared at NeurIPS 2019 and ITCS 2020.
10:50AM – 11:35AM	Hima Lakkaraju, Harvard University	Title: Towards Reliable and Robust Model Explanations Abstract: As machine learning black boxes are increasingly being deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Such explanations are being leveraged by domain experts to diagnose systematic errors and underlying biases of black boxes. In this talk, I will present some of our recent research that sheds light on the vulnerabilities of popular post hoc explanation techniques such as LIME and SHAP, and also introduce novel methods to address some of these vulnerabilities. More specifically, I will first demonstrate that these methods are brittle, unstable, and vulnerable to a variety of adversarial attacks. Then, I will discuss two solutions to address some of the vulnerabilities of these methods: (i) a framework based on adversarial training that is designed to make post hoc explanations more stable and robust to shifts in the underlying data; (ii) a Bayesian framework that captures the uncertainty associated with post hoc explanations and in turn allows us to generate explanations with user-specified levels of confidence. I will conclude the talk by discussing results from real-world datasets that demonstrate both the vulnerabilities in post hoc explanation techniques and the efficacy of our aforementioned solutions.
11:40AM – 12:25PM	Moran Koren, Harvard CMSA	Title: A Gatekeeper’s Conundrum Abstract: Many selection processes contain a “gatekeeper”. The gatekeeper’s goal is to examine an applicant’s suitability for a proposed position before both parties endure substantial costs. Intuitively, the introduction of a gatekeeper should reduce selection costs as unlikely applicants are sifted out. However, we show that this is not always the case, as the gatekeeper’s introduction inadvertently reduces the applicant’s expected costs and thus interferes with her self-selection. We study the conditions under which the gatekeeper’s presence improves the system’s efficiency and those conditions under which the gatekeeper’s presence induces inefficiency. Additionally, we show that the gatekeeper can sometimes improve selection correctness by behaving strategically (i.e., ignoring her private information with some probability).
12:25PM	Conference Organizers	Closing Remarks

Details

Date: August 24, 2021
Event Categories: Big Data Conference, Event

Organizer

Horng-Tzer Yau (Harvard CMSA)
Scott Duke Kominers
Shing-Tung Yau

Venue

Virtual