Symposium on Foundations of Responsible Computing (FORC)

06/06/2022 9:00 am - 06/08/2022 5:00 pm
CMSA Room G10
Address: CMSA, 20 Garden Street, Cambridge, MA 02138 USA

On June 6-8, 2022, the CMSA hosted the 3rd annual Symposium on Foundations of Responsible Computing (FORC).

The Symposium on Foundations of Responsible Computing (FORC) is a forum for mathematical research in computation and society writ large. The Symposium aims to catalyze the formation of a community supportive of the application of theoretical computer science, statistics, economics and other relevant analytical fields to problems of pressing and anticipated societal concern.

Organizers: Cynthia Dwork, Harvard SEAS | Omer Reingold, Stanford | Elisa Celis, Yale

Schedule


June 6, 2022

9:15 am–10:15 am Opening Remarks

Keynote Speaker: Caroline Nobo, Yale University

Title: From Theory to Impact: Why Better Data Systems are Necessary for Criminal Legal Reform

Abstract: This talk will dive into the messy, archaic, and siloed world of local criminal justice data in America. We will start with a 30,000-foot discussion of the current state of criminal legal data systems, then transition to the challenges of this broken paradigm, and conclude with a call to measure new things – and to measure them better! This talk will leave you with an understanding of criminal justice data infrastructure and transparency in the US, and will discuss how expensive case management software and other technology are built on outdated normative values which impede efforts to reform the system. The result is an infuriating paradox: an abundance of tech products built without theoretical grounding, in a space rich with research and evidence.

10:15 am–10:45 am Coffee Break
10:45 am–12:15 pm Paper Session 1 (Session Chair: Ruth Urner)
Georgy Noarov, University of Pennsylvania Title: Online Minimax Multiobjective Optimization

Abstract: We introduce a simple but general online learning framework in which a learner plays against an adversary in a vector-valued game that changes every round. The learner’s objective is to minimize the maximum cumulative loss over all coordinates. We give a simple algorithm that lets the learner do almost as well as if she knew the adversary’s actions in advance. We demonstrate the power of our framework by using it to (re)derive optimal bounds and efficient algorithms across a variety of domains, ranging from multicalibration to a large set of no-regret algorithms, to a variant of Blackwell’s approachability theorem for polytopes with fast convergence rates. As a new application, we show how to “(multi)calibeat” an arbitrary collection of forecasters — achieving an exponentially improved dependence on the number of models we are competing against, compared to prior work.
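
As a rough illustration (notation ours, not necessarily the paper's exact statement), if \ell_t^j denotes the j-th coordinate of the round-t loss, the guarantee has the flavor of

max_j \sum_{t=1}^T \ell_t^j(x_t, y_t) \le \sum_{t=1}^T \min_x \max_y \max_j \ell_t^j(x, y) + o(T),

i.e., the learner's worst coordinate does nearly as well as the sum of the per-round minimax values.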

Matthew Eichhorn, Cornell University Title: Mind your Ps and Qs: Allocation with Priorities and Quotas

Abstract: In many settings, such as university admissions, the rationing of medical supplies, and the assignment of public housing, decision-makers use normative criteria (ethical, financial, legal, etc.) to justify who gets an allocation. These criteria can often be translated into quotas for the number of units available to particular demographics and priorities over agents who qualify in each demographic. Each agent may qualify in multiple categories at different priority levels, so many allocations may conform to a given set of quotas and priorities. Which of these allocations should be chosen? In this talk, I’ll formalize this reserve allocation problem and motivate Pareto efficiency as a natural desideratum. I’ll present an algorithm to locate efficient allocations that conform to the quota and priority constraints. This algorithm relies on beautiful techniques from integer and linear programming, and it is both faster and more straightforward than existing techniques in this space. Moreover, its clean formulation allows for further refinement, such as the secondary optimization of some heuristics for fairness.
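
For intuition, a generic integer-programming skeleton for such a reserve allocation problem (illustrative only; the paper's exact formulation may differ) is

\max \sum_{i,c} w_{i,c} x_{i,c} \quad \text{s.t.} \quad \sum_i x_{i,c} \le q_c \ \forall c, \qquad \sum_c x_{i,c} \le 1 \ \forall i, \qquad x_{i,c} \in \{0,1\},

where x_{i,c} indicates that agent i receives a unit through category c, q_c is the quota of category c, and the weights w_{i,c} encode priorities; efficient allocations are then sought among solutions of (relaxations of) such programs.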

Haewon Jeong, Harvard University Title: Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values

Abstract: We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and data missing patterns can depend on group attributes (e.g. gender or race). Simply applying off-the-shelf fair learning algorithms to an imputed dataset may lead to an unfair model. In this paper, we first theoretically analyze different sources of discrimination risks when training with an imputed dataset. Then, we propose an integrated approach based on decision trees that does not require a separate process of imputation and learning. Instead, we train a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and we optimize a fairness-regularized objective function. We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset, through several experiments on real-world datasets.
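
Schematically (our notation; the paper's fairness metric and regularizer may differ), a fairness-regularized objective of this kind has the form

\min_{f \in \mathcal{F}_{MIA}} \hat{L}(f) + \lambda \, | \hat{\Pr}[f(X)=1 \mid A=a] - \hat{\Pr}[f(X)=1 \mid A=b] |,

where \mathcal{F}_{MIA} is the class of decision trees whose splits route missing values directly (MIA), \hat{L} is the empirical prediction loss, and \lambda trades off accuracy against the group disparity.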

Emily Diana, University of Pennsylvania Title: Multiaccurate Proxies for Downstream Fairness

Abstract: We study the problem of training a model that must obey demographic fairness conditions when the sensitive features are not available at training time — in other words, how can we train a model to be fair by race when we don’t have data about race? We adopt a fairness pipeline perspective, in which an “upstream” learner that does have access to the sensitive features will learn a proxy model for these features from the other attributes. The goal of the proxy is to allow a general “downstream” learner — with minimal assumptions on their prediction task — to be able to use the proxy to train a model that is fair with respect to the true sensitive features. We show that obeying multiaccuracy constraints with respect to the downstream model class suffices for this purpose, provide sample- and oracle-efficient algorithms and generalization bounds for learning such proxies, and conduct an experimental evaluation. In general, multiaccuracy is much easier to satisfy than classification accuracy, and can be satisfied even when the sensitive features are hard to predict.
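
For reference, a proxy \hat{p} for the sensitive feature is \alpha-multiaccurate with respect to a class C of test functions if (standard formulation)

| E_{x \sim D}[ c(x) ( \hat{p}(x) - p^*(x) ) ] | \le \alpha \quad \text{for all } c \in C,

where p^*(x) is the true conditional probability of the sensitive feature given the other attributes; this is weaker than requiring \hat{p} to predict the sensitive feature accurately, which is why it can hold even when that feature is hard to predict.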

12:15 pm–1:45 pm Lunch Break
1:45 pm–3:15 pm Paper Session 2 (Session Chair: Guy Rothblum)
Elbert Du, Harvard University Title: Improved Generalization Guarantees in Restricted Data Models

Abstract: Differential privacy is known to protect against threats to validity incurred due to adaptive, or exploratory, data analysis — even when the analyst adversarially searches for a statistical estimate that diverges from the true value of the quantity of interest on the underlying population. The cost of this protection is the accuracy loss incurred by differential privacy. In this work, inspired by standard models in the genomics literature, we consider data models in which individuals are represented by a sequence of attributes with the property that distant attributes are only weakly correlated. We show that, under this assumption, it is possible to “re-use” privacy budget on different portions of the data, significantly improving accuracy without increasing the risk of overfitting.

Ruth Urner, York University Title: Robustness Should not be at Odds with Accuracy

Abstract: The phenomenon of adversarial examples in deep learning models has caused substantial concern over their reliability and trustworthiness: in many instances an imperceptible perturbation can falsely flip a neural network’s prediction. Applied research in this area has mostly focused on developing novel adversarial attack strategies or building better defenses against such attacks. It has repeatedly been pointed out that adversarial robustness may be in conflict with requirements for high accuracy. In this work, we take a more principled look at modeling the phenomenon of adversarial examples. We argue that deciding whether a model’s label change under a small perturbation is justified should be done in compliance with the underlying data-generating process. Through a series of formal constructions, systematically analyzing the relation between standard Bayes classifiers and robust-Bayes classifiers, we make the case for adversarial robustness as a locally adaptive measure. We propose a novel way of defining such a locally adaptive robust loss, show that it has a natural empirical counterpart, and develop the resulting algorithmic guidance in the form of a data-informed adaptive robustness radius. We prove that our adaptive robust data augmentation maintains consistency of 1-nearest neighbor classification under deterministic labels and thereby argue that robustness should not be at odds with accuracy.
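
As a sketch of the idea (our notation, not the paper's exact definition), a locally adaptive robust loss replaces a fixed perturbation radius with a data-dependent one:

\ell_r(f; x, y) = 1[ \exists x' : \|x' - x\| \le r(x) \ \text{and} \ f(x') \ne y ],

where the radius r(x) is derived from the underlying data-generating process (or estimated from data) rather than set to a single global constant.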

Sushant Agarwal, University of Waterloo Title: Towards the Unification and Robustness of Perturbation and Gradient Based Explanations

Abstract: As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner. In this work, we analyze two popular post hoc interpretation techniques: SmoothGrad, which is a gradient-based method, and a variant of LIME, which is a perturbation-based method. More specifically, we derive explicit closed-form expressions for the explanations output by these two methods and show that they both converge to the same explanation in expectation, i.e., when the number of perturbed samples used by these methods is large. We then leverage this connection to establish other desirable properties, such as robustness and linearity, for these techniques. We also derive finite sample complexity bounds for the number of perturbations required for these methods to converge to their expected explanation. Finally, we empirically validate our theory using extensive experimentation on both synthetic and real-world datasets.
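
For context, SmoothGrad averages gradients over Gaussian perturbations of the input, so in expectation (the regime analyzed here) the explanation is

SG(x) = E_{\delta \sim N(0, \sigma^2 I)}[ \nabla_x f(x + \delta) ],

and the convergence statement concerns the finite-sample average of such perturbed gradients (and, analogously, LIME's perturbation-based fit) approaching this expectation.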

Tijana Zrnic, University of California, Berkeley Title: Regret Minimization with Performative Feedback

Abstract: In performative prediction, the deployment of a predictive model triggers a shift in the data distribution. As these shifts are typically unknown ahead of time, the learner needs to deploy a model to get feedback about the distribution it induces. We study the problem of finding near-optimal models under performativity while maintaining low regret. On the surface, this problem might seem equivalent to a bandit problem. However, it exhibits a fundamentally richer feedback structure that we refer to as performative feedback: after every deployment, the learner receives samples from the shifted distribution rather than only bandit feedback about the reward. Our main contribution is regret bounds that scale only with the complexity of the distribution shifts and not that of the reward function. The key algorithmic idea is careful exploration of the distribution shifts that informs a novel construction of confidence bounds on the risk of unexplored models. The construction only relies on smoothness of the shifts and does not assume convexity. More broadly, our work establishes a conceptual approach for leveraging tools from the bandits literature for the purpose of regret minimization with performative feedback.
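
In performative prediction, deploying a model \theta induces a data distribution D(\theta), and the quantity of interest is the performative risk (standard definition from that literature):

PR(\theta) = E_{z \sim D(\theta)}[ \ell(z; \theta) ].

Performative feedback means that after deploying \theta the learner observes samples from D(\theta) itself, not merely the scalar reward PR(\theta).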

3:15 pm–3:45 pm Coffee Break
3:45 pm–5:00 pm Panel Discussion Title: What is Responsible Computing?

Panelists: Jiahao Chen, Cynthia Dwork, Kobbi Nissim, Ruth Urner

Moderator: Elisa Celis

 

June 7, 2022

9:15 am–10:15 am Keynote Speaker: Isaac Kohane, Harvard Medical School Title: What’s in a label? The case for and against monolithic group/ethnic/race labeling for machine learning

Abstract: Populations and group labels have been used and abused for thousands of years. The scale at which AI can incorporate such labels into its models and the ways in which such models can be misused are cause for significant concern. I will describe, with examples drawn from experiments in precision medicine, the task dependence of how underserved and oppressed populations can be both harmed and helped by the use of group labels. The source of the labels and the utility models underlying their use will be particularly emphasized.

10:15 am–10:45 am Coffee Break
10:45 am–12:15 pm Paper Session 3 (Session Chair: Ruth Urner)
Rojin Rezvan, University of Texas at Austin Title: Individually-Fair Auctions for Multi-Slot Sponsored Search

Abstract: We design fair sponsored-search auctions that achieve a near-optimal tradeoff between fairness and quality. Our work builds upon the model and auction design of Chawla and Jagadeesan, who considered the special case of a single slot. We consider sponsored search settings with multiple slots and the standard model of click-through rates that are multiplicatively separable into an advertiser-specific component and a slot-specific component. When similar users have similar advertiser-specific click-through rates, our auctions achieve the same near-optimal tradeoff between fairness and quality. When similar users can have different advertiser-specific preferences, we show that a preference-based fairness guarantee holds. Finally, we provide a computationally efficient algorithm for computing payments for our auctions as well as those in previous work, resolving another open direction from Chawla and Jagadeesan.
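
The separable click-through-rate model referenced above factors the probability that a user clicks on advertiser i in slot s as

ctr_{i,s} = \alpha_i \cdot \beta_s,

with an advertiser-specific component \alpha_i and a slot-specific component \beta_s (notation ours).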

Judy Hanwen Shen, Stanford Title: Leximax Approximations and Representative Cohort Selection

Abstract: Finding a representative cohort from a broad pool of candidates is a goal that arises in many contexts such as choosing governing committees and consumer panels. While there are many ways to define the degree to which a cohort represents a population, a very appealing solution concept is lexicographic maximality (leximax), which offers a natural (Pareto-optimal-like) interpretation that the utility of no population can be increased without decreasing the utility of a population that is already worse off. However, finding a leximax solution can be highly dependent on small variations in the utility of certain groups. In this work, we explore new notions of approximate leximax solutions with three distinct motivations: better algorithmic efficiency, exploiting significant utility improvements, and robustness to noise. Among other definitional contributions, we give a new notion of an approximate leximax that satisfies a similarly appealing semantic interpretation and relate it to algorithmically feasible approximate leximax notions. When group utilities are linear over cohort candidates, we give an efficient polynomial-time algorithm for finding a leximax distribution over cohort candidates in the exact as well as in the approximate setting. Furthermore, we show that finding an integer solution to leximax cohort selection with linear utilities is NP-hard.
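
For reference (standard definition, notation ours): writing u_{(1)} \le u_{(2)} \le \cdots \le u_{(k)} for the group utilities of an allocation sorted in increasing order, an allocation is leximax if this sorted utility vector is lexicographically maximal among all feasible allocations: first maximize the smallest group utility, then, subject to that, the second smallest, and so on.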

Jiayuan Ye, National University of Singapore Title: Differentially Private Learning Needs Hidden State (or Much Faster Convergence)

Abstract: Differential privacy analysis of randomized learning algorithms typically relies on composition theorems, where the implicit assumption is that the internal state of the iterative algorithm is revealed to the adversary. However, by assuming hidden states for DP algorithms (when only the last iterate is observable), recent works prove a converging privacy bound for noisy gradient descent (on strongly convex, smooth loss functions) that is significantly smaller than composition bounds after a few epochs. In this talk, we extend this hidden-state analysis to various stochastic minibatch gradient descent schemes (such as under “shuffle and partition” and “sample without replacement”), by deriving novel bounds for the privacy amplification by random post-processing and subsampling. We prove that, in these settings, our privacy bound is much smaller than composition bounds for training with a large number of iterations (which is the case for learning from high-dimensional data). Our converging privacy analysis thus shows that differentially private learning, with a tight bound, needs hidden-state privacy analysis or fast convergence. To complement our theoretical results, we present experiments for training classification models on the MNIST, FMNIST and CIFAR-10 datasets, and observe better accuracy under fixed privacy budgets with the hidden-state analysis.
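
The algorithm analyzed is a noisy (projected) gradient method; one minibatch update has the familiar form (sketch, our notation)

\theta_{t+1} = \Pi_C( \theta_t - \eta ( (1/|B_t|) \sum_{i \in B_t} \nabla \ell(\theta_t; z_i) + N(0, \sigma^2 I) ) ),

and the hidden-state analysis accounts only for releasing the final iterate \theta_T, rather than the entire trajectory \theta_1, \ldots, \theta_T assumed by composition-based accounting.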

Mahbod Majid, University of Waterloo Title: Efficient Mean Estimation with Pure Differential Privacy via a Sum-of-Squares Exponential Mechanism

Abstract: We give the first polynomial-time algorithm to estimate the mean of a d-variate probability distribution from O(d) independent samples (up to logarithmic factors) subject to pure differential privacy.

Our main technique is a new approach to using the powerful Sum-of-Squares (SoS) method to design differentially private algorithms. The “SoS proofs to algorithms” paradigm is a key theme in numerous recent works in high-dimensional algorithmic statistics – estimators which apparently require exponential running time but whose analysis can be captured by low-degree Sum-of-Squares proofs can be automatically turned into polynomial-time algorithms with the same provable guarantees. We demonstrate a similar “proofs to private algorithms” phenomenon: instances of the workhorse exponential mechanism which apparently require exponential time but which can be analyzed with low-degree SoS proofs can be automatically turned into polynomial-time differentially private algorithms. We prove a meta-theorem capturing this phenomenon, which we expect to be of broad use in private algorithm design.

12:15 pm–1:45 pm Lunch Break
1:45 pm–3:15 pm Paper Session 4 (Session Chair: Kunal Talwar)
Kunal Talwar, Apple Title: Differential Secrecy for Distributed Data and Applications to Robust Differentially Secure Vector Summation

Abstract: Computing the noisy sum of real-valued vectors is an important primitive in differentially private learning and statistics. In private federated learning applications, these vectors are held by client devices, leading to a distributed summation problem. Standard Secure Multiparty Computation (SMC) protocols for this problem are susceptible to poisoning attacks, where a client may have a large influence on the sum, without being detected.
In this work, we propose a poisoning-robust private summation protocol in the multiple-server setting, recently studied in PRIO. We present a protocol for vector summation that verifies that the Euclidean norm of each contribution is approximately bounded. We show that by relaxing the security constraint in SMC to a differential-privacy-like guarantee, one can improve over PRIO in terms of communication requirements as well as client-side computation. Unlike SMC algorithms that inevitably cast integers to elements of a large finite field, our algorithms work over integers/reals, which may allow for additional efficiencies.
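
The underlying primitive is the noisy bounded-norm vector sum (sketch, our notation): each of n clients holds x_i in R^d with \|x_i\|_2 \le C, and the servers should release approximately

\sum_{i=1}^n x_i + \text{noise},

while verifying that each contribution respects the norm bound, so that a single malicious client cannot shift the sum by much more than C.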

Giuseppe Vietri, University of Minnesota Title: Improved Regret for Differentially Private Exploration in Linear MDP

Abstract: We study privacy-preserving exploration in sequential decision-making for environments that rely on sensitive data such as medical records. In particular, we focus on solving the problem of reinforcement learning (RL) subject to the constraint of (joint) differential privacy in the linear MDP setting, where both dynamics and rewards are given by linear functions. Prior work on this problem due to Luyo et al. (2021) achieves a regret rate that has a dependence of O(K^{3/5}) on the number of episodes K. We provide a private algorithm with an improved regret rate with an optimal dependence of O(K^{1/2}) on the number of episodes. The key recipe for our stronger regret guarantee is the adaptivity in the policy update schedule, in which an update only occurs when sufficient changes in the data are detected. As a result, our algorithm benefits from low switching cost and only performs O(log(K)) updates, which greatly reduces the amount of privacy noise. Finally, in the most prevalent privacy regimes where the privacy parameter ε is a constant, our algorithm incurs negligible privacy cost — in comparison with the existing non-private regret bounds, the additional regret due to privacy appears in lower-order terms.

Mingxun Zhou, Carnegie Mellon University Title: The Power of the Differentially Oblivious Shuffle in Distributed Privacy Mechanisms

Abstract: The shuffle model has been extensively investigated in the distributed differential privacy (DP) literature. For a class of useful computational tasks, the shuffle model allows us to achieve a privacy-utility tradeoff similar to that of the central model, while shifting the trust from a central data curator to a “trusted shuffler”, which can be implemented through either trusted hardware or cryptography. Very recently, several works explored cryptographic instantiations of a new type of shuffle with relaxed security, called differentially oblivious (DO) shuffles. These works demonstrate that by relaxing the shuffler’s security from simulation-style secrecy to differential privacy, we can achieve asymptotic efficiency improvements. A natural question arises: can we replace the shuffler in distributed DP mechanisms with a DO-shuffle while retaining a similar privacy-utility tradeoff?
In this paper, we prove an optimal privacy amplification theorem by composing any locally differentially private (LDP) mechanism with a DO-shuffler, achieving parameters that tightly match the shuffle model. Moreover, we explore multi-message protocols in the DO-shuffle model, and construct mechanisms for the real summation and histogram problems. Our error bounds approximate the best known results in the multi-message shuffle model up to sub-logarithmic factors. Our results also suggest that, just like in the shuffle model, allowing each client to send multiple messages is fundamentally more powerful than restricting to a single message.

Badih Ghazi, Google Research Title: Differentially Private Ad Conversion Measurement

Abstract: In this work, we study conversion measurement, a central functionality in the digital advertising space, where an advertiser seeks to estimate advertiser site conversions attributed to ad impressions that users have interacted with on various publisher sites. We consider differential privacy (DP), a notion that has gained in popularity due to its strong and rigorous guarantees, and suggest a formal framework for DP conversion measurement, uncovering a subtle interplay between attribution and privacy. We define the notion of an operationally valid configuration of the attribution logic, DP adjacency relation, privacy budget scope and enforcement point, and provide, for a natural space of configurations, a complete characterization.

3:15 pm–3:45 pm Coffee Break
3:45 pm–5:00 pm Open Poster Session

 

June 8, 2022

9:15 am–10:15 am Keynote Speaker: Nuria Oliver, Data-Pop Alliance Title: Data Science against COVID-19

Abstract: In my talk, I will describe the work that I have been doing since March 2020, leading a multi-disciplinary team of 20+ volunteer scientists working very closely with the Presidency of the Valencian Government in Spain on four large areas: (1) human mobility modeling; (2) computational epidemiological models (metapopulation, individual-based, and LSTM-based models); (3) predictive models; and (4) citizen surveys via the COVID19impactsurvey, with over 600,000 answers worldwide.

I will describe the results that we have produced in each of these areas, including winning the 500K XPRIZE Pandemic Response Challenge and a best paper award at ECML-PKDD 2021. I will share the lessons learned in this very special initiative of collaboration between civil society at large (through the survey), the scientific community (through the Expert Group) and a public administration (through the Commissioner at the Presidency level). WIRED magazine recently published an article describing our story.

10:15 am–10:45 am Coffee Break
10:45 am–12:15 pm Paper Session 5 (Session Chair: Kunal Talwar)
Shengyuan Hu, Carnegie Mellon University Title: Private Multi-Task Learning: Formulation and Applications to Federated Learning

Abstract: Many problems in machine learning rely on multi-task learning (MTL), in which the goal is to solve multiple related machine learning tasks simultaneously. MTL is particularly relevant for privacy-sensitive applications in areas such as healthcare, finance, and IoT computing, where sensitive data from multiple, varied sources are shared for the purpose of learning. In this work, we formalize notions of task-level privacy for MTL via joint differential privacy (JDP), a relaxation of differential privacy for mechanism design and distributed optimization. We then propose an algorithm for mean-regularized MTL, an objective commonly used for applications in personalized federated learning, subject to JDP. We analyze our objective and solver, providing certifiable guarantees on both privacy and utility. Empirically, our method allows for improved privacy/utility trade-offs relative to global baselines across common federated learning benchmarks.
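
Mean-regularized MTL, the objective referenced above, couples each task's model to the average model (standard formulation; the paper's regularization details may differ):

\min_{w_1, \ldots, w_m} \sum_{t=1}^m \hat{L}_t(w_t) + (\lambda/2) \sum_{t=1}^m \| w_t - \bar{w} \|_2^2, \qquad \bar{w} = (1/m) \sum_{t=1}^m w_t.

Informally, the JDP requirement is that the model released to each task be differentially private with respect to the data of every other task.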

Christina Yu, Cornell University Title: Sequential Fair Allocation: Achieving the Optimal Envy-Efficiency Tradeoff Curve

Abstract: We consider the problem of dividing limited resources among individuals arriving over T rounds, with the goal of achieving fairness across individuals. In general there may be multiple resources and multiple types of individuals with different utilities. A standard definition of “fairness” requires an allocation to simultaneously satisfy envy-freeness and Pareto efficiency. However, in the online sequential setting, the social planner must decide on a current allocation before the downstream demand is realized, so no policy can guarantee these desiderata simultaneously with probability 1, requiring a modified metric for measuring the fairness of online policies. We show that in the online setting, the two desired properties (envy-freeness and efficiency) are in direct contention: any algorithm achieving additive counterfactual envy-freeness up to L_T necessarily suffers an efficiency loss of at least 1 / L_T. We complement this uncertainty principle with a simple algorithm, HopeGuardrail, which allocates resources based on an adaptive threshold policy and is able to achieve any fairness-efficiency point on this frontier. Our result is the first to provide guarantees for fair online resource allocation with high probability for multiple-resource and multiple-type settings. In simulation results, our algorithm provides allocations close to the optimal fair solution in hindsight, motivating its use in practical applications, as the algorithm is able to adapt to any desired fairness-efficiency trade-off.

Hedyeh Beyhaghi, Carnegie Mellon University Title: On classification of strategic agents who can both game and improve

Abstract: In this work, we consider classification of agents who can both game and improve. For example, people wishing to get a loan may be able to take some actions that increase their perceived credit-worthiness and others that also increase their true credit-worthiness. A decision-maker would like to define a classification rule with few false positives (does not give out many bad loans) while yielding many true positives (giving out many good loans), which includes encouraging agents to improve to become true positives if possible. We consider two models for this problem, a general discrete model and a linear model, and prove algorithmic, learning, and hardness results for each. For the general discrete model, we give an efficient algorithm for the problem of maximizing the number of true positives subject to no false positives, and show how to extend this to a partial-information learning setting. We also show hardness for the problem of maximizing the number of true positives subject to a nonzero bound on the number of false positives, and that this hardness holds even for a finite-point version of our linear model. We also show that maximizing the number of true positives subject to no false positives is NP-hard in our full linear model. We additionally provide an algorithm that determines whether there exists a linear classifier that classifies all agents accurately and causes all improvable agents to become qualified, and give additional results for low-dimensional data.

Keegan Harris, Carnegie Mellon University Title: Bayesian Persuasion for Algorithmic Recourse

Abstract: When subjected to automated decision-making, decision subjects may strategically modify their observable features in ways they believe will maximize their chances of receiving a favorable decision. In many practical situations, the underlying assessment rule is deliberately kept secret to avoid gaming and maintain competitive advantage. The resulting opacity forces the decision subjects to rely on incomplete information when making strategic feature modifications. We capture such settings as a game of Bayesian persuasion, in which the decision maker offers a form of recourse to the decision subject by providing them with an action recommendation (or signal) to incentivize them to modify their features in desirable ways. We show that when using persuasion, both the decision maker and decision subject are never worse off in expectation, while the decision maker can be significantly better off. While the decision maker’s problem of finding the optimal Bayesian incentive-compatible (BIC) signaling policy takes the form of optimization over infinitely-many variables, we show that this optimization can be cast as a linear program over finitely-many regions of the space of possible assessment rules. While this reformulation simplifies the problem dramatically, solving the linear program requires reasoning about exponentially-many variables, even under relatively simple settings. Motivated by this observation, we provide a polynomial-time approximation scheme that recovers a near-optimal signaling policy. Finally, our numerical simulations on semi-synthetic data empirically illustrate the benefits of using persuasion in the algorithmic recourse setting.

12:15 pm–1:45 pm Lunch Break
1:45 pm–3:15 pm Paper Session 6 (Session Chair: Elisa Celis)
Mark Bun, Boston University Title: Controlling Privacy Loss in Sampling Schemes: An Analysis of Stratified and Cluster Sampling

Abstract: Sampling schemes are fundamental tools in statistics, survey design, and algorithm design. A fundamental result in differential privacy is that a differentially private mechanism run on a simple random sample of a population provides stronger privacy guarantees than the same algorithm run on the entire population. However, in practice, sampling designs are often more complex than the simple, data-independent sampling schemes that are addressed in prior work. In this work, we extend the study of privacy amplification results to more complex, data-dependent sampling schemes. We find that not only do these sampling schemes often fail to amplify privacy, they can actually result in privacy degradation. We analyze the privacy implications of the pervasive cluster sampling and stratified sampling paradigms, as well as provide some insight into the study of more general sampling designs.
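
For contrast, the classical amplification result for simple random samples says, roughly, that running an ε-DP mechanism on a random q-fraction of the population yields a privacy parameter on the order of

\varepsilon' = \ln(1 + q (e^{\varepsilon} - 1)) \approx q \varepsilon \quad \text{for small } \varepsilon;

the finding above is that such gains can vanish, or even reverse, once the sampling design depends on the data, as in stratified or cluster sampling.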

Samson Zhou, Carnegie Mellon University Title: Private Data Stream Analysis for Universal Symmetric Norm Estimation

Abstract: We study how to release summary statistics on a data stream subject to the constraint of differential privacy. In particular, we focus on releasing the family of symmetric norms, which are invariant under sign-flips and coordinate-wise permutations on an input data stream and include L_p norms, k-support norms, top-k norms, and the box norm as special cases. Although it may be possible to design and analyze a separate mechanism for each symmetric norm, we propose a general parametrizable framework that differentially privately releases a number of sufficient statistics from which the approximation of all symmetric norms can be simultaneously computed. Our framework partitions the coordinates of the underlying frequency vector into different levels based on their magnitude and releases approximate frequencies for the “heavy” coordinates in important levels and releases approximate level sizes for the “light” coordinates in important levels. Surprisingly, our mechanism allows for the release of an arbitrary number of symmetric norm approximations without any overhead or additional loss in privacy. Moreover, our mechanism permits (1+\alpha)-approximation to each of the symmetric norms and can be implemented using sublinear space in the streaming model for many regimes of the accuracy and privacy parameters.
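
Concretely, a norm l on R^n is symmetric if it is invariant under coordinate permutations and sign flips (standard definition):

l(x) = l(Px) \text{ for every permutation matrix } P, \qquad l(x) = l(|x|),

where |x| is the entrywise absolute value; the L_p, top-k, k-support, and box norms are all of this form.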

Aloni Cohen, University of Chicago Title: Attacks on Deidentification’s Defenses

Abstract: Quasi-identifier-based deidentification techniques (QI-deidentification) are widely used in practice, including k-anonymity, ℓ-diversity, and t-closeness. We present three new attacks on QI-deidentification: two theoretical attacks and one practical attack on a real dataset. In contrast to prior work, our theoretical attacks work even if every attribute is a quasi-identifier. Hence, they apply to k-anonymity, ℓ-diversity, t-closeness, and most other QI-deidentification techniques.
First, we introduce a new class of privacy attacks called downcoding attacks, and prove that every QI-deidentification scheme is vulnerable to downcoding attacks if it is minimal and hierarchical. Second, we convert the downcoding attacks into powerful predicate singling-out (PSO) attacks, which were recently proposed as a way to demonstrate that a privacy mechanism fails to legally anonymize under Europe’s General Data Protection Regulation. Third, we use LinkedIn.com to reidentify 3 students in a k-anonymized dataset published by EdX (and show thousands are potentially vulnerable), undermining EdX’s claimed compliance with the Family Educational Rights and Privacy Act.

The significance of this work is both scientific and political. Our theoretical attacks demonstrate that QI-deidentification may offer no protection even if every attribute is treated as a quasi-identifier. Our practical attack demonstrates that even deidentification experts acting in accordance with strict privacy regulations fail to prevent real-world reidentification. Together, they rebut a foundational tenet of QI-deidentification and challenge the actual arguments made to justify the continued use of k-anonymity and other QI-deidentification techniques.

Steven Wu, Carnegie Mellon University Title: Fully Adaptive Composition in Differential Privacy

Abstract: Composition is a key feature of differential privacy. Well-known advanced composition theorems allow one to query a private database quadratically more times than basic privacy composition would permit. However, these results require that the privacy parameters of all algorithms be fixed before interacting with the data. To address this, Rogers et al. introduced fully adaptive composition, wherein both algorithms and their privacy parameters can be selected adaptively. The authors introduce two probabilistic objects to measure privacy in adaptive composition: privacy filters, which provide differential privacy guarantees for composed interactions, and privacy odometers, time-uniform bounds on privacy loss. There are substantial gaps between advanced composition and existing filters and odometers. First, existing filters place stronger assumptions on the algorithms being composed. Second, these odometers and filters suffer from large constants, making them impractical. We construct filters that match the tightness of advanced composition, including constants, despite allowing for adaptively chosen privacy parameters. We also construct several general families of odometers. These odometers can match the tightness of advanced composition at an arbitrary, preselected point in time, or at all points in time simultaneously, up to a doubly-logarithmic factor. We obtain our results by leveraging recent advances in time-uniform martingale concentration. In sum, we show that fully adaptive privacy is obtainable at almost no loss, and conjecture that our results are essentially not improvable (even in constants) in general.
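
For reference, the advanced composition theorem these filters are measured against states (in its standard form) that composing k mechanisms, each (\varepsilon, \delta)-DP, yields, for any \delta' > 0,

( \sqrt{2k \ln(1/\delta')} \, \varepsilon + k \varepsilon (e^{\varepsilon} - 1), \ k\delta + \delta' )\text{-DP},

but only when all k privacy parameters are fixed in advance; the filters and odometers described above recover comparable bounds while allowing each \varepsilon_i to be chosen adaptively.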

3:15 pm–3:45 pm FORC Reception
3:45 pm–5:00 pm Social Hour