Computer Science for Mathematicians

The CMSA will host a weekly seminar running through 2020-21, in which researchers from all areas of computer science present modern frontiers of the field to a general mathematical audience. Talks will be accessible to anyone with a good mathematical background; no domain-specific background is required. The seminar meets weekly on Tuesdays at 11:30 am ET.

In order to attend this seminar, please fill out this form.

For more information, contact the seminar organizer, Omri Ben-Eliezer (omribene@cmsa.fas.harvard.edu).

Spring 2021:

Date | Speaker | Title/Abstract
2/2/2021
Shweta Jain (UIUC)

Video
Title: Counting cliques in real-world graphs

Abstract: Cliques are important structures in network science that have been used in numerous applications, including spam detection, graph analysis, graph modeling, and community detection, among others. Obtaining the counts of k-cliques in real-world graphs with millions of nodes and edges is a challenging problem due to combinatorial explosion: as k increases, the number of k-cliques grows exponentially. Existing techniques are typically able to count k-cliques only for up to k=5.

In this talk, I will present two algorithms for obtaining k-clique counts that improve the state of the art, both in theory and in practice. The first method is a randomized algorithm that gives a (1+ε)-approximation for the number of k-cliques in a graph. Its running time is proportional to the size of an object called the Turán Shadow, which, for real-world graphs, is found to be small. In practice, this algorithm works well for k <= 10 and gives orders-of-magnitude improvements over existing methods. This paper won the Best Paper Award at WWW 2017.

The second method, a somewhat surprising result, is a simple but powerful algorithm called Pivoter that gives exact k-clique counts for all k and runs in O(n·3^{n/3}) time in the worst case. It uses a classic technique called pivoting that drastically cuts down the search space for cliques, and builds a structure called the Succinct Clique Tree from which global and local (per-vertex and per-edge) k-clique counts can easily be read off. In practice, the algorithm is orders of magnitude faster than even other parallel algorithms and makes clique counting feasible for a number of graphs on which it was previously infeasible. This paper won the Best Paper Award at WSDM 2020.

Joint work with C. Seshadhri.
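As a point of contrast for the algorithms above, here is a minimal brute-force k-clique counter on a toy graph (this is the naive baseline, not the Turán Shadow sampler or Pivoter; the graph and function names are illustrative):

```python
from itertools import combinations

def count_k_cliques(adj, k):
    """Count k-cliques by brute force: test every k-subset of vertices.

    adj: dict mapping each vertex to its set of neighbors.
    The O(n^k) cost is exactly the combinatorial explosion the talk addresses.
    """
    count = 0
    for subset in combinations(list(adj), k):
        if all(v in adj[u] for u, v in combinations(subset, 2)):
            count += 1
    return count

# Toy graph: K4 (complete graph on vertices 0..3) plus a pendant vertex 4.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
```

For this graph the brute-force count gives 7 edges, 4 triangles, and 1 four-clique; Pivoter obtains the same counts for all k simultaneously without enumerating subsets.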
2/9/2021
Max Hopkins (UC San Diego)

Video
Title: Point Location and Active Learning – Learning Halfspaces Almost Optimally

Abstract: The point location problem is a fundamental question in computational geometry. It asks, given a known partition of R^d into cells by n hyperplanes and an unknown input point x, which cell contains x? More formally, given access to x via linear queries (i.e. for any hyperplane h in R^d we may ask: “is x above or below h?”), our goal is to locate the cell containing x in as few queries as possible.

In its dual formulation, point location is equivalent to a well-studied problem in machine learning: active learning halfspaces. In this version of the problem we are given a known set of data points X in R^d, and an unknown hyperplane h. The goal is to label all points in X by asking as few linear queries as possible, where in this formulation a linear query simply corresponds to asking for the label of some point in R^d.

It has long been known that solving the point location problem on n hyperplanes in d dimensions requires at least Omega(d log(n)) queries. Over the past 40 years, a series of works in the computational geometry literature lowered the corresponding upper bound to O(d^2 log(d) log(n)). In this talk, we show how taking a learning-theoretic approach to the problem allows us to reach near-linear dependence on d. The proof combines old margin-based learning and vector-scaling techniques with a novel structural decomposition used for dimensionality reduction.
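The query model in the abstract is easy to make concrete: answering one linear query per hyperplane always locates the cell, so n queries trivially suffice, and the talk is about doing exponentially better. A minimal sketch (all names mine):

```python
def sign_vector(x, hyperplanes):
    """Locate the cell containing x among the cells cut out by hyperplanes.

    Each hyperplane is (a, b), representing {y : <a, y> = b}; a linear query
    asks for the sign of <a, x> - b.  The tuple of all n answers identifies
    the cell, so n queries always suffice.
    """
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))
    return tuple(1 if dot(a, x) > b else -1 for (a, b) in hyperplanes)

# Two lines through the origin split R^2 into 4 cells (the quadrants).
planes = [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0)]
```

Two points lie in the same cell exactly when their sign vectors agree; the lower bound Omega(d log n) says the number of *adaptive* queries needed is far smaller than n.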

Based on joint work with Daniel Kane, Shachar Lovett, and Gaurav Mahajan.
2/16/2021

1:30pm ET
Michael P. Kim (UC Berkeley)

Video
Title: Outcome Indistinguishability

Abstract: Prediction algorithms assign numbers to individuals that are popularly understood as individual “probabilities” — e.g., what is the probability of 5-year survival after cancer diagnosis? — and which increasingly form the basis for life-altering decisions. The understanding of individual probabilities in the context of such unrepeatable events has been the focus of intense study for decades within probability theory, statistics, and philosophy. Building off of notions developed in complexity theory and cryptography, we introduce and study Outcome Indistinguishability (OI). OI predictors yield a model of probabilities that cannot be efficiently refuted on the basis of the real-life observations produced by Nature.

We investigate a hierarchy of OI definitions, whose stringency increases with the degree to which distinguishers may access the predictor in question.  Our findings reveal that OI behaves qualitatively differently than previously studied notions of indistinguishability.  First, we provide constructions at all levels of the hierarchy.  Then, leveraging recently-developed machinery for proving average-case fine-grained hardness, we obtain lower bounds on the complexity of the more stringent forms of OI.  The hardness result provides scientific grounds for the political argument that, when inspecting algorithmic risk prediction instruments, auditors should be granted oracle access to the algorithm, not simply historical predictions.


Joint work with Cynthia Dwork, Omer Reingold, Guy N. Rothblum, Gal Yona; to appear at STOC 2021.
2/23/2021
Shyam Narayanan (MIT)

Video
Title: Learning-Based Support Size Estimation in Sublinear Time

Abstract: We consider the problem of estimating the number of distinct elements in a large data set from a random sample of its elements. The problem occurs in many applications, including biology, genomics, computer systems, and linguistics. A line of research spanning the last decade resulted in algorithms that estimate the support up to $\pm \epsilon n$ from a sample of size $O(\log^2(1/\epsilon) \cdot n/\log n)$, where $n$ is the data set size. Unfortunately, this bound is known to be tight, limiting further improvements to the complexity of this problem. In this paper we consider estimation algorithms augmented with a machine-learning-based predictor that, given any element, returns an estimate of its frequency. We show that if the predictor is correct up to a constant approximation factor, then the sample complexity can be reduced significantly, to $\log(1/\epsilon) \cdot n^{1-\Theta(1/\log(1/\epsilon))}$. We evaluate the proposed algorithms on a collection of data sets, using the neural-network-based estimators from Hsu et al., ICLR'19 as predictors. Our experiments demonstrate substantial (up to 3x) improvements in estimation accuracy compared to the state-of-the-art algorithm.
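To see why a frequency predictor helps, here is a simple inverse-probability-weighting sketch in the spirit of the problem setup (this is not the paper's algorithm; all names and the with-replacement sampling model are my assumptions):

```python
import random

def estimate_support(sample, sample_size, data_size, predict_freq):
    """Estimate the number of distinct elements from a with-replacement sample.

    Each distinct sampled element x is weighted by 1 / Pr[x appears in the
    sample at all], where the probability is computed from the predicted
    frequency predict_freq(x) (Horvitz-Thompson-style weighting).
    """
    estimate = 0.0
    for x in set(sample):
        p_appear = 1.0 - (1.0 - predict_freq(x) / data_size) ** sample_size
        estimate += 1.0 / p_appear
    return estimate

random.seed(0)
data = list(range(1000))               # 1000 distinct elements, frequency 1 each
sample = random.choices(data, k=2000)  # with-replacement sample of size 2000
est = estimate_support(sample, 2000, len(data), lambda x: 1)
```

With a perfect predictor the estimate concentrates near the true support of 1000; the paper's contribution is making this kind of gain rigorous under only a constant-factor-accurate predictor, with sublinear sample size.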
3/2/2021
Sandeep Silwal (MIT)
Title: Randomized Dimensionality Reduction for Clustering

Abstract: Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems. We study its application to two clustering problems: the facility location problem, and the single-link hierarchical clustering problem, which is equivalent to computing the minimum spanning tree. We show that if we project the input pointset $X$ onto a random $d = O(d_X)$-dimensional subspace (where $d_X$ is the doubling dimension of $X$), then the optimum facility location cost in the projected space approximates the original cost up to a constant factor. We show an analogous statement for minimum spanning tree, but with the dimension $d$ having an extra $\log \log n$ term and the approximation factor being arbitrarily close to $1$. Furthermore, we extend these results to approximating solutions instead of just their costs. Lastly, we provide experimental results to validate the quality of solutions and the speedup due to the dimensionality reduction.

Unlike several previous papers studying this approach in the context of $k$-means and $k$-medians, our dimension bound does not depend on the number of clusters but only on the intrinsic dimensionality of $X$. 

Joint work with Shyam Narayanan, Piotr Indyk, Or Zamir.
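A toy illustration of the approach (not the paper's dimension bound): project a point set of tiny doubling dimension with a random Gaussian map and compare minimum-spanning-tree costs before and after. All function names are mine.

```python
import random

def project(points, d_out, seed=0):
    """Project points with a random Gaussian matrix scaled by 1/sqrt(d_out)
    (Johnson-Lindenstrauss style)."""
    rng = random.Random(seed)
    d_in = len(points[0])
    G = [[rng.gauss(0, 1) / d_out ** 0.5 for _ in range(d_in)]
         for _ in range(d_out)]
    return [tuple(sum(g[j] * p[j] for j in range(d_in)) for g in G)
            for p in points]

def mst_cost(points):
    """Prim's algorithm on the complete Euclidean graph."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    in_tree = {0}
    cost = 0.0
    while len(in_tree) < len(points):
        c, v = min((dist(points[i], points[j]), j)
                   for i in in_tree for j in range(len(points))
                   if j not in in_tree)
        cost += c
        in_tree.add(v)
    return cost

# 20 points on a line segment embedded in R^50: doubling dimension is tiny,
# so a very low target dimension should roughly preserve the MST cost.
pts = [tuple(float(i) if k == 0 else 0.0 for k in range(50)) for i in range(20)]
low = project(pts, 5)
```

Here the ambient dimension drops from 50 to 5, far below what a bound depending on the number of clusters (or on log n alone) would suggest, and the MST cost is preserved up to a modest factor.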

Fall 2020:

Date | Speaker | Title/Abstract
9/15/2020
Yannai A. Gonczarowski (Microsoft Research)

Video
Title: The Menu-Size of Approximately Optimal Auctions

Abstract: We consider a monopolist who wishes to sell n goods to a single additive buyer, where the buyer’s valuations for the goods are drawn according to independent distributions. It is well known that—unlike in the single-item case—the revenue-optimal auction (a pricing scheme) may be complex, sometimes requiring a continuum of menu entries, that is, offering the buyer a choice between a continuum of lottery tickets. It is also known that simple auctions with a finite bounded number of menu entries (lotteries for the buyer to choose from) can extract a constant fraction of the optimal revenue, and that for bounded distributions it is possible to extract an arbitrarily high fraction of the optimal revenue via a finite bounded menu size. Nonetheless, the question of whether an arbitrarily high fraction of the optimal revenue can be extracted via a finite menu size when the valuation distributions possibly have unbounded support (or via a finite bounded menu size when the support of the distributions is bounded by an unknown bound) had remained open since the seminal paper of Hart and Nisan (2013). So had the question of any lower bound on the menu size that suffices for extracting an arbitrarily high fraction of the optimal revenue when selling a fixed number of goods, even for two goods and even for i.i.d. bounded distributions.
In this talk, we resolve both of these questions. We first show that for every n and for every ε>0, there exists a menu-size bound C=C(n,ε) such that auctions of menu size at most C suffice for extracting a (1-ε) fraction of the optimal revenue from any valuation distributions, and give an upper bound on C(n,ε), even when the valuation distributions are unbounded and nonidentical. We then proceed to give two lower bounds, which hold even for bounded i.i.d. distributions: one on the dependence on n when ε=1/n and n grows large, and the other on the dependence on ε when n is fixed and ε grows small. Finally, we apply these upper and lower bounds to derive results regarding the deterministic communication complexity required to run an auction that achieves such an approximation.
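The objects in the abstract are easy to make concrete: a menu is a finite list of (allocation, price) entries, and an additive buyer simply picks the utility-maximizing entry. A minimal sketch (menu contents and names are illustrative):

```python
def buyer_choice(values, menu):
    """An additive buyer picks the menu entry maximizing utility
    sum_i q_i * v_i - price, or walks away if every entry has
    negative utility.

    menu: list of (allocation, price) pairs, where allocation[i] is the
    probability of receiving good i.  The menu size is just len(menu).
    """
    def utility(entry):
        alloc, price = entry
        return sum(q * v for q, v in zip(alloc, values)) - price
    best = max(menu, key=utility)
    return best if utility(best) >= 0 else None

# A 2-good, 2-entry menu: the full bundle for 3.0, or a half-chance
# lottery on good 1 alone for 0.5.
menu = [((1.0, 1.0), 3.0), ((0.5, 0.0), 0.5)]
```

The results above bound how many such entries are needed so that, for the worst-case valuation distributions, the seller's expected revenue reaches a (1-ε) fraction of the optimum.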
9/22/2020
Daniel Goldberg (NextSilicon)
Title: Cybersecurity research in the wild

Abstract: Cybersecurity research exhibits classic yet complex challenges, melding together cryptography, programming language design, and computational complexity, along with psychology and industrial design.
One example of these challenges is crafting an expressive yet safe programming language. SQL — the most popular database querying language — is, however, far from being safe; its expressiveness and lack of care in design result in countless SQL injection attacks to this day. The approaches to mitigating this design flaw vary between academia and industry and involve a mixture of graph analysis, system engineering and new designs of programming interfaces.
During this talk I will review the different participants in frontier research: industry, academia, nation-states, and hobbyists. The first part of the talk will focus on these participants and their incentives, while the second part will contrast how academia approaches these challenges compared to industry and nation-states.
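The SQL injection flaw mentioned above is worth seeing in two lines. Using Python's standard sqlite3 module (table and names are illustrative), the unsafe version interpolates attacker input into the query string, while the safe version passes it as a bound parameter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def lookup_unsafe(name):
    # String interpolation: the attacker's input becomes part of the SQL.
    return conn.execute(
        f"SELECT secret FROM users WHERE name = '{name}'").fetchall()

def lookup_safe(name):
    # Parameterized query: the driver passes the input as data, never as SQL.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)).fetchall()

payload = "' OR '1'='1"   # classic injection payload
```

The unsafe lookup leaks every row for the payload above, because the query becomes `... WHERE name = '' OR '1'='1'`; the parameterized one returns nothing.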
9/29/2020
Rajesh Jayaram (CMU)

Video
Title: Testing Positive Semi-Definiteness via Random Submatrices

Abstract: Given an n x n matrix A with bounded entries, we study the problem of testing whether A is positive semi-definite (PSD) via only a small number of queries to the entries of A. While in general one must read the entire matrix A to determine if it is PSD, we demonstrate that testing whether A is PSD or “far” from PSD (under some norm) is indeed possible with query complexity which only depends on the distance to the PSD cone. This setup is commonly referred to as the property testing framework. We consider two natural norms of n x n matrices: the spectral norm and the Frobenius (Euclidean) norm. We give a query-optimal algorithm for the former, and present new upper and lower bounds for the latter.
Both of these algorithms have a very simple structure: they randomly sample a collection of principal submatrices and check whether these submatrices are PSD. Thus, our problem can be phrased purely as a question in random matrix theory: given an (entry-wise bounded) matrix A which is at distance D (in some norm) from the PSD cone, what is the probability that a random k x k principal submatrix is not PSD? For the spectral norm, this problem can be tightly answered by classical arguments (e.g. scalar-valued concentration); however, the case of the Euclidean norm appears to be more involved and requires matrix-concentration-based arguments.
In this talk, we will discuss the analysis of eigenvalues of random submatrices that leads to these algorithms, and touch on several open questions related to spectral concentration of random submatrices.
Joint work with Ainesh Bakshi and Nadiia Chepurko.
Talk based on the paper https://arxiv.org/abs/2005.06441, to appear in FOCS 2020.
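The tester's simple structure described above can be sketched directly: sample random principal submatrices and check each for positive semi-definiteness, here via an attempted Cholesky factorization (a minimal sketch with my own names; zero-pivot edge cases are glossed over, and the paper's query-complexity guarantees are not reproduced):

```python
import random

def is_psd(M, tol=1e-9):
    """Cholesky-style check: a symmetric matrix is PSD iff the
    factorization never needs the square root of a significantly
    negative pivot."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = M[i][i] - s
                if pivot < -tol:
                    return False
                L[i][i] = max(pivot, 0.0) ** 0.5
            else:
                L[i][j] = (M[i][j] - s) / L[j][j] if L[j][j] > tol else 0.0
    return True

def psd_test(A, k, trials, seed=0):
    """Accept A iff every sampled k x k principal submatrix is PSD.
    One-sided error: a PSD matrix is always accepted, since every
    principal submatrix of a PSD matrix is PSD."""
    rng = random.Random(seed)
    n = len(A)
    for _ in range(trials):
        S = sorted(rng.sample(range(n), k))
        if not is_psd([[A[i][j] for j in S] for i in S]):
            return False
    return True
```

The random-matrix question in the abstract is exactly: how large must k and the number of trials be, as a function of the distance D, before this test catches a far-from-PSD matrix with constant probability?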
10/6/2020
Hadar Averbuch-Elor (Cornell Tech)
Title: Generation by Decomposition

Abstract: Deep learning has revolutionized our ability to generate novel images and 3D shapes. Typically, neural networks are trained to map a high-dimensional latent code to full realistic samples. In this talk, I will present two recent works focusing on the generation of handwritten text and 3D shapes. In these works, we take a different approach and generate image and shape samples using a more granular part-based decomposition, demonstrating that the whole is not necessarily “greater than the sum of its parts”. I will also discuss how our generation-by-decomposition approach allows for semantic manipulation of 3D shapes and improved handwritten text recognition performance.
10/13/2020
Cancelled
10/20/2020
Rani Hod (Tel Aviv University)

Video
Title: Improved Lower Bounds for the Fourier Entropy/Influence Conjecture via Lexicographic Functions

Abstract: Every Boolean function can be uniquely represented as a multilinear polynomial. The entropy and the total influence are two ways to measure the concentration of its Fourier coefficients, namely the monomial coefficients in this representation: the entropy roughly measures their spread, while the total influence measures their average level. The Fourier Entropy/Influence conjecture of Friedgut and Kalai from 1996 states that the entropy to influence ratio is bounded by a universal constant C.

Using lexicographic Boolean functions, we present three explicit asymptotic constructions that improve upon the previously best known lower bound C > 6.278944 by O’Donnell and Tan, obtained via recursive composition. The first uses their construction with the lexicographic function 𝓁⟨2/3⟩ of measure 2/3 to demonstrate that C >= 4 + 3 log_4(3) > 6.377444. The second generalizes their construction to biased functions and obtains C > 6.413846 using 𝓁⟨Φ⟩, where Φ is the inverse golden ratio. The third, independent, construction gives C > 6.454784, even for monotone functions.

Beyond modest improvements to the value of C, our constructions shed some new light on the properties sought in potential counterexamples to the conjecture.

Additionally, we prove a Lipschitz-type condition on the total influence and spectral entropy, which may be of independent interest.
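The two quantities in the conjecture are concrete enough to compute by brute force for small n. A sketch (function names mine; this just evaluates the definitions, with entropy in bits and Fourier coefficients over ±1-valued functions on {0,1}^n):

```python
from itertools import product
from math import log2

def fourier_coefficients(f, n):
    """hat f(S) = E_x[f(x) * chi_S(x)] for f : {0,1}^n -> {-1,+1},
    where chi_S(x) = (-1)^{sum of x_i over i in S}."""
    coeffs = {}
    for S in product((0, 1), repeat=n):
        total = 0
        for x in product((0, 1), repeat=n):
            chi = (-1) ** sum(xi for xi, si in zip(x, S) if si)
            total += f(x) * chi
        coeffs[S] = total / 2 ** n
    return coeffs

def entropy_and_influence(f, n):
    """Spectral entropy of the distribution {hat f(S)^2} and total
    influence sum_S |S| * hat f(S)^2 (both exact, by enumeration)."""
    coeffs = fourier_coefficients(f, n)
    ent = -sum(c * c * log2(c * c) for c in coeffs.values() if c != 0)
    inf = sum(sum(S) * c * c for S, c in coeffs.items())
    return ent, inf
```

For a dictator function the spectrum sits on one singleton set (entropy 0, influence 1), and for parity on one large set (entropy 0, influence n); the conjecture asserts entropy <= C · influence for every Boolean function, and the constructions above push the best known C from below.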
10/27/2020
Vaggos Chatziafratis (Google Research)

Video
Title: Depth-Width Trade-offs for Neural Networks through the lens of Dynamical Systems

Abstract: How can we use the theory of dynamical systems in analyzing the capabilities of neural networks? Understanding the representational power of Deep Neural Networks (DNNs) and how their structural properties (e.g., depth, width, type of activation unit) affect the functions they can compute has been an important yet challenging question in deep learning and approximation theory. In a seminal paper, Telgarsky reveals the limitations of shallow neural networks by exploiting the oscillatory behavior of a simple triangle function, and poses as a tantalizing open question the characterization of those functions that cannot be well-approximated by small depths.

In this work, we point to a new connection between DNN expressivity and dynamical systems, enabling us to obtain trade-offs for representing functions based on the presence of a generalized notion of fixed points, called periodic points, which have played a major role in chaos theory (Li-Yorke chaos and Sharkovskii’s theorem). Our main results are general lower bounds on the width needed to represent periodic functions as a function of the depth, generalizing previous constructions that relied on specific functions.

Based on two recent works: 
with Ioannis Panageas, Sai Ganesh Nagarajan, Xiao Wang from ICLR’20 (spotlight):  https://arxiv.org/abs/1912.04378 
with Ioannis Panageas, Sai Ganesh Nagarajan from ICML’20: https://arxiv.org/abs/2003.00777
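Telgarsky's triangle function mentioned above is concrete enough to play with: the tent map is expressible with two ReLU units, and composing it k times produces 2^k oscillations, which is why a deep, narrow network computes something no shallow network of moderate width can approximate. A sketch (function names mine):

```python
def relu(x):
    return max(0.0, x)

def tent(x):
    """The triangle (tent) map on [0, 1], written with two ReLU units:
    2*relu(x) - 4*relu(x - 1/2) equals 2x on [0, 1/2] and 2 - 2x on [1/2, 1]."""
    return 2 * relu(x) - 4 * relu(x - 0.5)

def iterate(f, x, k):
    for _ in range(k):
        x = f(x)
    return x

def count_oscillations(f, k, grid=999):
    """Count sign changes of f^k(x) - 1/2 on a fine grid: the k-fold
    composition crosses level 1/2 exponentially many times in k."""
    vals = [iterate(f, i / grid, k) - 0.5 for i in range(grid + 1)]
    return sum(1 for a, b in zip(vals, vals[1:]) if a * b < 0)
```

The k-fold composition is a depth-k ReLU network of constant width, yet it crosses the level 1/2 exactly 2^k times; matching that oscillation count with a shallow network forces the width to blow up, which is the shape of the depth-width trade-offs in the talk.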
11/3/2020
Alaa Maalouf & Ibrahim Jubran (Haifa University)

Video
Title: Fast and Accurate Least-Mean-Squares Solvers

Abstract: Least-mean-squares (LMS) solvers such as Linear/Ridge/Lasso Regression, SVD, and Elastic-Net not only solve fundamental machine learning problems, but are also the building blocks of a variety of other methods, such as decision trees and matrix factorizations.

We suggest an algorithm that gets a finite set of $n$ $d$-dimensional real vectors and returns a weighted subset of $d + 1$ vectors whose weighted sum is exactly the same. The proof of Caratheodory’s Theorem (1907) computes such a subset in $O(n^2 d^2)$ time and is thus not used in practice. Our algorithm computes this subset in $O(nd)$ time, using $O(\log n)$ calls to Caratheodory’s construction on small but “smart” subsets. This is based on a novel paradigm of fusion between different data summarization techniques, known as sketches and coresets.

As an example application, we show how it can be used to boost the performance of existing LMS solvers, such as those in the scikit-learn library, by up to a factor of 100. Generalization to streaming and distributed (big) data is trivial. Extensive experimental results and complete open-source code are also provided.
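For concreteness, here is a direct implementation of the classical Caratheodory step, the slow O(n^2 d^2) baseline that the talk's O(nd) algorithm accelerates (a sketch with my own names; numerics are handled only with simple tolerances):

```python
def null_vector(rows):
    """Return a nonzero c with rows @ c = 0, for m equations in n > m
    unknowns, via Gauss-Jordan elimination with partial pivoting."""
    m, n = len(rows), len(rows[0])
    A = [row[:] for row in rows]
    pivots = []                    # pivot column of each processed row
    r = 0
    for col in range(n):
        if r == m:
            break
        best = max(range(r, m), key=lambda i: abs(A[i][col]))
        if abs(A[best][col]) < 1e-12:
            continue
        A[r], A[best] = A[best], A[r]
        for i in range(m):
            if i != r:
                f = A[i][col] / A[r][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        pivots.append(col)
        r += 1
    free = next(j for j in range(n) if j not in pivots)
    c = [0.0] * n
    c[free] = 1.0
    for row_idx, col in enumerate(pivots):
        c[col] = -A[row_idx][free] / A[row_idx][col]
    return c

def caratheodory(points, weights):
    """Reduce a weighted point set in R^d to at most d + 1 points with the
    same total weight and the same weighted sum (Caratheodory, 1907)."""
    d = len(points[0])
    points, weights = list(points), list(weights)
    while len(points) > d + 1:
        n = len(points)
        # Find nontrivial c with sum_i c_i * points[i] = 0 and sum_i c_i = 0:
        # d + 1 equations in n > d + 1 unknowns, so one always exists.
        rows = [[points[i][j] for i in range(n)] for j in range(d)]
        rows.append([1.0] * n)
        c = null_vector(rows)
        # Shift weights along c until one hits zero; the weighted sum and
        # total weight are unchanged because c lies in the null space.
        alpha = min(weights[i] / c[i] for i in range(n) if c[i] > 1e-12)
        kept = [(p, w - alpha * ci) for p, w, ci in zip(points, weights, c)
                if w - alpha * ci > 1e-12]
        points = [p for p, _ in kept]
        weights = [w for _, w in kept]
    return points, weights
```

Each pass removes at least one point, so n - (d + 1) null-space solves are needed in the worst case; the booster in the talk avoids this by running the construction only on O(log n) small "smart" subsets.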
11/10/2020
Keyulu Xu (MIT)

Video
Title: Graph Neural Networks: Expressive Power, Generalization, and Extrapolation

Abstract: Recent advances in deep learning exploit the structure in data and architectures. Graph Neural Networks (GNNs) are a powerful framework for learning with graph-structured objects and for learning the interactions of objects on a graph. Applications include recommender systems, drug discovery, physical and visual reasoning, program synthesis, and natural language processing.

In this talk, we study GNNs from the following aspects: expressive power, generalization, and extrapolation. We characterize the expressive power of GNNs from the perspective of graph isomorphism tests. We show an upper bound that GNNs are at most as powerful as a Weisfeiler-Lehman test. We then show conditions to achieve this upper bound, and present a maximally powerful GNN. Next, we analyze the generalization of GNNs. The optimization trajectories of over-parameterized GNNs trained by gradient descent correspond to those of kernel regression using a specific graph neural tangent kernel. Using this relation, we show GNNs provably learn a class of functions on graphs. More generally, we study how the architectural inductive biases influence generalization in a task. We introduce an algorithmic alignment measure, and show better alignment implies better generalization. Our framework suggests GNNs can sample-efficiently learn dynamic programming algorithms. Finally, we study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution (e.g., on larger graphs or edge weights). We prove a linear extrapolation behavior of ReLU multilayer perceptrons (MLPs), and identify conditions under which MLPs and GNNs extrapolate well. Our results suggest how a good representation or architecture can help extrapolation.

Talk based on:
https://arxiv.org/abs/1810.00826  ICLR’19 (oral)
https://arxiv.org/abs/1905.13192  NeurIPS’19
https://arxiv.org/abs/1905.13211  ICLR’20 (spotlight)
https://arxiv.org/abs/2009.11848  
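The Weisfeiler-Lehman test that upper-bounds GNN expressive power in the talk is a short algorithm: iteratively recolor each vertex by the multiset of its neighbors' colors. A sketch (function names mine; signatures here are label-independent nested tuples, so they can be compared across graphs):

```python
def wl_signature(adj, rounds=3):
    """1-dimensional Weisfeiler-Lehman color refinement.

    Two graphs with different signatures are non-isomorphic (the converse
    can fail); message-passing GNNs are at most this powerful at
    distinguishing graphs.
    """
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                  for v in adj}
    return sorted(colors.values())

# A triangle and a 3-vertex path have the same vertex count but are
# distinguished after one refinement round (degree sequences differ).
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
```

The talk's result says a GNN with sufficiently expressive aggregation (e.g. GIN) matches this test exactly, and no message-passing GNN distinguishes two graphs that this refinement cannot.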
11/17/2020
Daniel Alabi (Harvard)

Video
Title: Differentially Private Simple Linear Regression

Abstract: Economics and social science research often require analyzing datasets of sensitive personal information at fine granularity, with models fit to small subsets of the data. Unfortunately, such fine-grained analysis can easily reveal sensitive individual information. We study algorithms for simple linear regression that satisfy differential privacy, a constraint which guarantees that an algorithm’s output reveals little about any individual input data record, even to an attacker with arbitrary side information about the dataset. We consider the design of differentially private algorithms for simple linear regression for small datasets, with tens to hundreds of datapoints, which is a particularly challenging regime for differential privacy. Focusing on a particular application to small-area analysis in economics research, we study the performance of a spectrum of algorithms we adapt to the setting. We identify key factors that affect their performance, showing through a range of experiments that algorithms based on robust estimators (in particular, the Theil-Sen estimator) perform well on the smallest datasets, but that other more standard algorithms do better as the dataset size increases. See https://arxiv.org/abs/2007.05157 for more details.

Joint work with Audra McMillan, Jayshree Sarathy, Adam Smith, and Salil Vadhan.

If time permits, I will chronicle past work on differentially private linear regression, discussing previous works on distributed linear regression and hypothesis testing in the general linear model.
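The Theil-Sen estimator at the heart of the small-dataset result above is itself very simple: take the median of all pairwise slopes. A sketch of the non-private version (the differentially private variant adds carefully calibrated noise, which is omitted here; names mine):

```python
from itertools import combinations
from statistics import median

def theil_sen(points):
    """Theil-Sen simple linear regression: the slope estimate is the
    median of all pairwise slopes, making it robust to outliers; the
    intercept is the median residual at that slope."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(points, 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in points)
    return slope, intercept
```

Because the median ignores extreme pairwise slopes, a single corrupted point barely moves the fit, and it is this low sensitivity that makes the estimator a good starting point for differential privacy on tens-to-hundreds of datapoints.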
11/24/2020
Kiril Solovey (Stanford)

Video
Title: Large-scale multi-robot systems: From algorithmic foundations to smart-mobility applications

Abstract: Multi-robot systems are already playing a crucial role in manufacturing, warehouse automation, and natural resource monitoring, and in the future they will be employed in even broader domains, from space exploration to search-and-rescue. Moreover, these systems will likely be incorporated into our daily lives through drone delivery services and smart mobility systems comprising thousands of autonomous vehicles. The anticipated benefits of multi-robot systems are numerous, ranging from automating dangerous jobs to broader societal facets such as easing traffic congestion and sustainability. However, to reap those rewards we must develop control mechanisms for such systems that can adapt rapidly to unexpected changes on a massive scale. Importantly, these mechanisms must capture: (i) dynamical and collision-avoidance constraints of individual robots; (ii) interactions between multiple robots; and (iii) more broadly, the interaction of those systems with the environment. All these considerations give rise to extremely complex and high-dimensional optimization problems that need to be solved in real time.

In this talk I will present recent progress on the design of algorithms for control and decision-making that allow the safe, effective, and societally equitable deployment of multi-robot systems. I will highlight both fundamental capabilities for multi-robot systems (e.g., motion planning and task allocation) and applications in smart mobility, including multi-drone delivery and autonomous mobility-on-demand systems. Along the way, I will mention a few related open problems in mathematics and algorithm design.
12/1/2020
Joseph Dimos (AxiomaVox)

Video
Title: Some extensions on argumentation frameworks via hypergraphs


Abstract: The Dung Abstract Argumentation Framework (AAF) is an effective formalism for modelling disputes between two or more agents. Generally, the Dung AF is extended to include some unique interactions between agents; this has been developed further in the Bipolar Argumentation Framework (BAF). AAFs are widely used in academic work: an AF provides a means to resolve disagreements by determining a winning argument. In general, there can be imperfect ontologies that affect how reasoning is defined, and typical logic-based AFs apply to incoherent or uncertain ontologies. However, Dung demonstrated a stable extension of AF that supports an “acceptable standard of behavior”. This talk aligns with present endeavors on extending the Dung AAF to consider the notion of conflict-freeness in relation to persistence over a hypergraph. With a generic type of argumentation, there are methods that can exploit certain complex decision procedures. Argument and attack relations within the Dung AAF are thus further defined to obtain a graphical formula of Kripke groundedness. The incorporation of multiple levels of knowledge aligns with a computational-linguistics aspect of defining classification criteria for AAFs. In the construction, I will provide some treatment of ‘good’ model-theoretic properties that bridge AAFs with Zarankiewicz’s problem, to introduce how arguments are consistent with bipartite hypergraphs. The Zarankiewicz problem appears in the communication complexity of AF graphs.
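The groundedness notion mentioned above has a compact computational core: the grounded extension of a Dung AF is the least fixed point of the characteristic function. A sketch (names mine; this covers only the basic AF, not the bipolar or hypergraph extensions in the talk):

```python
def grounded_extension(args, attacks):
    """Grounded extension of a Dung argumentation framework.

    args: set of arguments; attacks: set of (attacker, target) pairs.
    Starting from the empty set, repeatedly collect every argument that the
    current set defends (each of its attackers is attacked by the set);
    the iteration's fixed point is the grounded extension.
    """
    attackers = {a: {b for (b, c) in attacks if c == a} for a in args}
    E = set()
    while True:
        defended = {a for a in args
                    if all(any((d, b) in attacks for d in E)
                           for b in attackers[a])}
        if defended == E:
            return E
        E = defended
```

For the chain a attacks b, b attacks c, the grounded extension is {a, c}: a is unattacked, and a's attack on b reinstates c; an odd attack cycle, by contrast, grounds out to the empty set.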
12/8/2020
Cancelled
12/15/2020
Clara Shikhelman (Chaincode Labs)

Video
Title: Lightning Network Economics: Cost-minimal channels and their implications for network structure

Abstract: The Lightning Network is a second-layer solution built on top of Bitcoin, aimed at solving Bitcoin’s scalability and immediacy problems. A channel in the Lightning Network allows two parties to secure bitcoin payments and escrow holdings between them. Designed to increase transaction immediacy and reduce blockchain congestion, the network has the potential to solve many issues associated with Bitcoin.
In this talk, we study the economics of the Lightning Network. We present conditions under which two parties optimally establish a channel and give explicit formulas for channels’ costs. Using these, we derive implications for the network’s structure under cooperation assumptions among small sets of users. We show both local implications, such as the wastefulness of certain structures, and global implications, such as a (low) upper bound on the Lightning Network’s average degree.

