The CMSA will host a weekly seminar running through 2020-21, where researchers from all areas of computer science present modern frontiers in computer science to the general mathematical audience. Talks shall be accessible to anyone with a good mathematical background; no domain-specific background is required. The seminar will run weekly on Tuesday 11:30 am ET.
In order to attend this seminar, please fill out this form.
For more information, contact the seminar organizer, Omri Ben-Eliezer (firstname.lastname@example.org).
|9/15/2020||Yannai A. Gonczarowski (Microsoft Research)|
|Title: The Menu-Size of Approximately Optimal Auctions|
Abstract: We consider a monopolist who wishes to sell n goods to a single additive buyer, where the buyer’s valuations for the goods are drawn according to independent distributions. It is well known that—unlike in the single item case—the revenue-optimal auction (a pricing scheme) may be complex, sometimes requiring a continuum of menu entries, that is, offering the buyer a choice between a continuum of lottery tickets. It is also known that simple auctions with a finite bounded number of menu entries (lotteries for the buyer to choose from) can extract a constant fraction of the optimal revenue, as well as that for the case of bounded distributions it is possible to extract an arbitrarily high fraction of the optimal revenue via a finite bounded menu size. Nonetheless, the question of the possibility of extracting an arbitrarily high fraction of the optimal revenue via a finite menu size, when the valuation distributions possibly have unbounded support (or via a finite bounded menu size when the support of the distributions is bounded by an unknown bound), remained open since the seminal paper of Hart and Nisan (2013), and so has the question of any lower-bound on the menu-size that suffices for extracting an arbitrarily high fraction of the optimal revenue when selling a fixed number of goods, even for two goods and even for i.i.d. bounded distributions.
In this talk, we resolve both of these questions. We first show that for every n and for every ε>0, there exists a menu-size bound C=C(n,ε) such that auctions of menu size at most C suffice for extracting a (1-ε) fraction of the optimal revenue from any valuation distributions, and give an upper bound on C(n,ε), even when the valuation distributions are unbounded and nonidentical. We then proceed to giving two lower bounds, which hold even for bounded i.i.d. distributions: one on the dependence on n when ε=1/n and n grows large, and the other on the dependence on ε when n is fixed and ε grows small. Finally, we apply these upper and lower bounds to derive results regarding the deterministic communication complexity required to run an auction that achieves such an approximation.
|9/22/2020||Daniel Goldberg (NextSilicon)||Title: Cybersecurity research in the wild|
Abstract: Cybersecurity research exhibits classic yet complex challenges, melding together cryptography, programming language design, and computational complexity, along with psychology and industrial design.
One example of these challenges is crafting an expressive yet safe programming language. SQL — the most popular database querying language — is, however, far from being safe; its expressiveness and lack of care in design result in countless SQL injection attacks to this day. The approaches to mitigating this design flaw vary between academia and industry and involve a mixture of graph analysis, system engineering and new designs of programming interfaces.
During this talk I will review the different participants in frontier research: industry, academia, nationstates and hobbyists. The first part of the talk will focus on these participants and their incentives, while the second part will contrast how academia is approaching them compared to industry and nationstates.
|9/29/2020||Rajesh Jayaram (CMU)|
|Title: Testing Positive Semi-Definiteness via Random Submatrices|
Abstract: Given an n x n matrix A with bounded entries, we study the problem of testing whether A is positive semi-definite (PSD) via only a small number of queries to the entries of A. While in general one must read the entire matrix A to determine if it is PSD, we demonstrate that testing whether A is PSD or “far” from PSD (under some norm) is indeed possible with query complexity which only depends on the distance to the PSD cone. This setup is commonly referred to as the property testing framework. We consider two natural norms of n x n matrices: the spectral norm and the Frobenius (Euclidean) norm. We give a query-optimal algorithm for the former, and present new upper and lower bounds for the latter.
Both of these algorithms have a very simple structure: they randomly sample a collection of principal submatrices and check whether these submatrices are PSD. Thus, our problem can phrased purely as a question in random matrix theory: given a (entry-wise bounded) matrix A which is at distance D (in some norm) from the PSD cone, what is the probability that a random k x k principal submatrix is not PSD? For the spectral norm, this problem can be tightly answered by classical arguments (e.g. scalar valued concentration), however the case of the Euclidean norm appears to be more involved, and will require matrix concentration based arguments.
In this talk, we will discuss the analysis of eigenvalues of random submatrices which lead to these algorithms, and touch on several open questions related to spectral concentration of random submatrices.
Joint work with Ainesh Bakshi and Nadiia Chepurko.
Talk based on the paper https://arxiv.org/abs/2005.06441, to appear in FOCS 2020.
|10/6/2020||Hadar Averbuch-Elor (Cornell Tech)||Title: Generation by Decomposition|
Abstract: Deep learning has revolutionized our ability to generate novel images and 3D shapes. Typically neural networks are trained to map a high-dimensional latent code to full realistic samples. In this talk, I will present two recent works focusing on generation of handwritten text and 3D shapes. In these works, we take a different approach and generate image and shape samples using a more granular part-based decomposition, demonstrating that the whole is not necessarily “greater than the sum of its parts”. I will also discuss how our generation by decomposition approach allows for a semantic manipulation of 3D shapes and improved handwritten text recognition performance.
|10/20/2020||Rani Hod (Tel Aviv University)|
|Title: Improved Lower Bounds for the Fourier Entropy/Influence Conjecture via Lexicographic Functions|
Abstract: Every Boolean function can be uniquely represented as a multilinear polynomial. The entropy and the total influence are two ways to measure the concentration of its Fourier coefficients, namely the monomial coefficients in this representation: the entropy roughly measures their spread, while the total influence measures their average level. The Fourier Entropy/Influence conjecture of Friedgut and Kalai from 1996 states that the entropy to influence ratio is bounded by a universal constant C.
Using lexicographic Boolean functions, we present three explicit asymptotic constructions that improve upon the previously best known lower bound C > 6.278944 by O’Donnell and Tan, obtained via recursive composition. The first uses their construction with the lexicographic function 𝓁⟨2/3⟩ of measure 2/3 to demonstrate that C >= 4+3 log_4 (3) > 6.377444. The second generalizes their construction to biased functions and obtains C > 6.413846 using 𝓁⟨Φ⟩, where Φ is the inverse golden ratio. The third, independent, construction gives C > 6.454784, even for monotone functions.
Beyond modest improvements to the value of C, our constructions shed some new light on the properties sought in potential counterexamples to the conjecture.
Additionally, we prove a Lipschitz-type condition on the total influence and spectral entropy, which may be of independent interest.
|10/27/2020||Vaggos Chatziafratis (Google Research)|
|Title: Depth-Width Trade-offs for Neural Networks through the lens of Dynamical Systems|
Abstract: How can we use the theory of dynamical systems in analyzing the capabilities of neural networks? Understanding the representational power of Deep Neural Networks (DNNs) and how their structural properties (e.g., depth, width, type of activation unit) affect the functions they can compute, has been an important yet challenging question in deep learning and approximation theory. In a seminal paper, Telgarsky reveals the limitations of shallow neural networks by exploiting the oscillatory behavior of a simple triangle function and states as a tantalizing open question to characterize those functions that cannot be well-approximated by small depths.
In this work, we point to a new connection between DNNs expressivity and dynamical systems, enabling us to get trade-offs for representing functions based on the presence of a generalized notion of fixed points, called periodic points that have played a major role in chaos theory (Li-Yorke chaos and Sharkovskii’s theorem). Our main results are general lower bounds for the width needed to represent periodic functions as a function of the depth, generalizing previous constructions relying on specific functions.
Based on two recent works:
with Ioannis Panageas, Sai Ganesh Nagarajan, Xiao Wang from ICLR’20 (spotlight): https://arxiv.org/abs/1912.04378
with Ioannis Panageas, Sai Ganesh Nagarajan from ICML’20: https://arxiv.org/abs/2003.00777
|11/3/2020||Alaa Maalouf & Ibrahim Jubran (Haifa University)|
|Title: Fast and Accurate Least-Mean-Squares Solvers|
Abstract: Least-mean squares (LMS) solvers such as Linear / Ridge / Lasso-Regression, SVD and Elastic-Net not only solve fundamental machine learning problems, but are also the building blocks in a variety of other methods, such as decision trees and matrix factorizations.
We suggest an algorithm that gets a finite set of $n$ $d$-dimensional real vectors and returns a weighted subset of $d + 1$ vectors whose sum is exactly the same. The proof in Caratheodory’s Theorem (1907) computes such a subset in $O(n^2 d^2 )$ time
and thus not used in practice. Our algorithm computes this subset in $O(nd)$ time, using $O(logn)$ calls to Caratheodory’s construction on small but “smart” subsets. This is based on a novel paradigm of fusion between different data summarization techniques, known as sketches and coresets.
As an example application, we show how it can be used to boost the performance of existing LMS solvers, such as those in scikit-learn library, up to $x100$. Generalization for streaming and distributed (big) data is trivial. Extensive experimental results and complete open source code are also provided.
|11/10/2020||Keyulu Xu (MIT)|
|Title: Graph Neural Networks: Expressive Power, Generalization, and Extrapolation|
Abstract: Recent advances in deep learning exploit the structure in data and architectures. Graph Neural Network (GNN) is a powerful framework for learning with graph-structured objects, and for learning the interaction of objects on a graph. Applications include recommender systems, drug discovery, physical and visual reasoning, program synthesis, and natural language processing.
In this talk, we study GNNs from the following aspects: expressive power, generalization, and extrapolation. We characterize the expressive power of GNNs from the perspective of graph isomorphism tests. We show an upper bound that GNNs are at most as powerful as a Weisfeiler-Lehman test. We then show conditions to achieve this upper bound, and present a maximally powerful GNN. Next, we analyze the generalization of GNNs. The optimization trajectories of over-parameterized GNNs trained by gradient descent correspond to those of kernel regression using a specific graph neural tangent kernel. Using this relation, we show GNNs provably learn a class of functions on graphs. More generally, we study how the architectural inductive biases influence generalization in a task. We introduce an algorithmic alignment measure, and show better alignment implies better generalization. Our framework suggests GNNs can sample-efficiently learn dynamic programming algorithms. Finally, we study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution (e.g., on larger graphs or edge weights). We prove a linear extrapolation behavior of ReLU multilayer perceptrons (MLPs), and identify conditions under which MLPs and GNNs extrapolate well. Our results suggest how a good representation or architecture can help extrapolation.
Talk based on:
https://arxiv.org/abs/1810.00826 ICLR’19 (oral)
https://arxiv.org/abs/1905.13211 ICLR’20 (spotlight)
|11/17/2020||Daniel Alabi (Harvard)|
|Title: Differentially Private Simple Linear Regression|
Abstract: Economics and social science research often require analyzing datasets of sensitive personal information at fine granularity, with models fit to small subsets of the data. Unfortunately, such fine-grained analysis can easily reveal sensitive individual information. We study algorithms for simple linear regression that satisfy differential privacy, a constraint which guarantees that an algorithm’s output reveals little about any individual input data record, even to an attacker with arbitrary side information about the dataset. We consider the design of differentially private algorithms for simple linear regression for small datasets, with tens to hundreds of datapoints, which is a particularly challenging regime for differential privacy. Focusing on a particular application to small-area analysis in economics research, we study the performance of a spectrum of algorithms we adapt to the setting. We identify key factors that affect their performance, showing through a range of experiments that algorithms based on robust estimators (in particular, the Theil-Sen estimator) perform well on the smallest datasets, but that other more standard algorithms do better as the dataset size increases. See https://arxiv.org/abs/2007.05157 for more details.
Joint work with Audra McMillan, Jayshree Sarathy, Adam Smith, and Salil Vadhan.
If time permits, I will chronicle past work on differentially private linear regression, discussing previous works on distributed linear regression and hypothesis testing in the general linear model.
|11/24/2020||Kiril Solovey (Stanford)||Title: Large-scale multi-robot systems: From algorithmic foundations to smart-mobility applications|
Abstract: Multi-robot systems are already playing a crucial role in manufacturing, warehouse automation, and natural resource monitoring, and in the future they will be employed in even broader domains from space exploration to search-and-rescue. Moreover, these systems will likely be incorporated in our daily lives through drone delivery services and smart mobility systems that comprise of thousands of autonomous vehicles. The anticipated benefits of multi-robot systems are numerous, ranging from automating dangerous jobs, to broader societal facets such as easing traffic congestion and sustainability. However, to reap those rewards we must develop control mechanisms for such systems that can adapt rapidly to unexpected changes on a massive scale. Importantly, these mechanisms must capture: (i) dynamical and collision-avoidance constraints of individual robots; (ii) interactions between multiple robots; and (iii) more broadly, the interaction of those systems with the environment. All these considerations give rise to extremely complex and high-dimensional optimization problems that need to be solved in real-time.
In this talk I will present recent progress on the design of algorithms for control and decision-making to allow the safe, effective, and societally-equitable deployment of multi-robot systems. I will highlight both results on fundamental capabilities for multi-robot systems (e.g., motion planning and task allocation), as well as applications in smart mobility, including multi-drone delivery and autonomous mobility-on-demand systems. Along the way, I will mention a few related open problems in mathematics and algorithm design.
|12/1/2020||Joseph Dimos (AxiomaVox)||TBA|
The schedule below will be updated as talks are confirmed.