# Workshop on Coding and Information Theory

The workshop on coding and information theory will take place April 9-13, 2018 at the Center of Mathematical Sciences and Applications, located at 20 Garden Street, Cambridge, MA.

This workshop will focus on new developments in coding and information theory that sit at the intersection of combinatorics and complexity, and will bring together researchers from several communities — coding theory, information theory, combinatorics, and complexity theory — to exchange ideas and form collaborations to attack these problems.

Squarely in this intersection of combinatorics and complexity, locally testable/correctable codes and list-decodable codes both have deep connections to (and in some cases, direct motivation from) complexity theory and pseudorandomness, and recent progress in these areas has directly exploited and explored connections to combinatorics and graph theory.  One goal of this workshop is to push ahead on these and other topics that are in the purview of the year-long program.  Another goal is to highlight (a subset of) topics in coding and information theory which are especially ripe for collaboration between these communities.  Examples of such topics include polar codes; new results on Reed-Muller codes and their thresholds; coding for distributed storage and for DNA memories; coding for deletions and synchronization errors; storage capacity of graphs; zero-error information theory; bounds on codes using semidefinite programming; tensorization in distributed source and channel coding; and applications of information-theoretic methods in probability and combinatorics.  All these topics have attracted a great deal of recent interest in the coding and information theory communities, and have rich connections to combinatorics and complexity which could benefit from further exploration and collaboration.

Participation: The workshop is open to participation by all interested researchers, subject to capacity.

A list of lodging options convenient to the Center can be found on our recommended lodgings page.

Co-organizers of this workshop are Venkat Guruswami, Alexander Barg, and Mary Wootters.  More details about this event, including a list of confirmed participants, will be posted soon.

Schedule:

Monday, April 9

**9:00 – 9:30am:** Breakfast

**9:30 – 10:00am:** Noga Ron-Zewi
Title: Improved list-decoding and local-list-decoding of algebraic codes
Abstract: We show new and improved error-correcting properties of folded Reed-Solomon codes and multiplicity codes. Both of these families of codes are based on polynomials over finite fields, and both have been the sources of recent advances in coding theory. Folded Reed-Solomon codes were the first explicit constructions of codes known to achieve list-decoding capacity; multivariate multiplicity codes were the first constructions of high-rate locally correctable codes; and univariate multiplicity codes are also known to achieve list-decoding capacity. However, previous analyses of the error-correction properties of these codes did not yield optimal results. In particular, in the list-decoding setting, the guarantees on the list sizes were polynomial in the block length, rather than constant; and for multivariate multiplicity codes, local list-decoding algorithms could not go beyond the Johnson bound. We show that folded Reed-Solomon codes and multiplicity codes are in fact better than previously known in the context of (local) list-decoding, improving upon a long line of previous work. More precisely, we first show that (a) folded RS codes achieve list-decoding capacity with constant list sizes, independent of the block length; and (b) high-rate univariate multiplicity codes can also be list-recovered with constant list sizes. Using our result on univariate multiplicity codes, we show that (c) high-rate multivariate multiplicity codes are locally list-recoverable codes. Finally, using some standard tools, we can modify these algebraic constructions to obtain capacity-achieving locally-list-decodable codes with better query complexity than was previously known. Joint work with Swastik Kopparty, Shubhangi Saraf, and Mary Wootters.

**10:00 – 10:30am:** Dean Doron
Title: Near-Optimal Erasure List-Decodable Codes
Abstract: For every small ε, we explicitly construct binary erasure list-decodable codes that can be list-decoded from a 1−ε fraction of erasures, with near-optimal list size poly(log(1/ε)) and near-optimal rate O(ε^(1+γ)), where γ > 0 is any constant. This is the first construction to break the ε^2 rate barrier, solving a long-standing open problem raised by Guruswami and Indyk, and it does so with a remarkably small list size (we remark that Guruswami showed such a result is impossible using linear codes, and indeed our code is not linear). Equivalently, in terms of dispersers, we construct ε-error one-bit strong dispersers with near-optimal seed length and near-optimal entropy loss. The main ingredient in the construction is a new (and nearly optimal) unbalanced two-source extractor. The extractor extracts one bit with constant error from two independent sources, where one source has length n and tiny min-entropy O(log log n) and the other source has length O(log n) and arbitrarily small constant min-entropy rate. Joint work with Avraham Ben-Aroya and Amnon Ta-Shma. (Slides)

**10:30 – 11:10am:** Break

**11:10am – 12:00pm:** Young-Han Kim
Title: Elements of Index Coding
Abstract: Originally introduced to minimize the number of transmissions in satellite communication, the index coding problem provides a simple yet rich model for several important engineering problems in network communication, such as content broadcasting, peer-to-peer communication, distributed caching, device-to-device relaying, and interference management. This talk aims to provide a broad overview of this fascinating problem — arguably one of the most important open problems in network information theory — in four parts. First, we give a mathematical description of the problem and introduce basic properties of optimal index codes. Second, we discuss several coding schemes based on algebraic, graph-theoretic, and information-theoretic tools. Third, we establish performance bounds and discuss their implications in network information theory. Fourth, we explore the relationship between index coding and other problems such as network coding, distributed storage, and guessing games. No prior exposure to network information theory is assumed. (Slides)

**12:00 – 1:30pm:** Lunch

**1:30 – 2:00pm:** Navin Kashyap (Indian Institute of Science)
Title: Probabilistic Forwarding Over Networks: To Code or Not to Code?
Abstract: Consider a network (i.e., a connected graph) with a large number of nodes, one of which is distinguished as a source node. The source node has $k$ data packets to be broadcast to all nodes in the network. The $k$ data packets are encoded into $n \ge k$ coded packets (using an MDS code over a large enough field), such that any $k$ of the $n$ coded packets are sufficient to recover the original $k$ data packets. The source transmits all $n$ packets to its one-hop neighbours. Thereafter, the nodes follow a probabilistic forwarding protocol: each node, upon receiving a particular coded packet for the first time, forwards it to all its one-hop neighbours only with a certain probability $p$. This probabilistic forwarding happens independently across all nodes and packets. We declare such a probabilistic forwarding protocol to be useful if the expected fraction of network nodes that receive at least $k$ of the $n$ coded packets is at least $1-\delta$ for some small $\delta > 0$. We would of course like to operate the protocol at the smallest value of the probability $p$ that makes the protocol useful, as this would minimize the expected total number of transmissions across all network nodes. It turns out that the redundancy $\rho := (n-k)/k$ introduced by the coding scheme significantly influences this minimum expected number of transmissions. Simulation results indicate that over network topologies that are tree-like, the introduction of redundancy in the form of coding is not beneficial; but over topologies that are well-connected, the introduction of some (but not too much) redundancy can significantly reduce the expected total number of transmissions needed. We describe our preliminary analysis of this phenomenon using ideas from percolation theory. This talk is based on work with B.R. Vinay Kumar and Roshan Antony. (Slides)

**2:00 – 2:30pm:** Chandra Nair
Title: Observations regarding extremizers of some non-convex optimization problems in information theory and some open questions motivated by them
Abstract: Rate regions in network information theory have long been characterized using auxiliary random variables. In the last few years, the optimality or sub-optimality of some fundamental rate regions has been established by computing the optimal extremal auxiliaries, i.e., by computing the optimizers belonging to a family of non-convex optimization problems. These results and computations seem to reveal an intriguing relationship between “local tensorization” and “global tensorization”. Indeed, several of the counterexamples to various optimality questions have been deduced based on the above intuitive connection. In this talk, I will outline these observations and formally state several of them as open problems. I will also talk about a potential pathway (verified on small examples by simulations) that supports this connection, as well as potential consequences, such as (potentially efficient) computation of hypercontractivity parameters. (Slides)

**2:30 – 3:30pm:** Break

**3:30 – 4:30pm:** Amnon Ta-Shma
Title: Explicit, Epsilon-Balanced Codes Close to the GV Bound
Abstract: I will show an explicit construction of a binary error-correcting code with relative distance 1/2 − epsilon and relative rate epsilon^{2+o(1)}. This comes close to the Gilbert-Varshamov bound and the LP lower bound. Previous explicit constructions had rate about epsilon^3, and this is the first explicit construction to get that close to the Gilbert-Varshamov bound. Technically, we define parity samplers. A parity sampler is a collection of sets {S_i} with the property that for every “test” set A of a given constant density epsilon_0, the probability that a set S_i from the collection falls into the test set A an even number of times is about half. A sparse parity sampler immediately implies a good code with distance close to half. The complete t-complex of all sequences of cardinality t is a good parity sampler, but with too many sets in the collection. Rozenman and Wigderson, and independently Alon, used random walks over expanders to explicitly construct sparse parity samplers, and their construction implies explicit codes with relative rate epsilon^4. I will show how one can get better explicit parity samplers (and therefore also better explicit codes) using a variant of the zig-zag product. In the random walk sampler, there exist many sets with substantial overlap. One way to look at the zig-zag product is that it takes a sub-collection of the random walk sampler, and this sub-collection has a smaller overlap between sets in the collection. The zig-zag product achieves that by keeping a small internal state. I will show that by enlarging the internal state one can further reduce the overlap, and as a result improve the quality of the parity sampler.

**4:30 – 6:00pm:** Welcome Reception
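Ta-Shma's parity-sampler definition admits a quick numerical sanity check: for the complete t-complex (t fully independent samples), the probability that a test set A of density epsilon_0 is hit an even number of times is exactly (1 + (1 − 2·epsilon_0)^t)/2, which approaches 1/2 as t grows. The minimal sketch below (function names are illustrative, not from the talk) verifies this formula against a Monte Carlo simulation.

```python
import random

def even_hit_prob(eps0: float, t: int) -> float:
    """Exact probability that t independent uniform samples land in a test
    set of density eps0 an even number of times; follows from
    E[(-1)^{#hits}] = (1 - 2*eps0)^t."""
    return (1 + (1 - 2 * eps0) ** t) / 2

def simulate(eps0: float, t: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate for the 'complete t-complex' sampler, i.e.
    fully independent samples of the cube."""
    rng = random.Random(seed)
    even = sum(
        1
        for _ in range(trials)
        if sum(rng.random() < eps0 for _ in range(t)) % 2 == 0
    )
    return even / trials

# A good parity sampler drives this probability toward 1/2:
print(even_hit_prob(0.1, 1))   # ≈ 0.9
print(even_hit_prob(0.1, 60))  # ≈ 0.5
```

The challenge addressed in the talk is achieving the same convergence to 1/2 with far fewer sets than the full independent collection, via expander walks and the zig-zag product.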

Tuesday, April 10

**9:00 – 9:30am:** Breakfast

**9:30 – 10:00am:** Himanshu Tyagi
Title: Distributed independence testing and a new question related to hypercontractivity
Abstract: Two parties observing sequences of unbiased, random bits seek to determine if the bits are independent or have a specified correlation. A trivial scheme will entail one party sending its bits to the other, who in turn will decide if the bits are independent or correlated. Can any other scheme accomplish the independence test by using less communication than the trivial scheme? We exhibit a one-way communication scheme building on linear correlation that indeed outperforms the trivial scheme. In fact, our scheme extends to an arbitrary distribution. Furthermore, we establish a lower bound for the one-way communication required to enable independence testing, which is tight for the cases of the binary symmetric distribution and the Gaussian symmetric distribution. Our lower bound leverages hypercontractivity to obtain a measure change bound, but works only for one-way communication protocols. Its extension to multiround protocols opens up new interesting questions related to hypercontractivity.

**10:00 – 10:30am:** Alex Samorodnitsky
Title: On the entropy of a noisy function
Abstract: Let X be a uniformly distributed binary sequence of length n. Let Y be a noisy version of X, obtained by flipping each coordinate of X independently with probability epsilon. We want to come up with a one-bit function of Y which provides as much information as possible about X. Courtade and Kumar conjectured that the best one can do is to choose a coordinate function f(Y) = Y_i, for some i between 1 and n. We prove the conjecture for large values of epsilon (epsilon > 1/2 − delta, for some absolute constant delta). The main new technical ingredient in the proof is the claim that if F is a real-valued function on the boolean cube, and G is a noisy version of F, then the entropy Ent(G) is upper-bounded by the expected entropy of a projection of F on a random coordinate subset of a certain size.

**10:30 – 11:10am:** Break

**11:10am – 12:00pm:** Maxim Raginsky
Title: Algorithmic stability and generalization in machine learning: an information-theoretic analysis
Abstract: Machine learning algorithms can be viewed as stochastic transformations (or channels, in information-theoretic parlance) that map training data to hypotheses. Following the classic paper of Bousquet and Elisseeff, we say that such an algorithm is stable if its output does not depend too much on any individual training example. Since stability is closely connected to the generalization capabilities of learning algorithms, it is of theoretical and practical interest to obtain sharp quantitative estimates on the generalization bias of machine learning algorithms in terms of their stability properties. In this talk, based on joint work with Aolin Xu and Sasha Rakhlin, I will discuss several information-theoretic measures of algorithmic stability based on mutual information and erasure mutual information, and illustrate their use for upper-bounding the generalization bias of learning algorithms.

**12:00 – 1:30pm:** Lunch

**1:30 – 2:00pm:** Tom Gur
Title: Quantum zero knowledge via local codes
Abstract: The seminal work of Ben-Or et al. (STOC 1988) shows that zero knowledge can be achieved unconditionally by making the physical assumption that spatially-isolated provers are playing independent strategies. In this talk we will discuss new techniques involving locally testable/correctable codes that allow us to strengthen the foregoing result to hold even in light of quantum mechanics, which tells us that spatially-isolated provers can realize non-local correlated strategies by sharing a quantum entangled state. These techniques include structural results regarding subcube sums of the Reed-Muller code. We provide the first construction of a zero-knowledge proof system that is sound against quantum entangled provers. Along the way, we introduce a framework that leverages local testability to “lift” classical protocols to quantum protocols in a black-box manner. The talk is self-contained and does not assume any quantum or complexity preliminaries. Joint work with Alessandro Chiesa, Michael Forbes, and Nicholas Spooner. (Slides)

**2:00 – 2:30pm:** Nicolas Resch
Title: Lossless Dimension Expanders via Linearized Polynomials and Subspace Designs
Abstract: Dimension expanders, which are a linear-algebraic analog of expander graphs, consist of a collection of linear maps A_1, …, A_d: F^n → F^n such that, for any subspace U of bounded size, the image of U under all the maps, i.e., A_1(U) + … + A_d(U), has dimension larger than that of U by a constant factor. A standard application of the probabilistic method demonstrates that dimension expanders with constant d exist, and the main challenge is obtaining an explicit construction. We provide an explicit construction of a dimension expander over linear-sized finite fields. Moreover, our construction is lossless in the sense that the expansion factor is (1−ε)d. Our approach crucially uses subspace designs and is inspired by recent constructions of list-decodable rank-metric codes. Joint work with Venkat Guruswami and Chaoping Xing. (Slides)

**2:30 – 3:30pm:** Break

**3:30 – 4:30pm:** Open problem session
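In the Courtade-Kumar setting from Samorodnitsky's talk, the conjectured-optimal coordinate function f(Y) = Y_i sees X_i through a binary symmetric channel with crossover probability epsilon, so it achieves I(X; f(Y)) = 1 − h(epsilon) bits, where h is the binary entropy function. A minimal sketch (function names are illustrative) computing this benchmark and checking it against an empirical plug-in estimate:

```python
import math
import random

def h(p: float) -> float:
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def dictator_mi(eps: float) -> float:
    """I(X; Y_i) for a coordinate ('dictator') function f(Y) = Y_i: the
    pair (X_i, Y_i) is a BSC(eps) input/output, giving 1 - h(eps) bits."""
    return 1.0 - h(eps)

def empirical_mi(eps: float, n: int = 200_000, seed: int = 1) -> float:
    """Plug-in estimate of I(X_i; Y_i) from n simulated bit pairs."""
    rng = random.Random(seed)
    counts = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for _ in range(n):
        x = rng.randrange(2)
        y = x ^ (rng.random() < eps)  # flip with probability eps
        counts[(x, y)] += 1
    mi = 0.0
    for (x, y), c in counts.items():
        if c == 0:
            continue
        pxy = c / n
        px = (counts[(x, 0)] + counts[(x, 1)]) / n
        py = (counts[(0, y)] + counts[(1, y)]) / n
        mi += pxy * math.log2(pxy / (px * py))
    return mi

print(dictator_mi(0.11))  # about half a bit survives 11% noise
```

The conjecture asserts that no other one-bit function of the entire noisy string Y beats this single-coordinate benchmark.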

Wednesday, April 11

**9:00 – 9:30am:** Breakfast

**9:30 – 10:00am:** Amirbehshad Shahrasbi
Title: Synchronization Strings
Abstract: The talk will give an introduction to synchronization strings, which provide a novel way of efficiently reducing synchronization errors, such as insertions and deletions, to much more benign and better understood Hamming errors. Synchronization strings have many applications. The talk will focus on using synchronization strings as a new way to generate efficient error-correcting block codes for insertions and deletions, in particular, codes that approach the Singleton bound, i.e., for any 0 < delta < 1 and any eps > 0, these codes achieve a rate of 1 − delta − eps while being able to efficiently decode from a delta fraction of insertions and deletions. Further applications of synchronization strings will also be briefly discussed, including a general method of simulating symbol corruption channels over any given insertion-deletion channel, an efficient and near-optimal coding scheme for interactive communication over insertion-deletion channels, and list-decodable insertion-deletion codes. This talk is based on joint work with Bernhard Haeupler. (Slides)

**10:00 – 10:30am:** Mahdi Cheraghchi
Title: Capacity Upper Bounds for Deletion-type Channels
Abstract: We develop a systematic approach, based on convex programming and real analysis, for obtaining upper bounds on the capacity of the binary deletion channel and, more generally, channels with i.i.d. insertions and deletions. Other than the classical deletion channel, we give special attention to the Poisson-repeat channel introduced by Mitzenmacher and Drinea (IEEE Transactions on Information Theory, 2006). Our framework can be applied to obtain capacity upper bounds for any repetition distribution (the deletion and Poisson-repeat channels corresponding to the special cases of Bernoulli and Poisson distributions). Our techniques essentially reduce the task of proving capacity upper bounds to maximizing a univariate, real-valued, and often concave function over a bounded interval. The corresponding univariate function is carefully designed according to the underlying distribution of repetitions, and the choices vary depending on the desired strength of the upper bounds as well as the desired simplicity of the function (e.g., being only efficiently computable versus having an explicit closed-form expression in terms of elementary, or common special, functions). Among our results, we show the following: 1. The capacity of the binary deletion channel with deletion probability $d$ is at most $(1-d) \log \varphi$ for $d \geq 1/2$, and, assuming the capacity function is convex, is at most $1-d \log(4/\varphi)$ for $d<1/2$, where $\varphi=(1+\sqrt{5})/2$ is the golden ratio. This is the first nontrivial capacity upper bound for any value of $d$ outside the limiting case $d \to 0$ that is fully explicit and proved without computer assistance. 2. We derive the first set of capacity upper bounds for the Poisson-repeat channel. Our results uncover further striking connections between this channel and the deletion channel, and suggest, somewhat counter-intuitively, that the Poisson-repeat channel is actually analytically simpler than the deletion channel and may be of key importance to a complete understanding of the deletion channel. 3. We derive several novel upper bounds on the capacity of the deletion channel. All upper bounds are maxima of efficiently computable, and concave, univariate real functions over a bounded domain. In turn, we upper bound these functions in terms of explicit elementary and standard special functions, whose maxima can be found even more efficiently (and sometimes analytically, for example for $d=1/2$). Along the way, we develop several new techniques of potentially independent interest. (Based on https://arxiv.org/abs/1711.01630) (Slides)

**10:30 – 11:10am:** Break

**11:10am – 12:00pm:** Yuval Peres
Title: Iterated von Neumann unbiasing and trace reconstruction for the deletion channel
Abstract: In the first part of the talk, I will describe the connections between an efficient unbiasing method from 1992 (iterated von Neumann unbiasing) and polar codes. In the second part, I will discuss the trace reconstruction problem: suppose that an unknown string $x$ of $n$ bits is observed through the deletion channel; how many independent outputs (traces) of this channel are needed to reconstruct $x$ with high probability? The best lower bound known is linear in $n$. Currently, the best upper bound is exponential in the cube root of $n$, proved with Fedor Nazarov (STOC 2017); similar results were obtained independently by De, O'Donnell and Servedio. This upper bound is sharp for tests that only use linear combinations of the output. If the string $x$ is random, we show that a subpolynomial number of traces suffices, by comparison to a random walk. Joint works with Alex Zhai (FOCS 2017) and with Nina Holden & Robin Pemantle, preprint (2017). (Slides)

**12:00 – 1:30pm:** Lunch

**1:30 – 2:00pm:** Itzhak Tamo
Title: Detecting and Correcting Errors in Codes over Graphs
Abstract: Let G be a graph on n vertices and C be an (n, k, d) code over an alphabet X. Assume that vertex i stores a symbol c_i, where the vector of stored symbols (c_1, …, c_n) forms a codeword of the code C. We assume that two vertices can communicate if there is a communication link (edge) connecting them. In this talk we consider the communication complexity of error detection and correction of the information stored over the vertices of the graph. For error detection, we obtain general lower bounds for the communication complexity as functions of n, k, d, |X|, which are tight for several graphs and codes. For error correction, building on the work of Alon, Efremenko and Sudakov, we design a protocol which can efficiently correct a single input error for repetition codes. We conclude with some interesting problems for further research. Joint work with Chong Shangguan.

**2:00 – 2:30pm:** Arya Mazumdar
Title: Open problems in locally recoverable codes
Abstract: Locally repairable codes (LRCs) have recently been a subject of much research interest due to their theoretical appeal and application in distributed storage systems. In an LRC, any coordinate of a codeword can be recovered by accessing only a few other coordinates. In this talk we describe three aspects of LRCs and some open problems. Namely, we look into the rate-distance trade-off of LRCs, the problem of disjoint repair groups, and the capacity of LRCs.

**2:30 – 3:15pm:** Break

**3:15 – 3:45pm:** Sankeerth Rao
Title: The independence number of the Birkhoff polytope graph, and applications to maximally recoverable codes
Abstract: Maximally recoverable codes are codes designed for distributed storage which combine quick recovery from single node failure and optimal recovery from catastrophic failure. Gopalan et al. [SODA 2017] studied the alphabet size needed for such codes in grid topologies and gave a combinatorial characterization for it. Consider a labeling of the edges of the complete bipartite graph $K_{n,n}$ with labels coming from $F_2^d$, that satisfies the following condition: for any simple cycle, the sum of the labels over its edges is nonzero. The minimal $d$ where this is possible controls the alphabet size needed for maximally recoverable codes in $n \times n$ grid topologies. Prior to the current work, it was known that $d$ is between $(\log n)^2$ and $n \log n$. We improve both bounds and show that $d$ is linear in $n$. The upper bound is a recursive construction which beats the random construction. The lower bound follows by first relating the problem to the independence number of the Birkhoff polytope graph, and then providing tight bounds for it using the representation theory of the symmetric group. (Slides)

**3:45 – 4:15pm:** Sivakanth Gopi
Title: Maximally Recoverable Local Reconstruction Codes
Abstract: Protecting huge amounts of data stored in the cloud from server crashes has resulted in distributed storage systems transitioning to erasure coding based schemes. Local Reconstruction Codes (LRCs) have emerged as the codes of choice for these applications. They allow super fast recovery in the typical case of a small number of crashes by reading only a small number of healthy servers (called ‘locality’), while still protecting the data from the unlikely event of a large number of crashes. They have many of the good properties of simply replicating data while being much more storage efficient and crash resilient. Maximally Recoverable LRCs (MR LRCs) are information-theoretically the most optimal LRCs; they can recover from every feasible crash scenario for a given storage efficiency and locality. MR LRCs have already been deployed in Microsoft storage systems, outperforming traditional erasure coding systems. Designing such codes over small finite fields is crucial for applications. Unfortunately, we are far from understanding the minimal field size required for these codes. In this talk, we prove the first polynomially growing lower bounds on field size for MR LRCs using an interesting connection to incidence geometry; prior to our work, no super-linear lower bounds were known. We also present some linear field size MR LRC constructions in parameter ranges which are very relevant in practice. (Slides)
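Several of the Wednesday talks (Cheraghchi, Peres) revolve around the binary deletion channel. A minimal sketch of the channel itself, the trace-generation setup behind trace reconstruction, and the fully explicit bound (1 − d)·log2(φ) for d ≥ 1/2 stated in Cheraghchi's abstract (helper names are illustrative):

```python
import math
import random

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

def deletion_channel(bits, d, rng):
    """Binary deletion channel: each bit is deleted independently with
    probability d; the surviving bits are concatenated in order."""
    return [b for b in bits if rng.random() >= d]

def capacity_upper_bound(d):
    """The explicit upper bound (1 - d) * log2(phi) on the deletion
    channel capacity, stated in the abstract for d >= 1/2."""
    assert d >= 0.5
    return (1 - d) * math.log2(PHI)

# Trace reconstruction setup: observe several independent traces of one x.
rng = random.Random(0)
x = [rng.randrange(2) for _ in range(20)]
traces = [deletion_channel(x, 0.5, rng) for _ in range(3)]
print(traces)
print(round(capacity_upper_bound(0.5), 4))  # ≈ 0.3471
```

Each trace is a random subsequence of x; the open question discussed by Peres is how many such traces are needed to pin down x.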

Thursday, April 12

**9:00 – 9:30am:** Breakfast

**9:30 – 10:00am:** Boris Bukh
Title: Small antipodal spherical codes
Abstract: We prove sharp bounds on the distance of antipodal spherical codes in R^d with 2d+2k points for k = 1, 2, 3, 7, 23. In addition, we obtain asymptotics for general k, and solve the analogous problem for spherical codes in C^d. Instead of linear programming, we use a new approach that relies on bounding the first moment of isotropic measures. Joint work with Chris Cox. (Slides)

**10:00 – 10:30am:** William Martin
Title: Problems on spherical codes motivated by quantum information theory
Abstract: A pair B, B' of orthonormal bases in R^d or C^d is said to be unbiased if |<b, b'>| is constant for b in B and b' in B'. In most cases, finding the maximum number of pairwise unbiased bases (or mutually unbiased bases, “MUBs”) appears to be a hard problem. The real and complex cases are quite different. The same dichotomy arises in finding the maximum possible number of equiangular lines in R^d or C^d. One may also ask for the maximum number of linked simplices in R^d: full-dimensional simplices with vertices on the unit sphere such that only two possible angles occur between vectors in distinct simplices. In this talk, I will briefly introduce the problems in quantum information theory which gave rise to the complex versions of these questions. I will discuss connections to association schemes, coding theory and finite geometry, and I will summarize recent work of my student, Brian Kodalen. (Slides)

**10:30 – 11:10am:** Break

**11:10am – 12:00pm:** Emmanuel Abbe
Title: The stochastic block model: where we stand
Abstract: The talk overviews some of the recent developments on the stochastic block model, starting with the two-community case and extending to multiple communities. We will cover various recovery requirements, phase transitions, algorithms and the information-computation gap. Time permitting, geometric block models and open problems will be discussed.

**12:00 – 1:30pm:** Lunch

**1:30 – 2:00pm:** Olgica Milenkovic
Title: Leveraging data popularity in distributed storage systems via MinMax Steiner systems
This is joint work with Charles Colbourn and Hoang Dau.

**2:00 – 2:30pm:** Simeon Ball
Title: On Maximum Distance Separable Codes
Abstract: A block code C of length n and minimum distance d over an alphabet with q symbols satisfies |C| ≤ q^(n−d+1), which is known as the Singleton bound. A block code attaining this bound is known as a Maximum Distance Separable code, or simply an MDS code. An arc S in F_q^k is a subset of vectors with the property that every subset of S of size k is a set of linearly independent vectors. Equivalently, an arc is a subset of points of PG(k−1, q), the (k−1)-dimensional projective space over F_q, for which every subset of k points spans the whole space. If C is a k-dimensional linear MDS code over F_q, then the columns of a generator matrix for C form an arc in F_q^k, and vice versa. The classical example of a linear MDS code is the Reed-Solomon code, which is the evaluation code of all polynomials of degree at most k−1 over F_q. As an arc, the Reed-Solomon code is a normal rational curve in PG(k−1, q). The trivial upper bound on the length n of a k-dimensional linear MDS code over F_q is n ≤ q + k − 1. The (doubly-extended) Reed-Solomon code has length q + 1. The dual of a k-dimensional linear MDS code of length n is an (n−k)-dimensional linear MDS code, thus we can assume that k ≤ n/2 and therefore that k ≤ q − 1. The MDS conjecture states that if 4 ≤ k ≤ q − 2, then an MDS code has length at most q + 1. In other words, there are no better linear MDS codes than the Reed-Solomon codes. In 2012, the MDS conjecture was verified for q prime. In this talk, I will discuss various advances since then and highlight the lack of examples of known MDS codes of length more than k + q/2. (Slides)

**2:30 – 3:30pm:** Break

**3:30 – 4:30pm:** Rump Session
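The arc/MDS correspondence in Ball's abstract is easy to check computationally for small parameters: the columns of a Reed-Solomon generator matrix over F_q form an arc, i.e., every k columns are linearly independent. A brute-force sketch over a small prime field (helper names are illustrative):

```python
from itertools import combinations

def rs_generator(k, p):
    """k x p Vandermonde generator matrix of a Reed-Solomon code over
    GF(p), p prime: the column for evaluation point a is (1, a, ..., a^{k-1})."""
    return [[pow(a, i, p) for a in range(p)] for i in range(k)]

def rank_mod_p(rows, p):
    """Rank over GF(p) via Gaussian elimination (inverses by Fermat)."""
    rows = [r[:] for r in rows]
    rank, ncols = 0, len(rows[0])
    for col in range(ncols):
        piv = next((r for r in range(rank, len(rows)) if rows[r][col] % p), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def is_arc(G, k, p):
    """Columns of G form an arc iff every k columns are linearly
    independent, which is exactly the MDS property of the code."""
    for cols in combinations(range(len(G[0])), k):
        sub = [[G[i][c] for c in cols] for i in range(k)]
        if rank_mod_p(sub, p) < k:
            return False
    return True

G = rs_generator(3, 7)
print(is_arc(G, 3, 7))  # True: the [7, 3] Reed-Solomon code over GF(7) is MDS
```

The MDS conjecture discussed in the talk says that, apart from small exceptional cases, no arc can be longer than the (doubly-extended) Reed-Solomon one.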

Friday, April 13