Support Recovery from Linear Measurements: Tradeoffs in the Measurement-Constrained Regime
In this thesis, we study problems under the theme of discovering joint sparsity structure in a set of high-dimensional data samples from linear measurements. Our primary focus is on the regime where the number of samples available can potentially be large, but we are constrained to access very few measurements per sample. This setting can be used to model high-dimensional estimation tasks in a distributed setting, where storing or communicating more measurements per sample can be expensive. We first study a basic problem in this setting -- that of support recovery from linear measurements. In this problem, a set of n samples in d-dimensional Euclidean space, each having a support of size k, is accessed through m linear measurements per sample. The goal is to recover the unknown support, given the m-dimensional measurements and the corresponding measurement matrices. This problem, sometimes also referred to as variable selection or model selection, has been extensively studied in the signal processing and statistics literature, and finds applications in source localization, hyperspectral imaging, heavy-hitter detection in networks, and feature selection in regression. It is known that with m=\Omega(k \log d) measurements per sample, even a single sample is sufficient for support recovery. As such, when multiple samples are available, an interesting question is whether recovery is possible with fewer than k measurements per sample. This measurement-constrained setting is relatively unexplored in the literature, and the optimal sample-measurement tradeoff was unknown prior to our work. We provide a tight characterization of the sample complexity of this problem, which, together with previous results in the literature, gives a full understanding of its scaling laws for different values of the ratio k/m.
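To illustrate the question above, the following is a minimal numerical sketch of how measurements can be pooled across samples when m < k: a simple correlation-based statistic, averaged over samples, separates support coordinates from the rest. This is only an illustrative estimator under assumed Gaussian samples and Gaussian measurement matrices, not necessarily the exact procedure analyzed in the thesis, and all problem dimensions below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, m, n = 50, 5, 3, 4000        # m < k: fewer measurements than the sparsity
support = set(rng.choice(d, size=k, replace=False).tolist())

# Per-coordinate statistic T_j = (1/n) * sum_i ((A_i^T y_i)_j)^2.
# For Gaussian inputs, E[T_j] is larger by roughly m^2 * (signal variance)
# when j lies in the support, so averaging over enough samples separates
# support from non-support coordinates even though each individual sample
# (m < k measurements) is far too underdetermined on its own.
T = np.zeros(d)
for _ in range(n):
    x = np.zeros(d)
    x[list(support)] = rng.normal(size=k)   # fresh k-sparse Gaussian sample
    A = rng.normal(size=(m, d))             # fresh Gaussian measurement matrix
    y = A @ x                               # m linear measurements of x
    T += (A.T @ y) ** 2
T /= n

estimated = set(np.argsort(T)[-k:].tolist())
print(estimated == support)
```

Note that each sample gets its own measurement matrix; the averaging step is where having many samples compensates for having very few measurements per sample.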
We propose two algorithms that can perform recovery in the measurement-constrained regime, where standard algorithms fail. Our first algorithm is a simple closed-form variance estimation-based procedure, while our second is based on an approximate maximum likelihood procedure. We show that when m<k, the minimum number of samples required for exact support recovery with high probability scales as \Theta((k^2/m^2)\log d). Our upper bound is obtained by analyzing the closed-form estimator and holds for the class of subgaussian samples and subgaussian measurement matrices chosen independently across samples. Our lower bound construction uses Gaussian samples and Gaussian measurement matrices. In the second part of the thesis, we show that the upper bound for the closed-form estimator extends to worst-case inputs, demonstrating that Gaussian inputs are the hardest case for this problem. In addition, our analysis reveals a phase transition for support recovery at k/m=1. In this way, our result demonstrates that when m<k, multiple measurements from the same sample are more valuable than measurements from different samples. In our analysis of this worst-case setting, we establish concentration bounds for heavy-tailed random variables that may be of independent interest.

In the third part of the thesis, we consider an extension of the problem to multiple disjoint supports, where the support of each sample is assumed to be one of a small, constant-size set of allowed supports, each of size k. We propose a two-step algorithm for this setting that first estimates the union of the underlying supports and then recovers the individual supports using a spectral algorithm.
We analyze this algorithm for the class of subgaussian inputs and measurement matrices, and show that when m<k, its sample complexity is at most O(k^4/m^4), up to polylogarithmic factors in the problem dimensions.
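One plausible instantiation of this two-step procedure, for the special case of two disjoint supports, is sketched below: the pooled statistic recovers the union, and a leading-singular-vector split groups the samples before per-group support estimation. The squared-correlation features, the sign-based spectral split, and all dimensions are illustrative assumptions, not the exact algorithm analyzed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, m, n = 60, 4, 3, 20000
# Two disjoint supports of size k; each sample is drawn on one of them.
coords = rng.choice(d, size=2 * k, replace=False)
supports = [set(coords[:k].tolist()), set(coords[k:].tolist())]

feats = np.zeros((n, d))
for i in range(n):
    x = np.zeros(d)
    x[list(supports[rng.integers(2)])] = rng.normal(size=k)
    A = rng.normal(size=(m, d))                 # fresh Gaussian measurement matrix
    feats[i] = (A.T @ (A @ x)) ** 2             # per-sample squared correlations

# Step 1: estimate the union of the supports from the pooled statistic.
union = set(np.argsort(feats.mean(axis=0))[-2 * k:].tolist())

# Step 2: split the samples by the sign of the leading singular vector of the
# centered features restricted to the union, then pick the top-k coordinates
# within each group as that group's support estimate.
idx = np.array(sorted(union))
F = feats[:, idx] - feats[:, idx].mean(axis=0)
u = np.linalg.svd(F, full_matrices=False)[0][:, 0]
est = [set(idx[np.argsort(feats[u >= 0][:, idx].mean(axis=0))[-k:]].tolist()),
       set(idx[np.argsort(feats[u < 0][:, idx].mean(axis=0))[-k:]].tolist())]
print(union == supports[0] | supports[1])
```

Restricting the spectral step to the estimated union is what keeps the clustering problem low-dimensional; with more than two supports, the sign split would be replaced by clustering on several leading singular vectors.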