Hypothesis Testing under Communication Constraints - Theory and an Application in IoT
Abstract
Applications in the Internet of Things (IoT) often require low-compute devices to perform distributed inference and testing by communicating over a low-bandwidth link. This gives rise to a plethora of new problems which may broadly be termed resource-constrained statistical inference problems. In this thesis, we consider two such problems.
In the first part of the thesis, we study the following distributed hypothesis testing problem. Two parties observing sequences of uniformly distributed bits want to determine whether their bits were generated independently or not. To that end, the first party communicates to the second. A simple communication scheme involves taking only as many sample bits as the sample complexity of independence testing requires and sending them to the second party. But is there a scheme that uses fewer bits of communication than the sample complexity, perhaps by observing more sample bits? We show that the answer to this question is in the affirmative. More generally, for any given joint distribution, we present a distributed independence test that uses linear correlation between functions of the observed random variables. Furthermore, we provide lower bounds for the general setting that use hypercontractivity and reverse hypercontractivity to obtain a measure change bound between the joint and the independent distributions. The resulting bounds are tight for both a binary symmetric source and a Gaussian symmetric source. The proposed scheme is then extended to handle high-dimensional correlation testing with interactive communication, wherein one party observes a Gaussian vector X and the other party observes a jointly Gaussian scalar Y, and we seek to test whether the norm of the vector of correlations between X and Y exceeds a given value or is 0. We provide corresponding lower bounds to establish the optimality of the proposed scheme. Furthermore, we derive a lower bound which implies that distributed correlation testing requires less communication than distributed estimation of the correlation vector.
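As a point of reference, the simple baseline scheme mentioned above (the first party sends all of its sample bits, and the second party tests for dependence) can be sketched as follows. This is only an illustrative sketch of the baseline, not the improved scheme of the thesis. It assumes a binary symmetric source with ±1-valued bits and correlation rho, and the names sample_pair, baseline_independence_test, run_trial, and the threshold parameter are hypothetical.

```python
import random


def sample_pair(rho, rng):
    """Draw one pair of uniform +/-1 bits whose correlation E[XY] equals rho."""
    x = rng.choice((-1, 1))
    # Y agrees with X with probability (1 + rho) / 2, otherwise it is flipped.
    y = x if rng.random() < (1 + rho) / 2 else -x
    return x, y


def baseline_independence_test(xs, ys, threshold):
    """Party 2 holds ys, receives every bit of xs, and thresholds the
    empirical correlation. Returns True to declare 'dependent'."""
    corr = sum(x * y for x, y in zip(xs, ys)) / len(xs)
    return corr > threshold


def run_trial(rho, n, threshold, seed):
    """Simulate n sample pairs and run the baseline test on them.
    Communication cost of this baseline is n bits."""
    rng = random.Random(seed)
    pairs = [sample_pair(rho, rng) for _ in range(n)]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return baseline_independence_test(xs, ys, threshold)
```

In this baseline the number of communicated bits equals the number of samples n; the question posed in the abstract is whether the communication can be made smaller than that by letting the parties observe more samples than they send.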
In the second part of the thesis, we study streaming compression of electrical signals that Intelligent Electronic Devices (IEDs) sample at a very high frequency in order to capture anomalous signal behavior. Under normal operation, this oversampling is redundant and leads to excessive data being stored or transmitted. This gives rise to a new compression problem where the collected samples should be further subsampled and quantized based on the presence of an anomaly in the underlying signal. We propose an Anomaly-aware Compressive Sampler (ACS) which tests each block of samples for the presence of an anomaly, and subsamples in a hierarchical manner to retain the desired sampling rate. ACS has been designed keeping hardware constraints in mind, using integer operations, appropriate bit-packing, a simple iterated delta filter, and a streaming data pipeline. ACS competes with the state-of-the-art algorithm on the better-behaved transmission system data from DOE/EPRI, and outperforms it significantly on real-time distribution system data recorded in our laboratory. ACS is lightweight and was implemented on an ARM processor. Further, we present a mathematical analysis of the anomaly detection module of ACS. Finally, the performance of the proposed scheme in compressing a nonstationary signal with frequency band uncertainty is studied, with a focus on the dependence of the compression ratio and reconstruction error on the oversampling rate. We modify a zero-crossings-based compression scheme proposed in the literature for bandlimited signals to incorporate resolution of frequency band uncertainty using our anomaly detection procedure. While this new scheme is theoretically appealing, we point out some of its limitations when it comes to implementation.
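To give a flavor of the integer-only processing mentioned above, the following is a minimal sketch of one plausible reading of an "iterated delta filter": repeated first-order differencing of integer samples, together with its exact inverse. The abstract does not specify the filter used in ACS, so the function names delta_encode and delta_decode, the order parameter, and the choice to keep the first sample of each pass are all assumptions made for illustration.

```python
def delta_encode(samples, order=2):
    """Apply `order` passes of first differencing using integer arithmetic only.
    Each pass keeps the first element and replaces every later element by
    its difference from the preceding element."""
    out = list(samples)
    for _ in range(order):
        out = out[:1] + [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def delta_decode(residuals, order=2):
    """Invert delta_encode by applying `order` passes of cumulative summation."""
    out = list(residuals)
    for _ in range(order):
        restored = []
        total = 0
        for i, r in enumerate(out):
            # The first element was stored as-is; later ones are differences.
            total = r if i == 0 else total + r
            restored.append(total)
        out = restored
    return out
```

For a slowly varying signal, the residuals after differencing tend to be small integers, which is what makes a subsequent bit-packing step worthwhile; for example, the linear ramp [10, 12, 14, 16, 18] encodes to [10, -8, 0, 0, 0] with order=2 and decodes back exactly.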