Minimizing latency in data acquisition, distributed processing, storage and retrieval
Abstract
Achieving low latency is of utmost importance in applications demanding real-time sensing
and control as in cyber-physical systems. In this thesis, we explore three different facets of
ensuring low latency in such systems. First part of this manuscript considers the design of
an encoder-decoder system to facilitate real-time tracking of a physical process modelled as a
first order auto-regressive process. Samples from a high-dimensional first-order auto-regressive
process generated by an independently and identically distributed random innovation sequence
are observed by a sender which can communicate only finitely many bits per unit time to a
receiver. The receiver seeks to form an estimate of the process value at every time instant in
real-time. We consider a time-slotted communication model in a slow-sampling regime where
multiple communication slots occur between two sampling instants. We propose a successive
update scheme which uses communication between sampling instants to refine estimates of the
latest sample and study the following question: Is it better to collect communication of multiple
slots to send better refined estimates, making the receiver wait more for every refinement, or to
be fast but loose and send new information in every communication opportunity? We show that
the fast but loose successive update scheme with ideal spherical codes is universally optimal
asymptotically for a large dimension. However, most practical quantization codes for fixed
dimensions do not meet the ideal performance required for this optimality, and they typically
will have a bias in the form of a fixed additive error. Interestingly, our analysis shows that the
fast but loose scheme is not an optimal choice in the presence of such errors, and a judiciously
chosen frequency of updates outperforms it.
Next part considers designing load balancing policies that ensures low latency without needing
extra overhead in terms of server-side feedback or coordination among the servers. Dispatching
policies such as the join shortest queue (JSQ), join smallest work (JSW) and their power
of two variants are used in load balancing systems where the instantaneous queue length or
workload information at all queues or a subset of them can be queried. In situations where the
dispatcher has an associated memory, one can minimize this query overhead by maintaining a
list of idle servers to which jobs can be dispatched. Recent alternative approaches that do not require querying such information include the cancel on start and cancel on complete based
replication policies. The downside of such policies however is that the servers must communicate
the start or completion of each service to the dispatcher and must allow cancellation
of redundant copies. In practice, the requirements of query messaging, memory, and replica
cancellation pose challenges in their implementation and their advantages are not clear. We
consider load balancing policies that do not query load information, do not have a memory,
and do not cancel replicas. Surprisingly, we were able to identify operating regimes where such
policies have better performance when compared to policies that utilize server feedback information.
Our policies allow the dispatcher to append a timer to each job or its replica. A job
or a replica is discarded if its timer expires before it starts getting served. We analyze several
variants of this policy which are novel, simple to implement, and also have remarkably good
performance, despite no feedback from servers to the dispatcher.
Finally, we consider the setting of a distributed storage system where a single file is subdivided
into smaller fragments of same size which are then replicated with a common replication
factor across servers of identical cache size. An incoming file download request is sent to all
the servers, and the download is completed whenever the request gathers all the fragments. At
each server, we are interested in determining the set of fragments to be stored, and the sequence
in which fragments should be accessed, such that the mean file download time for a request
is minimized. We model the fragment download time as an exponential random variable independent
and identically distributed for all fragments across all servers, and show that the mean
file download time can be lower bounded in terms of the expected number of useful servers
summed over all distinct fragment downloads. We present deterministic storage schemes that
attempt to maximize the number of useful servers. We show that finding the optimal sequence
of accessing the fragments is a Markov decision problem, whose complexity grows exponentially
with the number of fragments. We propose heuristic algorithms that determine the sequence
of access to the fragments which are empirically shown to perform well.