Department of Computational and Data Sciences (CDS)
https://etd.iisc.ac.in/handle/2005/26
https://etd.iisc.ac.in/handle/2005/6182
Abstractions and Optimizations for Data-driven Applications Across Edge and Cloud
Khochare, Aakash
Modern data-driven applications have a novel set of requirements. Advances in deep neural networks (DNNs) and computer vision (CV) algorithms have made it feasible to extract meaningful insights from large-scale deployments of urban cameras and drone video feeds. These data-driven applications, usually composed as workflows, tend to have high-bandwidth and low-latency requirements in order to extract timely results from large data sources. Other applications may necessitate the use of multiple geographically distributed resources. Such requirements may be driven by data privacy regulations such as the European Union's General Data Protection Regulation (GDPR), the need for specialized hardware, or the desire to avoid vendor lock-in.
To support these modern applications, a diverse computing landscape has emerged over the last decade. We have witnessed increasingly powerful Edge computing resources become available in network proximity to the data sources for these applications. The number of Cloud Service Providers (CSPs) has grown, along with the regions in which they operate. Finally, CSPs have supplemented Infrastructure as a Service (IaaS) offerings with modern serverless compute offerings that promise cost benefits as well as lower operational overheads.
The availability of choices in compute resources makes it challenging for application developers to manage the lifecycle of their applications: from programming the application, to optimizing it for performance, to finally deploying it. Typically, developers rely on platforms that promise ease of programmability coupled with scalability with minimal developer effort. However, the combination of application requirements and compute resource characteristics makes it challenging for platform designers to make design choices that optimize an application for both programmability and performance. A thorough revisit of existing platforms, abstractions, and optimizations is essential for addressing these challenges.
In this thesis, we tackle these challenges with three distinct but related research contributions on scalable platforms, distributed algorithms and system optimizations:
(1) We propose Anveshak, a platform that provides a domain-specific programming model and a distributed runtime for efficiently tracking entities in a multi-camera network; (2) We design algorithms and heuristics to solve the MSP, which co-schedules the flight routes of a drone fleet to visit and record video at waypoints and perform subsequent on-board Edge analytics; and (3) We develop XFaaS, a platform that allows "zero touch" deployment of functions and workflows across multiple Clouds and Edges by automatically generating code wrappers and Cloud queues, and coordinating with the native FaaS engine of a CSP.
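To give a flavor of contribution (1), the sketch below illustrates in Python what a domain-specific programming model for multi-camera tracking, in the spirit of Anveshak, might look like. The module names (detect, reid, spotlight) and the runtime call are illustrative assumptions for exposition, not the platform's actual interface.

    # Hypothetical sketch of a domain-specific tracking dataflow in the
    # spirit of Anveshak; module names and the runtime API are illustrative
    # assumptions, not the platform's actual interface.

    class TrackingWorkflow:
        def detect(self, frame):
            """User logic: run a DNN detector on one camera frame and
            return candidate bounding boxes for the tracked entity."""
            raise NotImplementedError

        def reid(self, detections, target_features):
            """User logic: re-identify the target among the detections,
            returning a confidence score per detection."""
            raise NotImplementedError

        def spotlight(self, camera_graph, last_sighting):
            """User logic: given the camera-network topology and the last
            confirmed sighting, return the subset of cameras whose feeds
            should be actively processed next. Shrinking this active set
            is what keeps bandwidth and compute demands tractable."""
            raise NotImplementedError

    # A distributed runtime would then invoke these callbacks across Edge
    # and Cloud resources, e.g.:
    #   runtime.run(TrackingWorkflow(), cameras=camera_graph)

The value of such a model is separation of concerns: the developer supplies only the domain logic, while the runtime handles placement, scaling, and data movement across the Edge and Cloud resources.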
These platforms, abstractions, and optimizations address different combinations of the problem dimensions, are motivated by real-world applications, and are validated through detailed experiments on distributed systems. Taken together, this suite of contributions addresses the key gaps highlighted in this dissertation and helps bridge the gap between modern computing resource characteristics and modern application requirements.
https://etd.iisc.ac.in/handle/2005/5826
Accelerating Estimation of Perfusion Maps in Contrast X-ray Computed Tomography using Many-core CPUs and GPUs
Wankhede, Rahul
X-ray Computed Tomography (CT) perfusion imaging is a non-invasive medical imaging modality that has been established as a fast and economical method for diagnosing cerebrovascular diseases such as acute ischemia, subarachnoid hemorrhage, and vasospasm. Because CT perfusion imaging is dynamic in nature, it requires three-dimensional data acquisition at multiple time points, resulting in long processing times of six to twelve minutes post-acquisition. In emergency medical conditions such as stroke, every second is crucial for obtaining the perfusion maps used to deploy brain-saving therapies. Since time is of the utmost importance, this thesis develops strategies for computationally accelerating the processing of CT perfusion data into perfusion maps using many-core CPUs and GPUs.
The major steps in estimating perfusion maps from CT perfusion data are estimation of the Arterial Input Function (AIF), followed by model-based deconvolution of the AIF from the tissue enhancement curves, pixel by pixel, to accurately assess cerebral blood flow (CBF). The deconvolution of the AIF is embarrassingly parallel, yet current methodologies do not accelerate this process using high-performance computing environments. Specifically, this thesis utilizes the multiple CPU cores available in current computing environments, as well as General-Purpose Graphics Processing Units (GP-GPUs), to parallelize the deconvolution at the pixel level. GPUs are attractive for this application as they are built on the SIMD (Single Instruction, Multiple Data) architecture. Though there are multiple ways of solving the ill-posed inverse problem of deconvolution to obtain high-quality perfusion maps, this thesis focuses on the circulant truncated-SVD based method, implemented using Nvidia's CUDA API.
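As a concrete illustration, the minimal NumPy sketch below shows the circulant truncated-SVD idea: the zero-padded AIF becomes a circulant convolution matrix, small singular values are truncated to regularize the ill-posed inverse, and the resulting pseudo-inverse is applied to every pixel's enhancement curve at once. The function name, padding factor, and threshold parameter lam are illustrative choices; this CPU sketch stands in for the thesis's CUDA implementation.

    import numpy as np

    def ctsvd_deconvolve(aif, tissue_curves, dt, lam=0.1):
        """Deconvolve the AIF from tissue enhancement curves via the
        circulant truncated-SVD method (minimal illustrative sketch).

        aif           : (T,) sampled arterial input function
        tissue_curves : (N, T) one enhancement curve per pixel
        dt            : sampling interval in seconds
        lam           : truncation threshold as a fraction of the largest
                        singular value (regularizes the ill-posed inverse)
        Returns (N, T) residue functions; CBF is proportional to each
        pixel's maximum residue value.
        """
        T = aif.shape[0]
        L = 2 * T                      # zero-pad to curb circular wrap-around
        a = np.zeros(L)
        a[:T] = aif
        # Circulant convolution matrix of the padded AIF: A[i, k] = a[(i-k) % L]
        A = dt * np.stack([np.roll(a, k) for k in range(L)], axis=1)
        U, s, Vt = np.linalg.svd(A)
        # Zero out small singular values, then form the regularized pseudo-inverse
        s_inv = np.where(s > lam * s[0], 1.0 / s, 0.0)
        A_pinv = (Vt.T * s_inv) @ U.T
        c = np.zeros((tissue_curves.shape[0], L))
        c[:, :T] = tissue_curves
        # Apply the shared pseudo-inverse to all pixels at once; on a GPU this
        # matrix product is exactly the embarrassingly parallel step
        residue = c @ A_pinv.T
        return residue[:, :T]

    # CBF map (up to a proportionality constant): residue.max(axis=1)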
Further, this thesis explores algorithms for single-AIF deconvolution, which, though not very accurate, is a good first approximation in time-critical cases for locating the area of damage. These experiments were followed by the exploration of multiple-AIF deconvolution, which, although slower, is the gold standard for brain perfusion imaging. These algorithms were developed using the KBLAS library, which utilizes multiple CPU and GPU cores. A detailed computational analysis through use cases reveals that GP-GPU computing is a viable option for accelerating X-ray CT perfusion imaging and is attractive in clinical settings due to the small footprint of GPU machines.
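The computational difference between the two variants is easy to see in this form: single-AIF deconvolution shares one pseudo-inverse across all pixels, whereas multiple-AIF deconvolution assigns a local AIF to each tissue region and therefore needs a batch of independent SVDs, which is where batched CPU/GPU libraries such as KBLAS pay off. A hypothetical sketch, reusing ctsvd_deconvolve from above:

    # Multiple-AIF variant (hypothetical sketch): each tissue region gets its
    # own local AIF, so the single shared SVD becomes a batch of independent
    # decompositions; batched CPU/GPU routines parallelize across the batch.
    def batched_ctsvd(local_aifs, tissue_blocks, dt, lam=0.1):
        """local_aifs: list of (T,) AIFs; tissue_blocks: matching list of
        (N_i, T) curve blocks. Reuses ctsvd_deconvolve from the sketch above."""
        return [ctsvd_deconvolve(aif, block, dt, lam)
                for aif, block in zip(local_aifs, tissue_blocks)]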
https://etd.iisc.ac.in/handle/2005/4245
An Accelerator for Machine Learning Based Classifiers
Mohammadi, Mahnaz
Artificial Neural Networks (ANNs) are algorithmic techniques that simulate biological neural systems. Typical realizations of ANNs are software solutions written in High Level Languages (HLLs) such as C, C++, etc. Such solutions have performance limitations that can be attributed to one of the following reasons:
• Code generated by the compiler cannot perform application-specific optimizations.
• Communication latencies between processors through a memory hierarchy can be significant due to the non-deterministic nature of the communications.
In the data mining field, ANN algorithms have been widely used as classifiers for data classification applications. Classification involves predicting a certain outcome based on a given input. To predict the outcome precisely, the training algorithm must discover relationships between the attributes that make the prediction possible. Later, when an unseen pattern containing the same set of attributes, except for the prediction attribute (which is not yet known), is given to the algorithm, it can process that pattern and produce its outcome. The prediction accuracy, which defines how good the algorithm is at recognizing unseen patterns, depends on how well the algorithm is trained.
The Radial Basis Function Neural Network (RBFNN) is a type of neural network that has been widely used in classification applications. A pure software implementation of this network cannot cope with the performance expected of high-performance ANN applications. Accelerators can be used to speed up these kinds of applications. Accelerators take many forms, ranging from specially configured cores to reconfigurable circuits. Multi-core and GPU-based accelerators can speed up these applications by up to several orders of magnitude compared to general-purpose processors (GPPs). However, the efficiency of such accelerators for RBFNNs reduces as the network size increases. Custom hardware implementation is often required to exploit the parallelism and minimize computing time for real-time application requirements. Neural networks have been implemented on hardware platforms such as Application-Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). We provide a generic hardware solution for classification using RBFNN and a Feed-forward Neural Network with the backpropagation learning algorithm (FFBPNN) on a reconfigurable datapath, overcoming the major drawback of fixed-function hardware datapaths, which offer limited flexibility in terms of application interchangeability and scalability.
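For readers unfamiliar with RBFNNs, the following minimal NumPy sketch shows a standard two-stage software training scheme: RBF centers are chosen from the training data, and the linear output-layer weights are fit by least squares. This is a generic illustration of the kind of algorithm the thesis accelerates, not the proposed hardware datapath; the center-selection rule and the width parameter beta are illustrative choices.

    import numpy as np

    def train_rbfnn(X, y_onehot, n_centers=20, beta=1.0, seed=0):
        """Train an RBFNN classifier (illustrative sketch).
        X: (N, D) training inputs; y_onehot: (N, C) one-hot labels."""
        rng = np.random.default_rng(seed)
        # Pick RBF centers as a random subset of training points
        centers = X[rng.choice(len(X), n_centers, replace=False)]
        # Hidden layer: Gaussian activations phi[i, j] = exp(-beta * ||x_i - c_j||^2)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        phi = np.exp(-beta * d2)
        # Output layer: linear weights fit by least squares
        W, *_ = np.linalg.lstsq(phi, y_onehot, rcond=None)
        return centers, W

    def predict_rbfnn(X, centers, W, beta=1.0):
        """Classify by the highest output-layer score."""
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return (np.exp(-beta * d2) @ W).argmax(axis=1)

The hidden-layer activation computation is the dominant cost and is independent per input and per center, which is precisely the parallelism that GPU and custom-hardware realizations exploit.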
Our contributions in this thesis are as follows:
• Definition and implementation of open-source reference software implementations of a few categories of ANNs for classification purposes.
• Benchmarking the performance on general processors.
• Porting the source code for execution on GPUs using the CUDA API and benchmarking the performance.
• Proposing scalable and area-efficient hardware architectures for training the learning parameters of ANNs.
• Synthesizing the ANN on reconfigurable architectures.
• MPSoC implementation of ANNs for functional verification of our implementation.
• Demonstration of the performance advantage of ANN realization on reconfigurable architectures over CPU and GPU for classification applications.
• Proposing a generalized methodology for realization of classification using ANNs on reconfigurable architectures.
https://etd.iisc.ac.in/handle/2005/5032
Adaptive charging techniques for Li-ion battery using Reinforcement Learning
Tunuguntla, Surya Teja
Li-ion batteries have become a promising technology in recent years and are used everywhere, from low-end devices like mobile phones to high-end ones like electric vehicles. In most applications, the discharge of a battery is user-dependent, unlike the charging process, which can be optimized. The research literature focuses mainly on identifying techniques that fully charge a battery from no initial charge under different optimality criteria, such as charging time, temperature rise, and energy loss. But these are not strictly applicable to real-life scenarios, where the charging process rarely starts from a zero initial state of charge (SOC). Also, depending on the specific requirements, there may be time constraints on charging. Considering the above requirements, the key objective of this dissertation is to obtain a charging profile that maximizes the charge gain for any arbitrary initial SOC within a specific time limit. To meet this objective, reinforcement learning (RL) and multi-stage constant-current charging are chosen for adaptive and generalized charging of Li-ion batteries. We use the deep deterministic policy gradient (DDPG) algorithm, a popular RL algorithm chosen for its ability to perform well in continuous state-action spaces and partially observable environments. Incorporating reinforcement learning to find optimal charging profiles opens the door to many potential high-end applications of Li-ion batteries.
The Li-ion battery is modeled by an equivalent RC circuit that serves as the environment for the algorithm. Initially, we consider a naive model in which the initial SOC and charging time limit are fixed. The model is built in Simulink with appropriate state, reward, and action functions, and a charging profile is obtained that maximizes charge gain. The hyperparameters are analyzed and tuned to train the agent. The obtained results are compared with the simulated standard constant-current constant-voltage charging method, which serves as the baseline model. The model is then extended to accommodate different initial SOCs. Two methods, viz. selective and generic policy approaches, are proposed to train the agent and are then deployed for varying initial SOCs. Experiments show that the generic policy approach performs consistently better across different initial SOCs. The model is also trained for different charging time limits and the performance is analyzed. The model is further extended to minimize energy loss by using an appropriate reward function.
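To make the setup concrete, here is a minimal gym-style sketch of such a charging environment in Python, assuming a simple Coulomb-counting SOC update and an ohmic-loss penalty in the reward. The parameter values, state encoding, and reward weights are illustrative assumptions, not the thesis's Simulink model; an off-the-shelf DDPG agent could then be trained against its step interface.

    import numpy as np

    class BatteryChargeEnv:
        """Minimal gym-style sketch of the charging environment; parameters
        and reward weights are illustrative, not the thesis's Simulink model."""

        def __init__(self, capacity_ah=2.0, r0=0.05, dt=1.0,
                     t_limit=1800.0, i_max=4.0, loss_weight=1e-4):
            self.capacity = capacity_ah * 3600.0  # cell capacity in coulombs
            self.r0 = r0                          # ohmic resistance (ohms)
            self.dt = dt                          # control step (seconds)
            self.t_limit = t_limit                # charging time limit (seconds)
            self.i_max = i_max                    # maximum charging current (A)
            self.loss_weight = loss_weight        # charge-gain vs energy-loss trade-off

        def reset(self, soc0=0.2):
            """Start an episode from an arbitrary initial SOC."""
            self.soc, self.t = soc0, 0.0
            return np.array([self.soc, self.t / self.t_limit])

        def step(self, current_a):
            """Apply one constant-current stage for dt seconds."""
            i = float(np.clip(current_a, 0.0, self.i_max))
            d_soc = i * self.dt / self.capacity   # Coulomb-counting SOC update
            self.soc = min(self.soc + d_soc, 1.0)
            self.t += self.dt
            # Reward the charge gained; penalize ohmic energy loss (I^2 * R0 * dt)
            reward = d_soc - self.loss_weight * i**2 * self.r0 * self.dt
            done = self.t >= self.t_limit or self.soc >= 1.0
            return np.array([self.soc, self.t / self.t_limit]), reward, done

Because the agent observes only SOC and normalized elapsed time, the environment is partially observable with a continuous action (the charging current), which is the setting in which DDPG is a natural choice.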
The proposed generic model performs significantly better for low initial SOCs and almost on par with the baseline model for higher initial SOCs when charging for half an hour. For shorter charging time limits, the model always performs better irrespective of the initial SOC. The proposed model is also robust to the frequency of communication between the battery management system and the smart charger.