Electronic Systems Engineering (ESE)

https://etd.iisc.ac.in/handle/2005/20
Wed, 15 Apr 2026 22:59:13 GMT

Addressing Energy and Performance Related Challenges in Networked Embedded Systems
https://etd.iisc.ac.in/handle/2005/5229
Singh, Kaumudi

Networked Embedded Systems comprise spatially and functionally distributed nodes that are interconnected with one another and with the environment to achieve certain goals. The nodes are connected to one another through wired or wireless communication technologies. This thesis focuses on wireless networks and the challenges encountered in such systems. A node in a wireless network is usually energy-constrained, irrespective of its power source. Hence, schemes are required that judiciously use the energy available at a node to power its peripherals and execute various operations without degrading performance. We have devised energy-efficient schemes for data management, accessing network resources, and rendering location-based services in a network.

A sensor node in a network samples a parameter of interest and then transmits the samples to an aggregator. We have devised an adaptive sampling algorithm that adapts the rate and resolution at which the parameter is sampled based on the available energy and the characteristics of the parameter. Furthermore, we have devised energy- and data-value-aware algorithms that encourage selective transmission of data such that the fidelity of data recovery is not adversely affected. These schemes not only improve energy utilization but also reduce the traffic generated by a node in the network.

Before data can be transmitted, nodes are required to perform handshakes on the control channel so that they can access resources for data transmission.
The energy consumed while performing these handshakes is often not examined, as most of the handshakes are performed only a limited number of times. However, delays in these handshakes affect the ensuing data transmission. To address this, we have proposed a "Device Registration" algorithm that provides quick access to the Contention Free Period (CFP) resources in the beacon-enabled mode of IEEE 802.15.4 technology. The algorithm can be implemented with minor modifications to the parameters of the standard and allows the nodes to transmit their data promptly. We have also studied IEEE 802.15.4e-TSCH technology and proposed a "Sparse Beacon Advertisement" algorithm for beacon scheduling so that nodes can join a network quickly, even when very few beacons are being advertised in the network. Both schemes not only promote fast access to network resources but also reduce the energy consumed by nodes in accessing these resources.

Finally, we have studied the performance of location-based services when applied to asset localization in a space-constrained environment. Radio Frequency Identification (RFID) technology has been studied for localization due to its batteryless operation. We have constructed two different reader-antenna setups for tag interrogation and have employed these setups to track and localize assets in different scenarios. We have studied the effect of tag orientation and placement on the measurements collected from the tags and have used the findings to track first responders in a corridor. We have also devised methods to localize the tags with sufficient accuracy in scenarios where we collect sparse tag data. We observed that the accuracy of localization depends significantly on the quality as well as the quantity of tag reads.
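For context on the beacon-enabled mode of IEEE 802.15.4 referred to above: the beacon interval and the active superframe duration are set by two parameters of the standard, the beacon order (BO) and the superframe order (SO). The sketch below only illustrates that standard timing relationship, assuming the 2.4 GHz O-QPSK PHY (16 microseconds per symbol). It is not the "Device Registration" algorithm, whose details are not given in this abstract, and the function and constant names are hypothetical.

```python
# Standard IEEE 802.15.4 constants (2.4 GHz O-QPSK PHY assumed).
A_BASE_SUPERFRAME_DURATION = 960  # symbols: 16 slots x 60 symbols per slot
SYMBOL_TIME_US = 16               # microseconds per symbol at 62.5 ksymbol/s


def superframe_timing(beacon_order, superframe_order):
    """Return (beacon_interval_us, superframe_duration_us, duty_cycle).

    Hypothetical helper: the name and return shape are illustrative only.
    """
    if not 0 <= superframe_order <= beacon_order <= 14:
        raise ValueError("require 0 <= SO <= BO <= 14 in beacon-enabled mode")
    # Both durations double with each increment of their order parameter.
    beacon_interval_us = (
        A_BASE_SUPERFRAME_DURATION * 2 ** beacon_order * SYMBOL_TIME_US
    )
    superframe_duration_us = (
        A_BASE_SUPERFRAME_DURATION * 2 ** superframe_order * SYMBOL_TIME_US
    )
    # Fraction of each beacon interval during which the network is active.
    duty_cycle = superframe_duration_us / beacon_interval_us
    return beacon_interval_us, superframe_duration_us, duty_cycle


if __name__ == "__main__":
    bi, sd, duty = superframe_timing(beacon_order=6, superframe_order=4)
    print(f"beacon interval {bi / 1000:.2f} ms, "
          f"active period {sd / 1000:.2f} ms, duty cycle {duty:.2f}")
```

Because the CFP sits at the end of the active period, a node that cannot obtain CFP resources in one superframe generally has to wait for a later one, which is one reason handshake delay affects the ensuing data transmission.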
Next, we have addressed the localization of life safety vests, which are equipped with RFID tags, in an aircraft and have devised mechanisms to obtain accurate 2D location information of all the vests present in the aircraft.

Addressing the Performance and Reliability Bottlenecks in 2D Transition Metal Dichalcogenide (TMD) Based Transistor Technology
https://etd.iisc.ac.in/handle/2005/5716
Kuruva, Hemanjaneyulu

In this thesis, we present different contributions towards the development of 2D material technology. The first is the realization of desired dimensions in single-crystal, high-quality MoS2 material through dry etching techniques. SF6 plasma leaves a large amount of residue on the material, which inhibits its use despite its advantage in etch selectivity with respect to SiO2. On the other hand, CHF3 plasma is shown to give a well-controlled etching process owing to its lower etch rate relative to SF6 plasma. However, under over-etch conditions, the plasma is observed to introduce two significant challenges. The first is the doping induced by high-energy fluorine radicals that diffuse through the resist and the TMD material. The second is the crystal damage caused by the plasma from the side walls. Eliminating these two challenges required highly controlled etching. Optimized and controlled etching using CHF3 plasma resulted in fabricated transistors whose performance is not compromised compared to reference transistors. The same controlled etching process is observed to apply to other TMDs as well. Transistors implemented with this approach have shown no degradation in performance metrics relative to standard devices, thus generalizing the applicability of the process to all TMDs.
Algorithm And Architecture Design for Real-time Face Recognition
https://etd.iisc.ac.in/handle/2005/2743
Mahale, Gopinath Vasanth

Face recognition is a field of biometrics that deals with the identification of subjects based on features present in images of their faces. The factors that make face recognition more popular than other biometric methods are its ease of operation and its ability to identify subjects without their knowledge. With these features, face recognition has become an integral part of present-day security systems, targeting a smart and secure world. Various factors define the performance of a face recognition system. The most important among them are the recognition accuracy of the algorithm used and the time taken for recognition. The recognition accuracy of a face recognition algorithm is affected by changes in pose, facial expression and illumination, along with occlusions in the images. A number of algorithms have been proposed to enable recognition under these ambient changes. However, it has been hard to find a single algorithm that can efficiently recognize faces in all the above-mentioned conditions. Moreover, achieving real-time performance for most of the complex face recognition algorithms on embedded platforms has been a challenge. Real-time performance is highly preferred in critical applications such as the identification of crime suspects in public. As available software solutions for face recognition (FR) have significantly large latency in recognizing individuals, they are not suitable for such critical real-time applications. This thesis focuses on the real-time aspect of FR, where acceleration of the algorithms is achieved by means of parallel hardware architectures. The major contributions of this work are as follows.
We aim to design a face recognition system that can identify at most 30 faces in each frame of video at 15 frames per second, which amounts to 450 recognitions per second. In addition, we aim to achieve good recognition accuracy along with scalability in terms of database size and input image resolution. To design a system with these specifications, as a first step, we explore algorithms in the literature and come up with a hybrid face recognition algorithm. This hybrid algorithm shows good recognition accuracy on face images with changes in illumination, pose and expression, and also with occlusions. In addition, the computations in the algorithm are modular in nature, which makes them suitable for real-time realization through parallel processing.

The face recognition system consists of a face detection module to detect faces in the input image, followed by a face recognition module to identify the detected faces. There are well-established algorithms and architectures for face detection in the literature that can perform detection at 15 frames per second on video frames. Detected faces of different sizes need to be scaled to the size specified by the face recognition module. To meet the real-time constraints, we propose a hardware architecture for real-time bi-cubic convolution interpolation with dynamic scaling factors. To recognize the resized faces in real time, a scalable parallel pipelined architecture is designed for the hybrid algorithm; it can perform 450 recognitions per second on a database containing grayscale images of at most 450 classes on a Virtex 6 FPGA. To provide flexibility and programmability, we extend this design to REDEFINE, a multi-core massively parallel reconfigurable architecture. In this design, we come up with FR-specific programmable cores termed Scalable Unit for Region Evaluation (SURE), capable of performing the modular computations in the hybrid face recognition algorithm.
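The throughput target and the face-resizing step described above can be illustrated in software. The sketch below first works out the time budget implied by 30 faces per frame at 15 frames per second, then resamples a single row of pixels using the widely used cubic convolution kernel with a = -0.5. This is an assumption-laden illustration: the abstract does not state which kernel coefficient, border handling or data layout the proposed hardware uses, and all function names here are hypothetical.

```python
import math


def recognition_budget_ms(faces_per_frame=30, frames_per_second=15):
    """Average time available per recognition, in milliseconds."""
    recognitions_per_second = faces_per_frame * frames_per_second  # 30 * 15 = 450
    return 1000.0 / recognitions_per_second


def cubic_kernel(x, a=-0.5):
    """Cubic convolution kernel; a = -0.5 is a common choice (assumed here)."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x ** 3 - (a + 3.0) * x ** 2 + 1.0
    if x < 2.0:
        return a * x ** 3 - 5.0 * a * x ** 2 + 8.0 * a * x - 4.0 * a
    return 0.0


def resample_row(row, out_len):
    """Resize a 1-D list of pixel values to out_len samples.

    The scaling factor is derived from the input length at call time, so
    differently sized inputs can be mapped to one fixed output size.
    """
    n = len(row)
    scale = n / out_len
    out = []
    for j in range(out_len):
        src = (j + 0.5) * scale - 0.5      # centre-aligned source coordinate
        base = math.floor(src)
        acc = 0.0
        for k in range(base - 1, base + 3):  # four neighbouring source pixels
            idx = min(max(k, 0), n - 1)      # clamp indices at the borders
            acc += row[idx] * cubic_kernel(src - k)
        out.append(acc)
    return out
```

A 2-D image can be resized by applying `resample_row` to every row and then to every column of the result, since the interpolation is separable.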
We replicate SUREs in each tile of REDEFINE to construct a face recognition module termed REDEFINE for Face Recognition using SURE Homogeneous Cores (REFRESH). Practical face recognition systems need to learn new, unseen faces on-line. Considering this, for real-time on-line learning of unseen face images, we design tiny processors termed VOP (Processor for Vector Operations). VOPs function as coprocessors to the processing elements under each tile of REDEFINE to accelerate the micro vector operations appearing in the synaptic weight computations. We also explore deep neural networks, which operate similarly to the processing in the human brain and are capable of working on very large face databases. We explore the field of random matrix theory to come up with a solution for synaptic weight initialization in deep neural networks for better classification. In addition, we perform design space exploration of hardware architectures for deep convolution networks and conclude with directions for future work.
Tue, 31 Oct 2017 00:00:00 GMT

Algorithm-Architecture Co-Design for Dense Linear Algebra Computations
https://etd.iisc.ac.in/handle/2005/3958
Merchant, Farhad

Achieving high computation efficiency, in terms of Cycles per Instruction (CPI), for high-performance computing kernels is an interesting and challenging research area. Dense Linear Algebra (DLA) computation is a representative high-performance computing application, which is used, for example, in LU and QR factorizations. Unfortunately, modern off-the-shelf microprocessors fall significantly short of achieving the theoretical lower bound in CPI for high-performance computing applications. In this thesis, we perform an in-depth analysis of the available parallelism and propose suitable algorithmic and architectural variations to significantly improve the computation efficiency.
There are two standard approaches for improving computation efficiency: first, application-specific architecture customization and, second, algorithmic tuning. Following these approaches, we first perform a graph-based analysis of selected DLA kernels. From the various forms of parallelism thus identified, we design a custom processing element for improving the CPI. The processing elements are used as building blocks for a commercially available Coarse-Grained Reconfigurable Architecture (CGRA). By performing detailed experiments on a synthesized CGRA implementation, we demonstrate that our proposed algorithmic and architectural variations are able to achieve lower CPI compared to off-the-shelf microprocessors. We also benchmark against state-of-the-art custom implementations to report higher energy-performance-area product.

DLA computations are encountered in many engineering and scientific computing applications ranging from Computational Fluid Dynamics (CFD) to eigenvalue problems. Traditionally, these applications are written using highly tuned High Performance Computing (HPC) software packages like Linear Algebra Package (LAPACK) and/or Scalable Linear Algebra Package (ScaLAPACK). The basic building block for these packages is Basic Linear Algebra Subprograms (BLAS). Algorithms pertaining to LAPACK/ScaLAPACK are written in terms of BLAS to achieve high throughput. Despite extensive intellectual efforts in the development and tuning of these packages, there still exists scope for further tuning of these packages. In this thesis, we revisit the most prominent and widely used compute-bound algorithms, like GMM, for further exploitation of Instruction Level Parallelism (ILP). We further look into LU and QR factorizations for generalizations and exhibit higher ILP in these algorithms. We first accelerate the sequential performance of the algorithms in BLAS and LAPACK and then focus on the parallel realization of these algorithms.
Major contributions in algorithmic tuning in this thesis are as follows:

Algorithms:
- We present a graph-based analysis of General Matrix Multiplication (GMM) and discuss the different types of parallelism available in GMM.
- We present an analysis of Givens Rotation (GR) based QR factorization, where we improve GR and derive Column-wise GR (CGR), which can annihilate multiple elements of a column of a matrix simultaneously. We show that CGR requires fewer multiplications than GR.
- We generalize CGR further and derive Generalized GR (GGR), which can annihilate multiple elements of the columns of a matrix simultaneously. We show that the parallelism exhibited by GGR is much higher than that of GR and the Householder Transform (HT).
- We extend the generalizations to Square root Free GR (also known as Fast Givens Rotation) and Square root and Division Free GR (SDFG) and derive Column-wise Fast Givens and Column-wise SDFG. We also extend the generalization to complex matrices and derive Complex Column-wise Givens Rotation.

Coarse-grained Reconfigurable Architectures (CGRAs) have gained popularity in the last decade due to their power and area efficiency. Furthermore, CGRAs like REDEFINE also support domain customization. REDEFINE is an array of Tiles, where each Tile consists of a Compute Element and a Router. The Routers are responsible for on-chip communication, while the Compute Elements in REDEFINE can be domain-customized to accelerate applications pertaining to the domain of interest. In this thesis, we take the REDEFINE base architecture as a starting point and design a Processing Element (PE) that can execute algorithms in BLAS and LAPACK efficiently. We perform several architectural enhancements in the PE to approach the lower bound of the CPI. For the parallel realization of BLAS and LAPACK, we attach this PE to the Router of REDEFINE. We achieve better area and power performance compared to earlier customized architectures for DLA.
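As background for the Givens Rotation variants listed among the algorithmic contributions above, the following is a minimal sketch of the classical, textbook Givens-rotation QR factorization, in which each rotation annihilates exactly one matrix element. It is the baseline that the column-wise and generalized variants are described as improving on, not an implementation of CGR or GGR, whose formulations are not given in this abstract. The function name is hypothetical, and plain Python lists are used to keep the example self-contained.

```python
import math


def givens_qr(a):
    """Classical Givens QR: return (q, r) with q orthogonal, r upper
    triangular, and q times r equal to a (up to rounding)."""
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for col in range(n):
        # Work upwards so each rotation zeroes r[row][col] using row - 1.
        for row in range(m - 1, col, -1):
            x, y = r[row - 1][col], r[row][col]
            h = math.hypot(x, y)
            if h == 0.0:
                continue  # both entries are already zero
            c, s = x / h, y / h
            # Apply the rotation to rows (row - 1, row) of R.
            for k in range(n):
                u, v = r[row - 1][k], r[row][k]
                r[row - 1][k] = c * u + s * v
                r[row][k] = -s * u + c * v
            # Accumulate the transposed rotation into columns of Q.
            for k in range(m):
                u, v = q[k][row - 1], q[k][row]
                q[k][row - 1] = c * u + s * v
                q[k][row] = -s * u + c * v
    return q, r
```

In this baseline, a column with several nonzero sub-diagonal entries needs one rotation per entry, applied one after another; the contributions above are described as annihilating multiple elements simultaneously.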
Major contributions in architecture in this thesis are as follows:

Architecture:
- We present the design of a PE for acceleration of GMM, which is a Level-3 BLAS operation.
- We methodically enhance the PE with different features to improve the performance of GMM.
- For efficient realization of Linear Algebra Package (LAPACK), we use a PE that can efficiently execute GMM and show better performance.
- For further acceleration of LU and QR factorizations in LAPACK, we identify macro operations encountered in LU and QR factorizations and realize them on a reconfigurable data-path, resulting in 25-30% lower run-time.
Mon, 13 Aug 2018 00:00:00 GMT