<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
<title>Electronic Systems Engineering (ESE)</title>
<link>https://etd.iisc.ac.in/handle/2005/20</link>
<description/>
<pubDate>Wed, 15 Apr 2026 22:59:13 GMT</pubDate>
<dc:date>2026-04-15T22:59:13Z</dc:date>
<image>
<title>Electronic Systems Engineering (ESE)</title>
<url>http://etd.iisc.ac.in:80/bitstream/id/d519318a-cdb6-4a05-8f42-85138d206113/</url>
<link>https://etd.iisc.ac.in/handle/2005/20</link>
</image>
<item>
<title>Addressing Energy and Performance Related Challenges in Networked Embedded Systems</title>
<link>https://etd.iisc.ac.in/handle/2005/5229</link>
<description>Addressing Energy and Performance Related Challenges in Networked Embedded Systems
Singh, Kaumudi
Networked Embedded Systems comprise spatially and functionally distributed nodes that are interconnected with one another and with the environment to achieve certain goals. The nodes are connected to one another through wired or wireless communication technologies. This thesis focuses on wireless networks and the challenges encountered in such systems. A node in a wireless network is usually energy-constrained, irrespective of its power source. Hence, schemes are required that judiciously use the energy available at a node to power its peripherals and execute various operations without degrading performance. We have devised energy-efficient schemes for data management, access to network resources, and the delivery of location-based services in a network.
A sensor node in a network samples a parameter of interest and transmits the samples to an aggregator. We have devised an adaptive sampling algorithm that adjusts the rate and resolution at which the parameter is sampled based on the available energy and the characteristics of the parameter. Furthermore, we have devised energy-aware and data-value-aware algorithms that encourage selective transmission of data such that the fidelity of data recovery is not adversely affected. These schemes not only improve energy utilization but also reduce the traffic generated by a node in the network.
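The abstract does not spell out the adaptive sampling or selective transmission rules, so the following is only a minimal Python sketch of the general idea under stated assumptions: the sampling interval and ADC resolution are scaled linearly with the remaining energy fraction, and a generic send-on-delta rule decides which samples to transmit. The function names, the linear scaling and the default limits are hypothetical illustrations, not the thesis's algorithms.

```python
"""Hypothetical energy-aware sampling and send-on-delta transmission sketch."""


def sampling_interval(energy_fraction, base_interval_s=1.0, max_interval_s=60.0):
    """Stretch the sampling interval linearly as stored energy drops (assumed policy).

    energy_fraction is the remaining energy, clamped to the range [0, 1].
    Full energy samples every base_interval_s; empty samples every max_interval_s.
    """
    energy_fraction = min(max(energy_fraction, 0.0), 1.0)
    return max_interval_s - energy_fraction * (max_interval_s - base_interval_s)


def resolution_bits(energy_fraction, min_bits=8, max_bits=12):
    """Use fewer ADC bits when energy is scarce, rounded to the nearest bit (assumed policy)."""
    energy_fraction = min(max(energy_fraction, 0.0), 1.0)
    return round(min_bits + energy_fraction * (max_bits - min_bits))


def select_for_transmission(samples, delta):
    """Send-on-delta: keep a (time, value) sample only if it differs from the
    last transmitted value by at least delta; the first sample is always sent."""
    sent = []
    last = None
    for t, value in samples:
        if last is None or abs(value - last) >= delta:
            sent.append((t, value))
            last = value
    return sent
```

For example, `select_for_transmission([(0, 20.0), (1, 20.1), (2, 20.6)], 0.5)` keeps only the samples at t = 0 and t = 2; a larger `delta` trades recovery fidelity for less traffic.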
Before data can be transmitted, nodes must perform handshakes on the control channel to access resources for data transmission. The energy consumed by these handshakes is often not examined, as most of them are performed only a limited number of times. However, delays in these handshakes affect the ensuing data transmission. To this end, we have proposed a "Device Registration" algorithm that provides quick access to Contention Free Period (CFP) resources in the beacon-enabled mode of IEEE 802.15.4. The algorithm can be implemented with minor modifications to the parameters of the standard and allows nodes to transmit their data promptly. We have also studied IEEE 802.15.4e-TSCH and proposed a "Sparse Beacon Advertisement" algorithm for beacon scheduling so that nodes can join a network quickly, even when very few beacons are advertised in the network. Both schemes not only promote fast access to network resources but also reduce the energy consumed by nodes in accessing these resources.
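For context on the joining problem, the sketch below models the standard TSCH channel-hopping rule, in which a cell's physical channel is the hopping-sequence entry at index (ASN + channelOffset) mod (sequence length), and searches for the first slot in which a node scanning a single channel would hear an enhanced beacon. It illustrates why fewer advertised beacons lengthen the join time; the schedules, the four-channel hopping sequence and the function names are hypothetical, and this is not the Sparse Beacon Advertisement algorithm itself.

```python
def tsch_channel(asn, channel_offset, hopping_sequence):
    """TSCH channel hopping: physical channel used by a cell at absolute slot number asn."""
    return hopping_sequence[(asn + channel_offset) % len(hopping_sequence)]


def first_beacon_heard(listen_channel, beacon_cells, slotframe_len, hopping_sequence, max_asn=100000):
    """Return the first ASN at which a node scanning listen_channel hears an enhanced beacon.

    beacon_cells is a hypothetical list of (slot_offset, channel_offset) cells in which
    an enhanced beacon is advertised once per slotframe. Returns None if none is heard
    within max_asn slots.
    """
    for asn in range(max_asn):
        slot_offset = asn % slotframe_len
        for cell_slot, cell_channel_offset in beacon_cells:
            if cell_slot != slot_offset:
                continue
            if tsch_channel(asn, cell_channel_offset, hopping_sequence) == listen_channel:
                return asn
    return None
```

With `hopping_sequence = [11, 12, 13, 14]`, a 7-slot slotframe and a node listening on channel 13, a single beacon cell `[(0, 0)]` is first heard at ASN 14, while two cells `[(0, 0), (3, 0)]` are heard at ASN 10.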
Finally, we have studied the performance of location-based services when applied to asset localization in a space-constrained environment. Radio Frequency Identification (RFID) technology has been studied for localization because of its batteryless operation. We have constructed two different reader-antenna setups for tag interrogation and have used them to track and localize assets in different scenarios. We have studied the effect of tag orientation and placement on the measurements collected from the tags and have used the findings to track first responders in a corridor. We have also devised methods to localize tags with sufficient accuracy in scenarios where only sparse tag data are collected. We observed that localization accuracy depends significantly on both the quality and the quantity of tag reads. Next, we have addressed the localization of life safety vests equipped with RFID tags in an aircraft and have devised mechanisms to obtain accurate 2D location information for all the vests present in the aircraft.
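The abstract does not state which localization method is used. As a point of reference only, the sketch below implements a common baseline, RSSI-weighted centroid localization, in which a tag's 2D position is estimated from reads by antennas at known positions. The input format and the conversion of RSSI from dBm to milliwatts for weighting are assumptions of this illustration, not the methods devised in the thesis.

```python
def weighted_centroid(reads):
    """Estimate a tag's 2D position from (x, y, rssi_dbm) reads by known-position antennas.

    Each read is weighted by its received power in milliwatts, so stronger reads pull
    the estimate towards their antenna. Returns None when there are no reads.
    """
    if not reads:
        return None
    # Convert dBm to linear power (mW) so weights are proportional to received power.
    weights = [10 ** (rssi_dbm / 10.0) for _, _, rssi_dbm in reads]
    total = sum(weights)
    est_x = sum(w * px for w, (px, _, _) in zip(weights, reads)) / total
    est_y = sum(w * py for w, (_, py, _) in zip(weights, reads)) / total
    return est_x, est_y
```

Two equally strong reads at (0, 0) and (2, 0) give (1.0, 0.0); if the read at (2, 0) is 10 dB weaker, the estimate moves to about (0.18, 0.0).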
</description>
<guid isPermaLink="false">https://etd.iisc.ac.in/handle/2005/5229</guid>
</item>
<item>
<title>Addressing the Performance and Reliability Bottlenecks in 2D Transition Metal Dichalcogenide (TMD) Based Transistor Technology</title>
<link>https://etd.iisc.ac.in/handle/2005/5716</link>
<description>Addressing the Performance and Reliability Bottlenecks in 2D Transition Metal Dichalcogenide (TMD) Based Transistor Technology
Kuruva, Hemanjaneyulu
In this thesis, we present different contributions towards the development of 2D material technology. The first is the patterning of single-crystal, high-quality MoS2 to the desired dimensions through dry etching techniques. SF6 plasma leaves a large residue on the material, which limits its use despite its advantage in etch selectivity over SiO2. On the other hand, CHF3 plasma is shown to give a well-controlled etching process owing to its lower etch rate compared with SF6 plasma. However, under over-etch conditions, the plasma is observed to introduce two significant challenges. The first is doping induced by high-energy fluorine radicals that diffuse through the resist and the TMD material. The second is crystal damage caused by the plasma at the sidewalls. Eliminating these two challenges required highly controlled etching. Optimized and controlled etching using CHF3 plasma enabled the fabrication of transistors without compromising performance relative to reference transistors. The same controlled etching process is observed to apply to other TMDs as well. Transistors fabricated with this approach show no degradation in performance metrics compared with standard devices, thus generalizing the applicability of the process to all TMDs.
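The link between a lower etch rate and better process control can be made concrete with simple arithmetic: if the etch is stopped a little late, the extra depth removed is the etch rate times the timing error. The sketch below uses hypothetical rates, thicknesses and an assumed over-etch margin, not values reported in the thesis.

```python
def etch_time_s(thickness_nm, rate_nm_per_min, over_etch_fraction=0.2):
    """Time (s) to clear a film of the given thickness plus a fractional over-etch margin.

    The 20% default margin is an arbitrary illustrative choice.
    """
    main_s = thickness_nm / rate_nm_per_min * 60.0
    return main_s * (1.0 + over_etch_fraction)


def depth_error_nm(rate_nm_per_min, timing_error_s):
    """Extra depth (nm) removed if the etch is stopped timing_error_s seconds late."""
    return rate_nm_per_min * timing_error_s / 60.0
```

With a 2 s timing error, a hypothetical 60 nm/min process over-etches by 2 nm, whereas a 6 nm/min process over-etches by only 0.2 nm.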
</description>
<guid isPermaLink="false">https://etd.iisc.ac.in/handle/2005/5716</guid>
</item>
<item>
<title>Algorithm And Architecture Design for Real-time Face Recognition</title>
<link>https://etd.iisc.ac.in/handle/2005/2743</link>
<description>Algorithm And Architecture Design for Real-time Face Recognition
Mahale, Gopinath Vasanth
Face recognition is a field of biometrics that deals with the identification of subjects based on features present in images of their faces. Face recognition is favored over other biometric methods because it is easier to operate and can identify subjects without their knowledge. With these features, face recognition has become an integral part of present-day security systems, targeting a smart and secure world.
There are various factors that define the performance of a face recognition system. The most important among them are the recognition accuracy of the algorithm used and the time taken for recognition. The recognition accuracy of a face recognition algorithm is affected by changes in pose, facial expression and illumination, as well as by occlusions in the images. A number of algorithms have been proposed to enable recognition under these ambient changes. However, it has been hard to find a single algorithm that can efficiently recognize faces in all the above-mentioned conditions. Moreover, achieving real-time performance for most complex face recognition algorithms on embedded platforms has been a challenge. Real-time performance is highly preferred in critical applications such as the identification of crime suspects in public. As available software solutions for face recognition (FR) have large recognition latencies, they are not suitable for such critical real-time applications. This thesis focuses on the real-time aspect of FR, where the algorithms are accelerated by means of parallel hardware architectures.
The major contributions of this work are as follows. We aim to design a face recognition system that can identify up to 30 faces in each frame of video at 15 frames per second, which amounts to 450 recognitions per second. In addition, we aim to achieve good recognition accuracy along with scalability in terms of database size and input image resolution. To design a system with these specifications, as a first step, we explore algorithms in the literature and develop a hybrid face recognition algorithm. This hybrid algorithm shows good recognition accuracy on face images with changes in illumination, pose and expression, and also with occlusions. In addition, the computations in the algorithm are modular in nature, which makes them suitable for real-time realization through parallel processing.
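The throughput target translates directly into a time budget per recognition. The symbols R and T_rec are introduced here only for this calculation; because the recognition architecture is pipelined, T_rec is the average interval between successive results rather than necessarily the latency of one recognition.

```latex
\begin{gathered}
R = 30\,\frac{\text{faces}}{\text{frame}} \times 15\,\frac{\text{frames}}{\text{s}} = 450\,\frac{\text{recognitions}}{\text{s}} \\
T_{\text{rec}} = \frac{1}{R} = \frac{1}{450}\,\text{s} \approx 2.22\,\text{ms}
\end{gathered}
```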
The face recognition system consists of a face detection module that detects faces in the input image, followed by a face recognition module that identifies the detected faces. There are well-established algorithms and architectures for face detection in the literature that can perform detection at 15 frames per second on video frames. Detected faces of different sizes need to be scaled to the size specified by the face recognition module. To meet the real-time constraints, we propose a hardware architecture for real-time bicubic convolution interpolation with dynamic scaling factors. To recognize the resized faces in real time, a scalable parallel pipelined architecture is designed for the hybrid algorithm; it can perform 450 recognitions per second on a database containing grayscale images of up to 450 classes on a Virtex-6 FPGA. To provide flexibility and programmability, we extend this design to REDEFINE, a multi-core, massively parallel reconfigurable architecture. In this design, we develop FR-specific programmable cores termed Scalable Unit for Region Evaluation (SURE) capable of performing the modular computations in the hybrid face recognition algorithm. We replicate SUREs in each tile of REDEFINE to construct a face recognition module termed REDEFINE for Face Recognition using SURE Homogeneous Cores (REFRESH).
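Bicubic convolution interpolation is commonly based on Keys' cubic convolution kernel. The following is a minimal floating-point software reference model, assuming the widely used kernel parameter a = -0.5, pixel-centre alignment and border replication; the thesis's hardware architecture and its fixed-point details are not described in the abstract and may differ.

```python
import math


def keys_kernel(x, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is the usual (assumed) choice."""
    ax = abs(x)
    if ax >= 2.0:
        return 0.0
    if ax > 1.0:
        return a * ax**3 - 5 * a * ax**2 + 8 * a * ax - 4 * a
    return (a + 2) * ax**3 - (a + 3) * ax**2 + 1


def resize_bicubic(img, out_h, out_w):
    """Resize a grayscale image (list of rows) with separable cubic convolution.

    Source coordinates use pixel-centre alignment; out-of-range neighbours are
    replaced by the nearest border pixel.
    """
    in_h, in_w = len(img), len(img[0])
    sy, sx = in_h / out_h, in_w / out_w  # scale factors recomputed per call
    out = []
    for oy in range(out_h):
        fy = (oy + 0.5) * sy - 0.5
        y0 = math.floor(fy)
        row = []
        for ox in range(out_w):
            fx = (ox + 0.5) * sx - 0.5
            x0 = math.floor(fx)
            acc = 0.0
            # 4 x 4 neighbourhood around (y0, x0), weighted separably.
            for m in range(-1, 3):
                wy = keys_kernel(fy - (y0 + m))
                yy = min(max(y0 + m, 0), in_h - 1)
                for n in range(-1, 3):
                    wx = keys_kernel(fx - (x0 + n))
                    xx = min(max(x0 + n, 0), in_w - 1)
                    acc += wy * wx * img[yy][xx]
            row.append(acc)
        out.append(row)
    return out
```

For example, `resize_bicubic(face, 64, 64)` would rescale a detected face crop (a list of rows) to a hypothetical 64 x 64 recognition input; because the kernel weights sum to one, flat regions keep their intensity after scaling.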
Practical face recognition systems need to learn new, unseen faces on-line. For real-time on-line learning of unseen face images, we therefore design small processors termed VOPs (Processors for Vector Operations). VOPs function as coprocessors to the processing elements in each tile of REDEFINE, accelerating the micro vector operations that appear in the synaptic weight computations. We also explore deep neural networks, which operate similarly to processing in the human brain and are capable of working on very large face databases. We draw on random matrix theory to develop a synaptic weight initialization scheme for deep neural networks that yields better classification. In addition, we perform a design space exploration of hardware architectures for deep convolutional networks and conclude with directions for future work.
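The abstract does not describe the random-matrix-based initialization it proposes. For orientation only, the sketch below shows a different, widely used random-matrix scheme, orthogonal initialization, in which Gaussian random vectors are orthonormalized so that every singular value of the weight matrix equals a chosen gain. The function name and the restriction to matrices with no more rows than columns are simplifications of this illustration.

```python
import math
import random


def orthogonal_init(rows, cols, gain=1.0, seed=None):
    """Random weight matrix (rows x cols) with orthonormal rows scaled by gain.

    Draws i.i.d. Gaussian vectors and orthonormalizes them with modified Gram-Schmidt,
    so all singular values equal gain. Requires rows not exceeding cols.
    """
    if rows > cols:
        raise ValueError("this sketch needs rows not exceeding cols")
    rng = random.Random(seed)
    basis = []
    while len(basis) != rows:
        v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
        # Modified Gram-Schmidt: remove components along each accepted row in turn.
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-12:  # skip (vanishingly unlikely) degenerate draws
            basis.append([vi / norm for vi in v])
    return [[gain * x for x in row] for row in basis]
```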
</description>
<pubDate>Tue, 31 Oct 2017 00:00:00 GMT</pubDate>
<guid isPermaLink="false">https://etd.iisc.ac.in/handle/2005/2743</guid>
<dc:date>2017-10-31T00:00:00Z</dc:date>
</item>
<item>
<title>Algorithm-Architecture Co-Design for Dense Linear Algebra Computations</title>
<link>https://etd.iisc.ac.in/handle/2005/3958</link>
<description>Algorithm-Architecture Co-Design for Dense Linear Algebra Computations
Merchant, Farhad
Achieving high computation efficiency, in terms of Cycles per Instruction (CPI), for high-performance computing kernels is an interesting and challenging research area. Dense Linear Algebra (DLA) computation is a representative high-performance computing application, used, for example, in LU and QR factorizations. Unfortunately, modern off-the-shelf microprocessors fall significantly short of the theoretical lower bound on CPI for high-performance computing applications. In this thesis, we perform an in-depth analysis of the available parallelism and propose suitable algorithmic and architectural variations to significantly improve computation efficiency. There are two standard approaches to improving computation efficiency: application-specific architecture customization and algorithmic tuning. Accordingly, we first perform a graph-based analysis of selected DLA kernels. From the various forms of parallelism thus identified, we design a custom processing element to improve the CPI. The processing elements are used as building blocks for a commercially available Coarse-Grained Reconfigurable Architecture (CGRA). Through detailed experiments on a synthesized CGRA implementation, we demonstrate that the proposed algorithmic and architectural variations achieve lower CPI than off-the-shelf microprocessors. We also benchmark against state-of-the-art custom implementations and report a higher energy-performance-area product.
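The graph-based analysis mentioned above can be illustrated on the inner product at the heart of GMM. Under the simplifying assumption that every multiply and add takes one cycle, the available instruction-level parallelism is the number of operations divided by the critical-path depth of the dataflow graph; the sketch below compares a serial accumulation with a tree reduction. It is an illustration only, not the thesis's analysis.

```python
import math


def dot_product_dag(n, tree=False):
    """Operation count and critical-path depth of the DAG for an n-term dot product (n at least 1).

    All n multiplications are independent (depth 1). The n - 1 additions form either a
    serial chain (depth n - 1) or a balanced tree (depth ceil(log2 n)).
    Returns (ops, depth, ilp) with ilp = ops / depth, assuming unit latency per operation.
    """
    ops = n + (n - 1)
    add_depth = math.ceil(math.log2(n)) if tree else n - 1
    depth = 1 + add_depth
    return ops, depth, ops / depth
```

For an 8-term dot product, the serial chain gives (15, 8, 1.875) and the tree gives (15, 4, 3.75). In C = A B, every output element is an independent dot product, which exposes further parallelism across outputs.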
DLA computations are encountered in many engineering and scientific computing applications, ranging from Computational Fluid Dynamics (CFD) to eigenvalue problems. Traditionally, these applications are written using highly tuned High Performance Computing (HPC) software packages such as the Linear Algebra Package (LAPACK) and/or the Scalable Linear Algebra Package (ScaLAPACK). The basic building blocks of these packages are the Basic Linear Algebra Subprograms (BLAS). Algorithms in LAPACK/ScaLAPACK are written in terms of BLAS to achieve high throughput. Despite extensive intellectual effort in the development and tuning of these packages, there is still scope for further tuning. In this thesis, we revisit the most prominent and widely used compute-bound algorithms, such as GMM, to further exploit Instruction Level Parallelism (ILP). We further examine LU and QR factorizations, generalizing them to exhibit higher ILP. We first accelerate the sequential performance of the algorithms in BLAS and LAPACK and then focus on the parallel realization of these algorithms. The major contributions of this thesis in algorithmic tuning are as follows:
Algorithms:
  We present a graph-based analysis of General Matrix Multiplication (GMM) and discuss the different types of parallelism available in GMM
  We present an analysis of Givens Rotation (GR) based QR factorization, where we improve GR and derive Column-wise GR (CGR), which can annihilate multiple elements of a column of a matrix simultaneously. We show that CGR requires fewer multiplications than GR
  We generalize CGR further and derive Generalized GR (GGR), which can annihilate multiple elements of the columns of a matrix simultaneously. We show that the parallelism exhibited by GGR is much higher than that of GR and Householder Transform (HT)
  We extend the generalizations to Square root Free GR (also known as Fast Givens Rotation) and Square root and Division Free GR (SDFG), and derive Column-wise Fast Givens and Column-wise SDFG. We also extend the generalization to complex matrices and derive Complex Column-wise Givens Rotation
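For reference, the classical Givens rotation that CGR and GGR generalize can be sketched as follows: a rotation with c = a/r and s = b/r, where r is the hypotenuse of a and b, zeroes one matrix entry by mixing two rows. The sketch computes only the R factor of a QR factorization with one rotation per sub-diagonal entry; it does not implement the column-wise or generalized variants.

```python
import math


def givens(a, b):
    """Return (c, s) such that [c s; -s c] applied to [a; b] gives [r; 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def givens_qr_r(A):
    """Upper-triangular R from QR of A (list of rows) using classical Givens rotations.

    Each rotation mixes two adjacent rows and zeroes one sub-diagonal entry, so a column
    with k sub-diagonal entries needs k sequential rotations.
    """
    R = [list(map(float, row)) for row in A]
    m, n = len(R), len(R[0])
    for j in range(min(m, n)):
        for i in range(m - 1, j, -1):  # bottom-up: zero R[i][j] against R[i-1][j]
            c, s = givens(R[i - 1][j], R[i][j])
            for k in range(n):
                top, bot = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * top + s * bot
                R[i][k] = -s * top + c * bot
    return R
```

For `[[3, 1], [4, 2]]` the rotation uses c = 0.6 and s = 0.8 and yields R close to [[5, 2.2], [0, 0.4]].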
Coarse-grained Reconfigurable Architectures (CGRAs) have gained popularity in the last decade due to their power and area efficiency. Furthermore, CGRAs like REDEFINE also support domain customization. REDEFINE is an array of Tiles where each Tile consists of a Compute Element and a Router. The Routers are responsible for on-chip communication, while the Compute Elements in REDEFINE can be domain-customized to accelerate applications in the domain of interest. In this thesis, we take the REDEFINE base architecture as a starting point and design a Processing Element (PE) that can execute algorithms in BLAS and LAPACK efficiently. We perform several architectural enhancements in the PE to approach the lower bound on CPI. For parallel realization of BLAS and LAPACK, we attach this PE to the Router of REDEFINE. We achieve better area and power performance than earlier customized architectures for DLA. The major architectural contributions of this thesis are as follows:
Architecture:
  We present the design of a PE that accelerates GMM, a Level-3 BLAS operation
  We methodically enhance the PE with different features to improve the performance of GMM
  For efficient realization of LAPACK, we use the PE that efficiently executes GMM and show better performance
  For further acceleration of LU and QR factorizations in LAPACK, we identify the macro operations encountered in these factorizations and realize them on a reconfigurable datapath, resulting in 25-30% lower run-time
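GMM is the Level-3 BLAS kernel on which the PE is built. As a generic software illustration, not the PE's dataflow, the sketch below computes C += A B tile by tile, the loop-blocking pattern that lets each tile of A and B be reused across many multiply-accumulates. The function name and block size are hypothetical.

```python
def gemm_blocked(A, B, C, block=2):
    """In-place C += A B on lists of rows, computed tile by tile (Level-3 BLAS style GEMM).

    Each block x block tile of C is updated from matching tiles of A and B; partial
    tiles at the edges are handled by clipping the loop bounds with min().
    """
    m, k, n = len(A), len(B), len(B[0])
    for i0 in range(0, m, block):
        for j0 in range(0, n, block):
            for p0 in range(0, k, block):
                for i in range(i0, min(i0 + block, m)):
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = A[i][p]  # reused across a row of the C tile
                        for j in range(j0, min(j0 + block, n)):
                            C[i][j] += a_ip * B[p][j]
    return C
```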
</description>
<pubDate>Mon, 13 Aug 2018 00:00:00 GMT</pubDate>
<guid isPermaLink="false">https://etd.iisc.ac.in/handle/2005/3958</guid>
<dc:date>2018-08-13T00:00:00Z</dc:date>
</item>
</channel>
</rss>
