Browsing by Advisor "Govindarajan, R"
Now showing items 1-20 of 25
-
2-Level Page Tables (2-LPT): A Building Block for Efficient Address Translation in Virtualized Environments
Efficient address translation mechanisms are gaining increasing attention as the virtual address range of processors keeps expanding and the demand for machine virtualization increases with cloud and data center-based ... -
Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterogeneous Processors
(2014-05-19) MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data ... -
Compiler Transformations For Improving The Performance Of Software Transactional Memory
(2013-03-27) Expressing synchronization using traditional lock-based primitives has been found to be both error-prone and restrictive. Hence there has been considerable research work to develop scalable and programmer-friendly alternatives ... -
Comprehensive Path-sensitive Data-flow Analysis
(2010-08-25) Data-flow analysis is an integral part of any aggressive optimizing compiler. We propose a framework for improving the precision of data-flow analysis in the presence of complex control-flow. We initially perform data-flow ... -
Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices
(2018-05-01) Computing systems have become heterogeneous with the increasing prevalence of multi-core CPUs, Graphics Processing Units (GPUs) and other accelerators in them. OpenCL has emerged as an attractive programming framework for ... -
Efficient Cache Organization For Application Specific And General Purpose Processors
(2010-08-25) The performance gap between processor and memory remains a major performance bottleneck in both application specific and general purpose processors. This thesis strives to ease the above bottleneck by exploiting ... -
Efficient Compilation Of Stream Programs Onto Multi-cores With Accelerators
(2010-12-30) Over the past two decades, microprocessor manufacturers have typically relied on wider issue widths and deeper pipelines to obtain performance improvements for single threaded applications. However, in recent years, ... -
Efficient Dynamic Automatic Memory Management And Concurrent Kernel Execution For General-Purpose Programs On Graphics Processing Units
(2017-04-28) Modern supercomputers now use accelerators to achieve their performance with the most widely used accelerator being the Graphics Processing Unit (GPU). However, achieving the performance potential of systems that combine ... -
Efficient Resource Usage Modelling
(2011-11-16) -
Efficient Techniques Exploiting Memory Hierarchy to Improve Network Processor Performance
The performance of network processors depends on the architecture of the chip, the network processing application and the workload characteristics. In this thesis, we model the memory hierarchy of a network processor and ... -
Enhancing GPGPU Performance through Warp Scheduling, Divergence Taming and Runtime Parallelizing Transformations
(2018-08-29) There has been tremendous growth in the use of Graphics Processing Units (GPUs) for the acceleration of general purpose applications. The growth is primarily due to the huge computing power offered by the GPUs and the ... -
Heterogeneity Aware Shared DRAM Cache for Integrated Heterogeneous Architectures
Integrated Heterogeneous System (IHS) processors pack throughput-oriented GPGPUs alongside latency-oriented CPUs on the same die sharing certain resources, e.g., shared last level cache, network-on-chip (NoC), and the ... -
Improving Last-Level Cache Performance in Single and Multi-Core Processors
(2018-04-23) With off-chip memory access taking hundreds of processor cycles, getting data to the processor in a timely fashion remains one of the key performance bottlenecks in current systems. With increasing core counts, this problem ... -
Loop Transformations for Multi-/Many-Core Architectures using Machine Learning
Loop transformation techniques such as loop tiling, loop interchange and unroll-and-jam help expose better coarse-grain and fine-grain data-level parallelisms as well as exploit data locality. These transformations are ... -
On Leveraging Dynamic Processes in Large Social Networks for Smart Cities
The concept of the smart city, which began as synonymous with an electronically networked community, underwent significant changes with the growth in mobile devices and social networking. This expanded the outlook of smart ... -
On-Chip Memory Architecture Exploration Of Embedded System On Chip
(2010-07-14) Today’s feature-rich multimedia products require embedded system solutions with complex System-on-Chip (SoC) designs to meet market expectations of high performance at low cost and lower energy consumption. SoCs are complex designs ... -
Performance analysis of methods that overcome false sharing effects in software DSMs
Software Distributed Shared Memory (DSM) systems, which rely on virtual memory mechanisms to detect accesses to shared locations and maintain their consistency, support a sharing granularity of a page size, which is of the ... -
Performance Characterization and Optimizations of Traditional ML Applications
Even in the era of Deep Learning based methods, traditional machine learning methods with large data sets continue to attract significant attention. However, we find an apparent lack of a detailed performance characterization ...

