Show simple item record

dc.contributor.advisor: Govindarajan, R
dc.contributor.author: Kumar, Harsh
dc.date.accessioned: 2022-07-29T04:30:03Z
dc.date.available: 2022-07-29T04:30:03Z
dc.date.submitted: 2022
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/5804
dc.description.abstract: Even in the era of deep-learning-based methods, traditional machine learning (ML) methods with large datasets continue to attract significant attention. However, there is an apparent lack of a detailed performance characterization of these methods in the context of large training datasets. In this thesis, we study the systems behaviour of a number of traditional ML methods, as implemented in popular free software libraries/modules, to identify the critical performance bottlenecks they experience. The characterization study reveals several interesting insights. We observe that the processor backend is the major bottleneck for our workloads, with poor cache performance coupled with a high fraction of CPU stall cycles due to memory latency. We also observe very poor utilization of execution ports, with only a single micro-op or no micro-op executed for around 45% of the execution time. For the tree-based workloads, CPU stalls due to bad speculation are also significant, reaching as high as 25% of CPU cycles. We then evaluate the performance benefits of applying well-known optimizations at the levels of the caches and the main memory, specifically (i) software prefetching to improve cache performance and (ii) data layout and computation reordering to improve locality in DRAM accesses. These optimizations are implemented as modifications to the well-known scikit-learn library and can be easily leveraged by application programmers. We evaluate their impact using a combination of simulation and execution on a real system. Software prefetching, implemented on ten workloads, yields performance benefits ranging from 5.2% to 27% on seven of the ten ML applications, while the data layout and computation reordering methods yield around 8% to 23% improvement on seven of the eight neighbour- and tree-based ML applications.
dc.language.iso: en_US
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation, in whole or in part, in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject: Machine Learning
dc.subject: Performance Characterization
dc.subject: Prefetching
dc.subject: Data Layout and Computation Reordering
dc.subject: Deep Learning
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Information technology::Computer science
dc.title: Performance Characterization and Optimizations of Traditional ML Applications
dc.type: Thesis
dc.degree.name: MTech (Res)
dc.degree.level: Masters
dc.degree.grantor: Indian Institute of Science
dc.degree.discipline: Engineering
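
The abstract above names two memory-system optimizations applied inside scikit-learn: software prefetching and data layout/computation reordering. What follows is a minimal C sketch of software prefetching in a distance-computation loop of the kind that dominates neighbour-based workloads. It is illustrative only, not the thesis's actual scikit-learn modification; the prefetch distance PF_DIST, the function name sum_sq_distances, and the loop structure are assumptions.

    /* Hedged sketch: software prefetching in a neighbour-style distance loop.
       NOT the thesis's scikit-learn patch; PF_DIST is an assumed, tunable
       value that would need per-machine calibration. */
    #include <stddef.h>

    #define PF_DIST 8  /* iterations ahead to prefetch (assumed value) */

    /* Sum of squared Euclidean distances from one query point to every row
       of a row-major matrix X (n_samples x n_features). */
    double sum_sq_distances(const double *X, const double *query,
                            size_t n_samples, size_t n_features)
    {
        double total = 0.0;
        for (size_t i = 0; i < n_samples; i++) {
            /* GCC/Clang builtin: hint the core to pull a future row toward
               the cache while the current row is processed (read-only
               access, low temporal locality). */
            if (i + PF_DIST < n_samples)
                __builtin_prefetch(&X[(i + PF_DIST) * n_features], 0, 1);

            double d = 0.0;
            for (size_t j = 0; j < n_features; j++) {
                double diff = X[i * n_features + j] - query[j];
                d += diff * diff;
            }
            total += d;
        }
        return total;
    }

For the second optimization, here is a hedged sketch of a data-layout transformation: storing the matrix feature-major so that per-feature scans, such as evaluating candidate split thresholds while building a decision tree, become unit-stride. The helpers to_feature_major and feature_sum are hypothetical names for illustration, not scikit-learn API.

    /* Hedged sketch: row-major (sample-major) to column-major (feature-major)
       layout change. Illustrative only. */
    #include <stddef.h>

    void to_feature_major(const double *X_row, double *X_col,
                          size_t n_samples, size_t n_features)
    {
        for (size_t i = 0; i < n_samples; i++)
            for (size_t j = 0; j < n_features; j++)
                X_col[j * n_samples + i] = X_row[i * n_features + j];
    }

    /* After the transform, feature j occupies the contiguous block
       X_col[j*n_samples .. j*n_samples + n_samples - 1]; scanning it is
       unit-stride instead of striding by n_features doubles. */
    double feature_sum(const double *X_col, size_t n_samples, size_t j)
    {
        double s = 0.0;
        const double *col = &X_col[j * n_samples];
        for (size_t i = 0; i < n_samples; i++)
            s += col[i];
        return s;
    }

Unit-stride scans keep consecutive accesses within the same cache lines and open DRAM rows, which is the kind of DRAM-locality effect the abstract's data layout and computation reordering results refer to.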

