Show simple item record

dc.contributor.advisor: Govindarajan, R
dc.contributor.author: Kumar, Harsh
dc.date.accessioned: 2022-07-29T04:30:03Z
dc.date.available: 2022-07-29T04:30:03Z
dc.date.submitted: 2022
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/5804
dc.description.abstract: Even in the era of deep-learning-based methods, traditional machine learning (ML) methods with large datasets continue to attract significant attention. However, there is an apparent lack of a detailed performance characterization of these methods in the context of large training datasets. In this thesis, we study the systems behaviour of a number of traditional ML methods, as implemented in popular free software libraries/modules, to identify the critical performance bottlenecks they experience. The characterization study reveals several interesting insights. We observe that the processor backend is the major bottleneck for our workloads, with poor cache performance coupled with a high fraction of CPU stall cycles due to memory latency. We also observe very poor utilization of execution ports, with only a single micro-op or no micro-op executed for around 45% of the execution time. For the tree-based workloads, CPU stalls due to bad speculation are also significant, reaching as high as 25% of CPU cycles. We then evaluate the performance benefits of applying well-known optimizations at the levels of the caches and the main memory, specifically (i) software prefetching to improve cache performance and (ii) data layout and computation reordering to improve locality in DRAM accesses. These optimizations are implemented as modifications to the well-known scikit-learn library and can be easily leveraged by application programmers. We evaluate their impact using a combination of simulation and execution on a real system. Software prefetching, implemented on ten workloads, yields performance benefits ranging from 5.2% to 27% on seven of the ten ML applications, while the data layout and computation reordering methods yield around 8% to 23% improvement on seven of the eight neighbour- and tree-based ML applications.
dc.language.iso: en_US
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation, in whole or in part, in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject: Machine Learning
dc.subject: Performance Characterization
dc.subject: Prefetching
dc.subject: Data Layout and Computation Reordering
dc.subject: Deep Learning
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Information technology::Computer science
dc.title: Performance Characterization and Optimizations of Traditional ML Applications
dc.type: Thesis
dc.degree.name: MTech (Res)
dc.degree.level: Masters
dc.degree.grantor: Indian Institute of Science
dc.degree.discipline: Engineering
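
The abstract above names two memory-system optimizations applied inside scikit-learn: software prefetching and data layout/computation reordering. What follows is a minimal C sketch of software prefetching in a distance-computation loop of the kind that dominates neighbour-based workloads. It is illustrative only, not the thesis's actual scikit-learn modification; the prefetch distance PF_DIST, the function name sum_sq_distances, and the loop structure are assumptions.

    /* Hedged sketch: software prefetching in a neighbour-style distance loop.
       NOT the thesis's scikit-learn patch; PF_DIST is an assumed, tunable
       value that would need per-machine calibration. */
    #include <stddef.h>

    #define PF_DIST 8  /* iterations ahead to prefetch (assumed value) */

    /* Sum of squared Euclidean distances from one query point to every row
       of a row-major matrix X (n_samples x n_features). */
    double sum_sq_distances(const double *X, const double *query,
                            size_t n_samples, size_t n_features)
    {
        double total = 0.0;
        for (size_t i = 0; i < n_samples; i++) {
            /* GCC/Clang builtin: hint the core to pull a future row toward
               the cache while the current row is processed (read-only
               access, low temporal locality). */
            if (i + PF_DIST < n_samples)
                __builtin_prefetch(&X[(i + PF_DIST) * n_features], 0, 1);

            double d = 0.0;
            for (size_t j = 0; j < n_features; j++) {
                double diff = X[i * n_features + j] - query[j];
                d += diff * diff;
            }
            total += d;
        }
        return total;
    }

For the second optimization, here is a hedged sketch of a data-layout transformation: storing the matrix feature-major so that per-feature scans, such as evaluating candidate split thresholds while building a decision tree, become unit-stride. The helpers to_feature_major and feature_sum are hypothetical names for illustration, not scikit-learn API.

    /* Hedged sketch: row-major (sample-major) to column-major (feature-major)
       layout change. Illustrative only. */
    #include <stddef.h>

    void to_feature_major(const double *X_row, double *X_col,
                          size_t n_samples, size_t n_features)
    {
        for (size_t i = 0; i < n_samples; i++)
            for (size_t j = 0; j < n_features; j++)
                X_col[j * n_samples + i] = X_row[i * n_features + j];
    }

    /* After the transform, feature j occupies the contiguous block
       X_col[j*n_samples .. j*n_samples + n_samples - 1]; scanning it is
       unit-stride instead of striding by n_features doubles. */
    double feature_sum(const double *X_col, size_t n_samples, size_t j)
    {
        double s = 0.0;
        const double *col = &X_col[j * n_samples];
        for (size_t i = 0; i < n_samples; i++)
            s += col[i];
        return s;
    }

Unit-stride scans keep consecutive accesses within the same cache lines and open DRAM rows, which is the kind of DRAM-locality effect the abstract's data layout and computation reordering results refer to.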

