etd@IISc › Division of Electrical, Electronics, and Computer Science (EECS) › Computer Science and Automation (CSA)

    Performance Characterization and Optimizations of Traditional ML Applications

Thesis full text (1.507Mb)
    Author
    Kumar, Harsh
    Abstract
Even in the era of Deep Learning-based methods, traditional machine learning methods with large data sets continue to attract significant attention. However, we find an apparent lack of detailed performance characterization of these methods in the context of large training datasets. In this thesis, we study the systems behaviour of a number of traditional ML methods as implemented in popular free software libraries/modules to identify critical performance bottlenecks experienced by these applications. The performance characterization study reveals several interesting insights into the performance of these applications. We observe that the processor backend is the major bottleneck for our workloads, especially poor cache performance coupled with a high fraction of CPU stall cycles due to memory latency. We also observe very poor utilization of execution ports, with only a single micro-op or no micro-op being executed for around 45% of the execution time. For the tree-based workloads, CPU stalls due to bad speculation are also significant, with values as high as 25% of CPU cycles. We then evaluate the performance benefits of applying some well-known optimizations at the levels of the caches and main memory. More specifically, we test the usefulness of optimizations such as (i) software prefetching to improve cache performance and (ii) data layout and computation reordering optimizations to improve locality in DRAM accesses. These optimizations are implemented as modifications to the well-known scikit-learn library, so they can be easily leveraged by application programmers. We evaluate the impact of the proposed optimizations using a combination of simulation and execution on a real system.
The software prefetching optimization was implemented over ten workloads and resulted in performance benefits of 5.2% to 27% on seven of the ten ML applications, while the data layout and computation reordering methods yielded around 8% to 23% performance improvement on seven of eight neighbour- and tree-based ML applications.
    URI
    https://etd.iisc.ac.in/handle/2005/5804
    Collections
    • Computer Science and Automation (CSA) [394]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
Theme by Atmire NV