Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices

Pandit, Prasanna Vasant

dc.contributor.advisor	Govindarajan, R
dc.contributor.author	Pandit, Prasanna Vasant
dc.date.accessioned	2018-05-01T06:49:24Z
dc.date.accessioned	2018-07-31T05:09:20Z
dc.date.available	2018-05-01T06:49:24Z
dc.date.available	2018-07-31T05:09:20Z
dc.date.issued	2018-05-01
dc.date.submitted	2013
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/3468
dc.identifier.abstract	http://etd.iisc.ac.in/static/etd/abstracts/4335/G25888-Abs.pdf	en_US
dc.description.abstract	Computing systems have become heterogeneous with the increasing prevalence of multi-core CPUs, Graphics Processing Units (GPU) and other accelerators in them. OpenCL has emerged as an attractive programming framework for heterogeneous systems. However, utilizing multiple devices in OpenCL is a challenge as it requires the programmer to explicitly map data and computation to each device. Utilizing multiple devices simultaneously to speed up execution of a kernel is even more complex, as the relative execution time of the kernel on different devices can vary significantly. Also, after each kernel execution, a coherent version of the data needs to be established. This means that, in order to utilize all devices effectively, the programmer has to spend considerable time and effort to distribute work across all devices, keep track of modified data in these devices and correctly perform a merging step to put the data together. Further, the relative performance of a program may vary across different inputs, which means a statically determined work distribution may not work well. In this work, we present FluidiCL, an OpenCL runtime that takes a program written for a single device and uses multiple heterogeneous devices to execute each kernel. The runtime performs dynamic work distribution and cooperatively executes each kernel on all available devices. Since we consider a setup with devices having discrete address spaces, our solution ensures that execution of OpenCL work-groups on devices is adjusted by taking into account the overheads for data management. The data transfers and data merging needed to ensure coherence are handled transparently without requiring any effort from the programmer. FluidiCL also does not require prior training or profiling and is completely portable across different machines. Because it is dynamic, the runtime is able to adapt to system load.
We have developed several optimizations for improving the performance of FluidiCL. We evaluate the runtime across different sets of devices. On a machine with an Intel quad-core processor and an NVidia Fermi GPU, FluidiCL shows a geomean speedup of nearly 64% over the GPU, 88% over the CPU and 14% over the best of the two devices in each benchmark. In all benchmarks, performance of our runtime comes to within 13% of the best of the two devices. FluidiCL shows similar results on a machine with a quad-core CPU and an NVidia Kepler GPU, with up to 26% speedup over the best of the two. We also present results considering an Intel Xeon Phi accelerator and a CPU and find that FluidiCL performs up to 45% faster than the best of the two devices. We extend FluidiCL from a CPU–GPU scenario to a three-device setup having a quad-core CPU, an NVidia Kepler GPU and an Intel Xeon Phi accelerator and find that FluidiCL obtains a geomean improvement of 6% in kernel execution time over the best of the three devices considered in each case.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	G25888	en_US
dc.subject	Heterogeneous Computers	en_US
dc.subject	Open Computing Language	en_US
dc.subject	FluidiCL	en_US
dc.subject	Fluidic Kernels	en_US
dc.subject	OpenCL Application Programming Interface	en_US
dc.subject	Graphics Processing Unit (GPU)	en_US
dc.subject	Central Processing Unit (CPU)	en_US
dc.subject	Computer Architecture	en_US
dc.subject	FluidiCL Runtime	en_US
dc.subject	Heterogeneous OpenCL Runtime	en_US
dc.subject	OpenCL Programs	en_US
dc.subject	CPU–GPU Systems	en_US
dc.subject.classification	Computer Engineering	en_US
dc.title	Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices	en_US
dc.type	Thesis	en_US
dc.degree.name	MSc Engg	en_US
dc.degree.level	Masters	en_US
dc.degree.discipline	Faculty of Engineering	en_US

Files in this item

Name: G25888.pdf
Size: 645.6Kb
Format: PDF

This item appears in the following Collection(s)

Supercomputer Education and Research Centre (SERC) [98]
