A COMPILER FRAMEWORK FOR OPTIMIZING DYNAMIC PARALLELISM ON GPUS

dc.contributor.advisorEl Hajj, Izzat
dc.contributor.authorOlabi, Mhd Ghaith
dc.contributor.departmentComputer Science
dc.contributor.facultyFaculty of Arts and Sciences
dc.date2021
dc.date.accessioned2021-02-08T12:31:09Z
dc.date.available2021-02-08T12:31:09Z
dc.date.issued2021-02-08
dc.descriptionDr. George Turkiyyah; Dr. Amer E. Mouawad
dc.description.abstractDynamic parallelism on GPUs provides the means for the GPU to generate work for itself instead of relying on the CPU: a thread running on the GPU can itself launch grids of threads that also run on the GPU. This mechanism is particularly useful for applications where the required parallelism is dynamic and unknown until execution. However, multiple performance issues arise when using dynamic parallelism. First, the massive number of small launches incurs substantial launch overhead. Second, the limited number of simultaneously executable kernels bottlenecks the large number of launches. Third, the small grids occupying the GPU leave the device underutilized. In this thesis, we propose a framework that optimizes dynamic parallelism performance by applying three key compiler optimization techniques: thresholding, coarsening, and aggregation. Thresholding serializes the child work in the parent thread when the benefit of dynamic parallelism is potentially cancelled by the launch overhead. Coarsening allows a single child thread block to sequentially execute the work of multiple other child thread blocks. Aggregation consolidates multiple child grids into a single aggregated grid. We automate these optimizations as separate compiler passes, then analyze and evaluate the interactions between them, and combine them into a single compiler flow. Our evaluation on data sets with high parallelism irregularity shows that when our compiler framework is applied to applications with nested parallelism, it achieves, on average, a 43.0x speedup over applications that use dynamic parallelism, an 8.7x speedup over applications that do not use dynamic parallelism, and a 3.6x speedup over applications that use dynamic parallelism with aggregation only. Our evaluation also shows that even with all optimizations applied, on data sets that have low irregularity and low parallelism requirements, dynamic parallelism still performs significantly worse.
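To illustrate the thresholding idea described in the abstract, the following is a minimal CUDA sketch, not code from the thesis: all kernel and variable names (`parentKernel`, `childKernel`, `offsets`, `threshold`) are hypothetical. The parent thread launches a child grid only when its dynamic work item is large enough to amortize the launch overhead; otherwise it serializes the work in place.

```cuda
// Hypothetical child kernel: processes one parent's slice of items.
__global__ void childKernel(const int *items, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        // ... per-item work ...
    }
}

// Parent kernel with thresholding applied. Each parent thread owns a
// variable-sized range of items (a CSR-style offsets array is assumed).
__global__ void parentKernel(const int *offsets, const int *items,
                             int n, int threshold) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    int begin = offsets[tid];
    int count = offsets[tid + 1] - begin;
    if (count <= threshold) {
        // Small child grid: serialize in the parent thread to avoid
        // paying the device-side launch overhead.
        for (int i = 0; i < count; ++i) {
            // ... same per-item work, executed sequentially ...
        }
    } else {
        // Large child grid: dynamic parallelism is worth the overhead.
        childKernel<<<(count + 255) / 256, 256>>>(items + begin, count);
    }
}
```

Device-side launches require compiling with relocatable device code (`nvcc -rdc=true`) on a GPU of compute capability 3.5 or higher. Coarsening and aggregation would further transform the `else` branch: coarsening makes one child block loop over the work of several blocks, and aggregation batches many pending child launches into one consolidated grid.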
dc.identifier.urihttp://hdl.handle.net/10938/22235
dc.language.isoen
dc.subjectGPUs, Dynamic Parallelism, Compilers
dc.titleA COMPILER FRAMEWORK FOR OPTIMIZING DYNAMIC PARALLELISM ON GPUS
dc.typeThesis

Files

Original bundle

Name: MhdGhaithOlabiThesis.pdf
Size: 2.51 MB
Format: Adobe Portable Document Format
License bundle

Name: license.txt
Size: 1.65 KB
Format: Item-specific license agreed upon to submission