AUB ScholarWorks

A COMPILER FRAMEWORK FOR OPTIMIZING DYNAMIC PARALLELISM ON GPUS


dc.contributor.advisor El Hajj, Izzat
dc.contributor.author Olabi, Mhd Ghaith
dc.date.accessioned 2021-02-08T12:31:09Z
dc.date.available 2021-02-08T12:31:09Z
dc.date.issued 2021-02-08
dc.identifier.uri http://hdl.handle.net/10938/22235
dc.description Dr. George Turkiyyah; Dr. Amer E. Mouawad
dc.description.abstract Dynamic parallelism on GPUs provides the means for the GPU to generate work for itself instead of relying on the CPU: a thread running on the GPU can itself launch grids of threads that also run on the GPU. This mechanism is particularly useful for applications where the required parallelism is dynamic and unknown until execution. However, multiple performance issues arise when using dynamic parallelism. First, the large number of small launches incurs substantial overhead. Second, the high launch rate is bottlenecked by the limited number of simultaneously executable kernels. Third, the small grids occupying the GPU cause the device to be underutilized. In this thesis, we propose a framework that optimizes dynamic parallelism performance by applying three key compiler optimization techniques: thresholding, coarsening, and aggregation. Thresholding serializes the child work in the parent thread when the benefit of dynamic parallelism is potentially cancelled by the launch overhead. Coarsening allows a single child thread block to sequentially execute the work of multiple other child thread blocks. Aggregation consolidates multiple child grids into a single aggregated grid. We automate these optimizations as separate compiler passes, then analyze and evaluate the interactions between them and combine them in a single compiler flow. Our evaluation on data sets with high parallelism irregularity shows that when our compiler framework is applied to applications with nested parallelism, it achieves, on average, a 43.0x speedup over applications that use dynamic parallelism, an 8.7x speedup over applications that do not use dynamic parallelism, and a 3.6x speedup over applications that use dynamic parallelism with aggregation only. Our evaluation also shows that even with all optimizations applied, dynamic parallelism still performs significantly worse on datasets that have low irregularity and low parallelism requirements.
dc.language.iso en_US
dc.subject GPUs, Dynamic Parallelism, Compilers
dc.title A COMPILER FRAMEWORK FOR OPTIMIZING DYNAMIC PARALLELISM ON GPUS
dc.type Thesis
dc.contributor.department Computer Science
dc.contributor.faculty Faculty of Arts and Sciences
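The thresholding optimization described in the abstract can be illustrated with a minimal CUDA sketch. This is an assumed, hand-written example of the pattern the thesis targets, not the framework's actual compiler output; the names `parentKernel`, `childKernel`, the `THRESHOLD` value, and the per-node work shape are all hypothetical.

```cuda
// Hypothetical sketch of dynamic parallelism with a thresholding guard.
// THRESHOLD, the kernels, and the data layout are illustrative assumptions.
#define THRESHOLD 128  // below this, launch overhead likely outweighs the benefit

__global__ void childKernel(int *data, int offset, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[offset + i] *= 2;  // per-element child work
}

__global__ void parentKernel(int *data, const int *offsets,
                             const int *counts, int numNodes) {
    int node = blockIdx.x * blockDim.x + threadIdx.x;
    if (node >= numNodes) return;
    int n = counts[node];
    if (n >= THRESHOLD) {
        // Enough dynamic work: launch a child grid from the device.
        childKernel<<<(n + 255) / 256, 256>>>(data, offsets[node], n);
    } else {
        // Thresholding: serialize the small child's work in the parent
        // thread instead of paying the device-side launch overhead.
        for (int i = 0; i < n; ++i) data[offsets[node] + i] *= 2;
    }
}
```

Compiling this pattern requires the device runtime (e.g. `nvcc -rdc=true` with compute capability 3.5 or higher). The coarsening and aggregation passes would further transform the taken branch: coarsening by having one child block loop over several blocks' work, and aggregation by batching many nodes' child launches into one consolidated grid.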

