dc.description.abstract |
Dynamic parallelism on GPUs provides the means for the GPU to generate
work for itself instead of relying on the CPU: a thread running on the
GPU can launch grids of threads that also run on the GPU. This mechanism is
particularly useful for applications where the required parallelism is
dynamic and not known until execution time. However, multiple performance
issues arise when using dynamic parallelism. First, the massive number of
small launches incurs substantial launch overhead. Second, the large number
of launches is bottlenecked by the limited number of kernels that can execute
simultaneously. Third, the small grids occupying the GPU leave the device
underutilized.
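To make the mechanism concrete, the following is a minimal sketch, with hypothetical kernel names; real CUDA dynamic parallelism additionally requires compute capability 3.5 or higher and compilation with -rdc=true. Each parent thread launches a child grid sized to its own dynamically discovered amount of work, which is the pattern this thesis optimizes:

```cuda
// Minimal sketch of CUDA dynamic parallelism (hypothetical kernel names).
__global__ void childKernel(int parent, int workSize) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < workSize) {
        // ... process element i of the parent's nested work ...
    }
}

__global__ void parentKernel(const int* workSizes, int n) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p < n) {
        int w = workSizes[p];  // nested parallelism known only at run time
        if (w > 0) {
            // One small launch per parent thread: the source of the launch
            // overhead, pending-launch limits, and underutilization above.
            childKernel<<<(w + 255) / 256, 256>>>(p, w);
        }
    }
}
```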
In this thesis, we propose a framework that optimizes dynamic parallelism
performance by applying three key compiler optimization techniques:
thresholding, coarsening, and aggregation. Thresholding serializes the child
kernel's work when the benefit of dynamic parallelism is likely to be
cancelled out by the launch overhead.
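A minimal sketch of the idea, reusing the parentKernel and childKernel from the previous sketch and an illustrative hand-picked cutoff (the actual compiler pass derives its own threshold):

```cuda
#define THRESHOLD 128  // illustrative cutoff; a real pass would tune this

__global__ void parentKernelThresholded(const int* workSizes, int n) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p < n) {
        int w = workSizes[p];
        if (w > THRESHOLD) {
            // Enough nested work to amortize the launch overhead.
            childKernel<<<(w + 255) / 256, 256>>>(p, w);
        } else {
            // Too little work: serialize it in the parent thread and
            // avoid the launch entirely.
            for (int i = 0; i < w; ++i) {
                // ... process element i of parent p's nested work ...
            }
        }
    }
}
```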
Coarsening allows a single child thread block to sequentially execute the
work of multiple other child thread blocks. Aggregation consolidates multiple
child grids into a single aggregated grid.
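The following sketch illustrates the combined effect of coarsening and aggregation under simplifying assumptions: the ChildTask record, kernel names, and the fixed coarsening factor are illustrative, and each child's work is assumed to fit in one 256-thread block; the actual passes transform code more generally.

```cuda
#define COARSEN_FACTOR 4   // illustrative coarsening factor

// Illustrative task record: one logical child block per parent.
struct ChildTask { int parent; int workSize; };

// Parents append their would-be launches to a buffer instead of launching.
__global__ void parentKernelAggregated(const int* workSizes, int n,
                                       ChildTask* tasks, int* numTasks) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p < n && workSizes[p] > 0) {
        int slot = atomicAdd(numTasks, 1);
        tasks[slot] = { p, workSizes[p] };
    }
}

// One aggregated grid replaces many small child grids, and each physical
// block sequentially executes the work of COARSEN_FACTOR logical blocks.
__global__ void aggregatedChildKernel(const ChildTask* tasks, int numTasks) {
    for (int c = 0; c < COARSEN_FACTOR; ++c) {
        int t = blockIdx.x * COARSEN_FACTOR + c;  // logical child block id
        if (t < numTasks && threadIdx.x < tasks[t].workSize) {
            // ... process element threadIdx.x of tasks[t].parent's work ...
        }
    }
}
```

The host, or a single device thread, then launches aggregatedChildKernel once with ceil(numTasks / COARSEN_FACTOR) blocks of 256 threads, replacing many small launches with one.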
We automate these optimizations as separate compiler passes, analyze and
evaluate the interactions between them, and combine them in a single compiler
flow. Our evaluation on datasets with high parallelism irregularity shows
that when our compiler framework is applied to applications with nested
parallelism, it achieves, on average, a 43.0x speedup over applications that
use dynamic parallelism, an 8.7x speedup over applications that do not use
dynamic parallelism, and a 3.6x speedup over applications that use dynamic
parallelism with aggregation only. Our evaluation also shows that even with
all optimizations applied, dynamic parallelism still performs significantly
worse than the alternative without it on datasets with low irregularity and
low parallelism requirements. |