A COMPILER FRAMEWORK FOR OPTIMIZING DYNAMIC PARALLELISM ON GPUS

Abstract

Dynamic parallelism on GPUs provides the means for the GPU to generate work for itself instead of relying on the CPU: a thread running on the GPU can launch grids of threads that also run on the GPU. This mechanism is particularly useful for applications whose required parallelism is dynamic and unknown until execution time. However, multiple performance issues arise when using dynamic parallelism. First, the massive number of small launches incurs substantial overhead. Second, the high number of launches is bottlenecked by the limited number of simultaneously executable kernels. Third, the small grids occupying the GPU cause the device to be underutilized. In this thesis, we propose a framework that optimizes dynamic parallelism performance by applying three key compiler optimization techniques: thresholding, coarsening, and aggregation. Thresholding serializes the kernel work when the benefit of dynamic parallelism is potentially cancelled by the launch overhead. Coarsening allows a single child thread block to sequentially execute the work of multiple other child thread blocks. Aggregation consolidates multiple child grids into a single aggregated grid. We automate these optimizations as separate compiler passes, then analyze and evaluate the interactions between them. We also combine them in a single compiler flow. Our evaluation on data sets with high parallelism irregularity shows that when our compiler framework is applied to applications with nested parallelism, it achieves, on average, a 43.0x speedup over applications that use dynamic parallelism, an 8.7x speedup over applications that do not use dynamic parallelism, and a 3.6x speedup over applications that use dynamic parallelism with aggregation only. Our evaluation also shows that even with all optimizations applied, dynamic parallelism still performs significantly worse on datasets that have low irregularity and low parallelism requirements.
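The interplay of the three transformations can be illustrated with a small host-side model. The sketch below is hypothetical and not the thesis compiler: the names (`THRESHOLD`, `COARSEN_FACTOR`, `BLOCK_SIZE`, `plan_launches`) and the specific constants are illustrative assumptions. Given an irregular distribution of per-parent child work, it shows thresholding (tiny child grids are serialized into the parent), coarsening (each child block covers the work of several blocks, shrinking the grid), and aggregation (the surviving child grids are consolidated into one launch).

```python
import math

# Illustrative tuning knobs (assumed values, not from the thesis).
THRESHOLD = 32        # thresholding: below this, serialize in the parent thread
COARSEN_FACTOR = 4    # coarsening: one child block does the work of 4 blocks
BLOCK_SIZE = 128      # threads per child block

def plan_launches(child_work):
    """Model how the three passes reshape the child launches of one parent grid.

    child_work[i] is the number of work items parent thread i would hand
    to its child grid.  Returns the per-launch coarsened block counts and
    the indices of parents whose work is serialized instead of launched.
    """
    launches = []      # coarsened grid sizes (in blocks) of surviving launches
    serialized = []    # parent indices handled inline (thresholding)
    for i, n in enumerate(child_work):
        if n == 0:
            continue
        if n < THRESHOLD:
            # Thresholding: launch overhead would cancel the benefit.
            serialized.append(i)
        else:
            blocks = math.ceil(n / BLOCK_SIZE)
            # Coarsening: fewer, larger-grained child blocks.
            launches.append(math.ceil(blocks / COARSEN_FACTOR))
    return launches, serialized

# Highly irregular per-parent child work, as in nested-parallel workloads.
child_work = [5, 1000, 12, 4096, 0, 700]
launches, serialized = plan_launches(child_work)

# Aggregation: instead of len(launches) separate kernels, issue a single
# consolidated grid whose blocks cover all surviving child grids.
aggregated_blocks = sum(launches)

print(len(launches))     # surviving child launches before aggregation
print(serialized)        # parents serialized by thresholding
print(aggregated_blocks) # blocks in the single aggregated grid
```

With these assumed constants, six potential child launches shrink to three coarsened launches plus two serialized parents, and aggregation then replaces the three launches with one 12-block grid, which is exactly the launch-count and grid-size pressure the three passes target.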

Description

Dr. George Turkiyyah; Dr. Amer E. Mouawad

Keywords

GPUs, Dynamic Parallelism, Compilers
