Abstract:
Deep Neural Networks (DNNs) demand substantial computational power and memory storage. Sparsifying the network has therefore been proposed as a technique to reduce the computational complexity of DNNs. However, parallelizing sparse computations introduces challenges such as load balancing and memory management. Many studies have tackled these problems, initially on CPUs and, more recently, on GPUs. Since modern GPUs offer much higher peak floating-point performance and memory bandwidth than CPUs, we base our study on running DNNs on GPUs. Prior work has demonstrated the efficiency of GPUs in handling sparse matrices. Our aim is to further explore the effects of combining various sparse storage formats on the GPU while testing different tiling strategies. We also propose a technique for better memory utilization.