Randomized GPU algorithms for the construction of hierarchical matrices from matrix-vector operations

dc.contributor.author: Boukaram, Wajih Halim
dc.contributor.author: Turkiyyah, George M.
dc.contributor.author: Keyes, David E.
dc.contributor.department: Department of Computer Science
dc.contributor.faculty: Faculty of Arts and Sciences (FAS)
dc.contributor.institution: American University of Beirut
dc.date.accessioned: 2025-01-24T11:22:58Z
dc.date.available: 2025-01-24T11:22:58Z
dc.date.issued: 2019
dc.description.abstract: Randomized algorithms for the generation of low rank approximations of large dense matrices have become popular methods in scientific computing and machine learning. In this paper, we extend the scope of these methods and present batched GPU randomized algorithms for the efficient generation of low rank representations of large sets of small dense matrices, as well as their generalization to the construction of hierarchically low rank symmetric H2 matrices with general partitioning structures. In both cases, the algorithms need to access the matrices only through matrix-vector multiplication operations which can be done in blocks to increase the arithmetic intensity and substantially boost the resulting performance. The batched GPU kernels are adaptive, allow nonuniform sizes in the matrices of the batch, and are more effective than SVD factorizations on matrices with fast decaying spectra. The hierarchical matrix generation consists of two phases, interleaved at every level of the matrix hierarchy. A first phase adaptively generates low rank approximations of matrix blocks through randomized matrix-vector sampling. A second phase accumulates and compresses these blocks into a hierarchical matrix that is incrementally constructed. The accumulation expresses the low rank blocks of a given level as a set of local low rank updates that are performed simultaneously on the whole matrix allowing high-performance batched kernels to be used in the compression operations. When the ranks of the blocks generated in the first phase are too large to be processed in a single operation, the low rank updates can be split into smaller-sized updates and applied in sequence. Assuming representative rank k, the resulting matrix has optimal O(kN) asymptotic storage complexity because of the nested bases it uses. The ability to generate an H2 matrix from matrix-vector products allows us to support a general randomized matrix-matrix multiplication operation, an important kernel in hierarchical matrix computations. Numerical experiments demonstrate the high performance of the algorithms and their effectiveness in generating hierarchical matrices to a desired target accuracy. © 2019 Society for Industrial and Applied Mathematics
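
The abstract describes algorithms that access a matrix only through blocked matrix-vector products. As an illustration only, and not the authors' batched GPU implementation, the following minimal NumPy sketch shows the generic randomized range-finder idea behind such sampling; all names here (randomized_low_rank, matvec, matvec_t, oversample) are hypothetical.

import numpy as np

def randomized_low_rank(matvec, matvec_t, n, k, oversample=10, rng=None):
    # Approximate A ~ U @ diag(s) @ Vt using only products with A and A^T.
    # matvec(X) must return A @ X; matvec_t(X) must return A.T @ X; n = number of columns of A.
    rng = np.random.default_rng() if rng is None else rng
    Omega = rng.standard_normal((n, k + oversample))   # block of random sample vectors
    Y = matvec(Omega)                                   # one blocked product: Y = A @ Omega
    Q, _ = np.linalg.qr(Y)                              # orthonormal basis for the sampled range
    B = matvec_t(Q).T                                   # B = (A^T Q)^T = Q^T A, a small dense matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)   # small SVD yields the final factors
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

# Hypothetical usage on an explicitly stored rank-30 test matrix:
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 30)) @ rng.standard_normal((30, 400))
U, s, Vt = randomized_low_rank(lambda X: A @ X, lambda X: A.T @ X, A.shape[1], k=30)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))

Applying the whole block of random vectors in a single product is what raises the arithmetic intensity mentioned in the abstract; the paper's contribution is doing this adaptively, in batches over many small blocks, and assembling the results into a nested-basis H2 matrix.
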
dc.identifier.doi: https://doi.org/10.1137/18M1210101
dc.identifier.eid: 2-s2.0-85071935752
dc.identifier.uri: http://hdl.handle.net/10938/25581
dc.language.iso: en
dc.publisher: Society for Industrial and Applied Mathematics Publications
dc.relation.ispartof: SIAM Journal on Scientific Computing
dc.source: Scopus
dc.subject: Batched algorithms
dc.subject: GPU
dc.subject: Hierarchical matrices
dc.subject: Low rank factorization
dc.subject: Low rank updates
dc.subject: Matrix compression
dc.subject: Matrix-matrix multiplication
dc.subject: Nested bases
dc.subject: Randomized algorithms
dc.subject: Approximation algorithms
dc.subject: Approximation theory
dc.subject: Graphics processing unit
dc.subject: Machine learning
dc.subject: Matrix algebra
dc.subject: Vectors
dc.subject: Batched algorithm
dc.subject: Hierarchical matrix
dc.subject: Low rank update
dc.subject: Matrix
dc.subject: Matrix-matrix multiplications
dc.subject: Nested basis
dc.subject: Performance
dc.subject: Factorization
dc.title: Randomized GPU algorithms for the construction of hierarchical matrices from matrix-vector operations
dc.type: Article

Files

Original bundle

Name: 2019-7754.pdf
Size: 538.98 KB
Format: Adobe Portable Document Format