Original title:
Performance of parallel QR factorization methods on the NVIDIA Grace CPU Superchip
Authors:
Břichňáč, V. ; Šístek, Jakub
Document type:
Papers
Conference/Event:
Programs and Algorithms of Numerical Mathematics /22./, Hejnice (CZ), 20240623
Year:
2025
Language:
eng
Abstract:
This article studies several algorithms for QR factorization based on hierarchical Householder reflectors organized into elimination trees, which are particularly suited for tall-and-skinny matrices and allow parallelization. We examine the effect of various parameters on the performance of the tree-based algorithms. The work is accompanied by a custom implementation that uses a task-based runtime system (OpenMP or StarPU). The same algorithm is implemented in the PLASMA library. The performance evaluation is carried out on the recent NVIDIA Grace CPU Superchip.
Keywords:
NVIDIA Grace CPU; QR factorization; task-based programming
Project no.:
GA23-06159S (CEP)
Funding provider:
GA ČR
Host item entry:
Programs and Algorithms of Numerical Mathematics 22 : Proceedings of Seminar, ISBN 978-80-85823-74-5
Note:
Related web page: http://dx.doi.org/10.21136/panm.2024.03
Institution:
Institute of Mathematics AS ČR
Document availability information:
Fulltext is available in the digital repository of the Academy of Sciences.
Original record:
https://hdl.handle.net/11104/0368021