Abstract
The implicit discrete ordinates (SN) discontinuous Galerkin (DG)/generalized minimal residual (GMRES) method has been implemented on a shared-memory heterogeneous computing architecture that combines multicore CPUs with a many-core GPU, using the standard application programming interfaces (APIs) Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA). Compiler-directive-based OpenMP parallelization was used to port the sequential SNDG/GMRES implementation to run in parallel on multiple CPU cores that share global memory. Benchmarked on a workstation with 12 Intel(R) Xeon(R) CPU cores, the implementation achieved a parallel efficiency of ∼90% for all reference problems. Incomplete LU factorization with thresholding (ILUT) preconditioning was applied to improve the conditioning of the associated linear systems, and it accelerated the simulations of all reference problems by a factor of 10. Finally, a K20m GPU boosted the computational capacity of a single Intel(R) Xeon(R) CPU core by a modest factor of 2–4 for all reference problems.
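To put the reported parallel efficiency in concrete terms, the short sketch below works through the arithmetic. It assumes the conventional definitions of speedup, S = T_1 / T_p, and parallel efficiency, E = S / p; the abstract does not state which definition was used. The function names and the timings are hypothetical and chosen only for illustration; they are not measurements from this work.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_1 / T_p (conventional definition, assumed here)."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_cores):
    """Parallel efficiency E = S / p = T_1 / (p * T_p)."""
    return speedup(t_serial, t_parallel) / n_cores


def implied_speedup(efficiency, n_cores):
    """Speedup implied by a given efficiency on n_cores cores: S = E * p."""
    return efficiency * n_cores


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, for illustration only.
    t_one_core = 1080.0
    t_twelve_cores = 100.0

    e = parallel_efficiency(t_one_core, t_twelve_cores, 12)
    s = implied_speedup(0.90, 12)
    print(f"efficiency on 12 cores: {e:.2f}")
    print(f"speedup implied by 90% efficiency on 12 cores: {s:.1f}")
```

Under these assumed definitions, an efficiency of about 90% on 12 cores corresponds to a speedup of roughly 10.8 over a single core.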
Conflict of interest statement
The author declares that there are no financial or personal relationships with other people or organizations that could inappropriately influence this work, and no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the manuscript.