ABSTRACT
A novel methodology for evaluating two-electron integrals up to f functions on Graphics Processing Units (GPUs) is presented. The Head-Gordon-Pople (HGP) recursion relations are solved via a simple heuristic that minimizes the number of intermediates evaluated in the recursion trees. Automatic code generation is used to produce highly optimized CUDA kernels. A novel approach for f functions is presented in which integral classes are split into smaller subclasses to minimize register pressure and exploit additional parallelism, at the cost of recomputing a small number of intermediates. Alongside the optimized kernels, the electron repulsion integral (ERI) evaluation scheme works in conjunction with an efficient work distribution scheme that guarantees load balancing during computation. The new HGP scheme shows speedups by factors of 2 to above 60 over existing GPU code. Additionally, when coupled with digestion into the Fock matrix, the scheme scales to 7 GPUs with 85% parallel efficiency for the 6-31G(d) basis set.
GRAPHICAL ABSTRACT
![](/cms/asset/ee23bcc6-5296-4711-a3e3-231aa3fadfd0/tmph_a_2112987_uf0001_oc.jpg)
Acknowledgements
This work was supported by a Department of Energy Exascale Computing Project award to the Ames Laboratory, Project Number 17-SC-20-SC. Ames Laboratory is operated by Iowa State University under Contract No. DE-AC02-07CH11338. The authors acknowledge and appreciate the many important contributions made by Professor Peter Gill to the field of electronic structure theory.
Disclosure statement
No potential conflict of interest was reported by the author(s).