ABSTRACT
The emergence of High Performance Computing (HPC) has enabled researchers to perform large scientific computations quickly and efficiently. However, as the processing units of HPC systems have become more heterogeneous, utilizing all of the available resources has become a challenge. Fully harnessing the power of these systems requires dividing work efficiently across all the processing units, which avoids under-utilization of resources and improves application performance. In this work, we present a dynamic approach to workload partitioning that determines the optimal workload partition and schedules the resulting partitions onto the processing units for parallel processing. Our technique responds automatically to performance variation, requires negligible training, and is implemented as a library. Performance results show that our dynamic approach outperforms the static and linear approaches. By running the Dense Matrix-Matrix Multiplication kernel library with our proposed method on both the CPU and the Graphics Processing Unit (GPU) in parallel, we obtain average speedups from […] to […] over CPU and […] to […] over GPU. We also applied our method to multi-GPU systems, for which we obtain average speedups of […] over CPU and […] over single GPU.
ACKNOWLEDGEMENTS
The authors would like to thank the CUDA Center of Excellence at IIT Bombay, India for providing access to their HPC system and the anonymous reviewers for their helpful comments.
DISCLOSURE STATEMENT
No potential conflict of interest was reported by the authors.
ORCID
Mohsin Khan http://orcid.org/0000-0002-4966-070X
Additional information
Notes on contributors
Mohsin Khan
Mohsin Khan received his B.E. and M.Tech degrees from Visvesvaraya Technological University, India, in 2011 and 2013, respectively. He is currently pursuing a Ph.D. at Visvesvaraya Technological University in the area of High Performance Computing. He has published six papers in international journals and conferences. His research interests include Parallel Computing, Heterogeneous Computing, and Source-to-Source Code Auto-Parallelization.
Waseem Ahmed
Waseem Ahmed is a professor at the HKBK College of Engineering, Visvesvaraya Technological University, India. He received his B.E. degree (1995) from R.V. College of Engineering, Bangalore University, India, his MSc degree (1999) from the University of Houston, United States of America, and his Ph.D. degree (2008) from Curtin University of Technology, Australia. He has authored or co-authored over 28 papers in international journals, transactions, and conferences, and has authored a book in the field of Embedded Systems. He is a reviewer for seven international peer-reviewed journals, transactions, and conferences, and has been an active organizing member of various FDPs, workshops, and conferences. His research interests include Embedded Systems, High Performance Computing, and Data Mining.
Touseef M. Golandaz
Touseef M. Golandaz completed his B.E. (2014) at SECAB College of Engineering, India, and his M.Tech (2016) at HKBK College of Engineering, India. His areas of interest include ethical hacking, compiler design, GPGPU Computing, and Artificial Intelligence. His goal is to become a professor and an ethical hacker, and he is exploring options for Ph.D. research in related fields.