Original Articles

Accurately measuring overhead, communication time and progression of blocking and nonblocking collective operations at massive scale

Pages 241-258 | Received 01 Dec 2008, Accepted 11 Feb 2009, Published online: 09 Jul 2010

