Original Articles

Accurately measuring overhead, communication time and progression of blocking and nonblocking collective operations at massive scale

Pages 241-258 | Received 01 Dec 2008, Accepted 11 Feb 2009, Published online: 09 Jul 2010
 

Abstract

Accurate, reproducible and comparable measurement of the overheads, communication times and progression behaviour of blocking and nonblocking collective operations is a complicated task. Although different measurement schemes for blocking collective operations are implemented in well-known benchmarks, many of these schemes introduce different systematic errors in their measurements. We characterise these errors and select a window-based approach as the most accurate method. However, this approach complicates measurements significantly and introduces clock synchronisation as a new source of errors. We analyse approaches to avoid or correct those errors and develop a scalable synchronisation scheme to conduct benchmarks on massively parallel systems. Our results are compared to the window-based scheme implemented in the SKaMPI benchmarks and show a reduction of the synchronisation overhead by a factor of 16 on 128 processes. We also describe two different measurement schemes for the overhead and asynchronous progress of nonblocking collective communications. An implementation and results of both measurement schemes are presented.
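The abstract identifies clock synchronisation as the error source that window-based measurement introduces. A common generic way to estimate the offset between two clocks is a ping-pong exchange. The local process records its clock when it sends a ping (`t_send`) and when the reply arrives (`t_recv`), and the peer reports its own clock reading (`t_remote`). Assuming symmetric one-way latency, the offset is `t_remote - (t_send + t_recv) / 2`. The following is a minimal sketch under those assumptions (symmetric latency, negligible clock drift over the measurement). The names `PingPong`, `estimate_offset` and `window_start_times` are hypothetical. The sketch makes no MPI calls and is not the paper's scalable synchronisation scheme.

```python
from dataclasses import dataclass


@dataclass
class PingPong:
    t_send: float    # local clock when the ping left this process
    t_remote: float  # peer clock when the peer replied
    t_recv: float    # local clock when the pong arrived back


def estimate_offset(samples):
    """Return (offset, rtt) taken from the sample with the smallest round-trip time.

    offset is (peer clock - local clock), assuming the one-way latency is the
    same in both directions. Raises ValueError if samples is empty.
    """
    best = min(samples, key=lambda s: s.t_recv - s.t_send)
    rtt = best.t_recv - best.t_send
    offset = best.t_remote - (best.t_send + best.t_recv) / 2
    return offset, rtt


def window_start_times(t0_root, offset_to_root, window, n):
    """Local-clock start times of n consecutive measurement windows.

    t0_root is the first window's start on the root's clock, and
    offset_to_root is (root clock - local clock), for example from
    estimate_offset. Every process then begins window i at the same
    global instant, as far as the offset estimate is correct.
    """
    return [t0_root - offset_to_root + i * window for i in range(n)]


if __name__ == "__main__":
    # Hypothetical samples: the first has the smaller RTT, so it is used.
    offset, rtt = estimate_offset([
        PingPong(t_send=0.0, t_remote=5.5, t_recv=1.0),
        PingPong(t_send=10.0, t_remote=15.8, t_recv=12.0),
    ])
    print("offset:", offset, "rtt:", rtt)
    print("window starts:", window_start_times(100.0, offset, 0.25, 3))
```

Picking the sample with the minimum round-trip time is a design choice. That exchange was least disturbed by queuing or interference, so the symmetric-latency assumption is most likely to hold for it.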

Acknowledgements

This research was funded by a gift from the Silicon Valley Community Foundation, on behalf of the Cisco Collaborative Research Initiative of Cisco Systems.

Notes

3 We used the 50,000 RTTs gathered as described in Section 2.

Additional information

Notes on contributors

Timo Schneider


Andrew Lumsdaine

