Computational Technology Resources - CCP

Keywords: empirical optimization, abstract data and communication library, collective communication, NAS parallel benchmarks.

Summary

This paper investigates how to assess the performance of different implementation alternatives for an empirical optimization library for collective MPI communications, namely the Abstract Data and Communication Library (ADCL). The first and simplest timing technique is to bracket the collective communication with calls to timing routines and use the maximum local execution time as an estimator for the total execution time of the collective communication. A second solution adds synchronizations before the calls to the timing routines. These synchronizations create additional overhead during the search phase and might introduce systematic errors, since application features such as process arrival patterns are erased before the measurement starts.
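The difference between the first two techniques can be illustrated with a toy model. The sketch below is not ADCL code and does not use MPI. It assumes, purely for illustration, that a collective finishes on every process a fixed `comm_cost` after the last process arrives, and that a barrier releases all processes a fixed `barrier_cost` after the last arrival. The function names and the cost model are hypothetical.

```python
def unsynchronized_estimate(arrivals, comm_cost):
    """Technique 1: each process times the collective locally and the
    maximum local time is used as the estimate.

    Toy assumption: the collective completes on all processes at
    max(arrivals) + comm_cost.
    """
    finish = max(arrivals) + comm_cost
    local_times = [finish - arrival for arrival in arrivals]
    return max(local_times)


def synchronized_estimate(arrivals, comm_cost, barrier_cost):
    """Technique 2: a barrier precedes the timed region.

    Toy assumption: the barrier releases every process at
    max(arrivals) + barrier_cost, so all processes enter the timed
    collective at the same moment. Returns the measured time and the
    extra overhead paid for the barrier.
    """
    release = max(arrivals) + barrier_cost
    synced_arrivals = [release] * len(arrivals)
    measured = unsynchronized_estimate(synced_arrivals, comm_cost)
    return measured, barrier_cost
```

In this model, arrivals of 0, 2 and 5 time units with a communication cost of 3 give an unsynchronized estimate of 8, because the earliest process also waits for the latest one. The synchronized estimate is 3, at the price of the barrier overhead, and no longer reflects the arrival pattern that the application would actually exhibit.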

The third option is based on the new timer object in ADCL. The timer object allows the measurement of the codelet plus its environment, e.g. one or more iterations in the code. Its purpose is to capture the original execution behavior of the program and avoid distortions caused by synchronizations directly before and after the execution of the codelet. However, the user now has the responsibility to select the right code portion to measure.
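The idea of timing a user-chosen region rather than the collective alone can be sketched as follows. This is a minimal illustration, not the ADCL timer interface: the class `RegionTimer`, its methods, and the selection rule (lowest mean region time) are all assumptions made for this example.

```python
import time
from collections import defaultdict


class RegionTimer:
    """Hypothetical timer that measures a whole code region (for example
    one iteration) and attributes the time to the implementation that
    was active during that region."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock                # injectable for testing
        self._samples = defaultdict(list)  # implementation -> region times
        self._label = None
        self._start = None

    def start(self, label):
        """Begin timing a region executed with implementation `label`."""
        self._label = label
        self._start = self._clock()

    def stop(self):
        """End the region, record its elapsed time, and return it."""
        elapsed = self._clock() - self._start
        self._samples[self._label].append(elapsed)
        self._start = None
        return elapsed

    def best(self):
        """Return the implementation with the lowest mean region time."""
        return min(
            self._samples,
            key=lambda k: sum(self._samples[k]) / len(self._samples[k]),
        )
```

In use, `start` would be called at the top of an iteration and `stop` at its end, so that computation, waiting time and the collective are all inside the measured region and no extra synchronization is placed around the collective itself. Choosing where those two calls go is the responsibility mentioned above.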

To investigate the different timing techniques, we use the MPI FFT Benchmark from the NAS Parallel Benchmarks 3.0, which involves an all-to-all communication operation. Tests have been executed on a wide variety of parallel platforms. For each system, the FFT benchmark has been executed for various classes, different numbers of processes, and multiple MPI implementations. The results show that performance predictions based on data generated with this timer object are more accurate than those based on the other techniques. In stable test environments, an enhanced version of the timer object reproduced the performance data obtained from longer runs almost exactly in quantitative terms.


Civil-Comp Proceedings, ISSN 1759-3433
CCP: 95
PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, GRID AND CLOUD COMPUTING FOR ENGINEERING

Paper 54
Timing Collective Communications in an Empirical Optimization Framework
K. Benkert¹, E. Gabriel² and S. Roller³
¹High Performance Computing Center Stuttgart (HLRS), Stuttgart, Germany
²Parallel Software Technologies Laboratory, Department of Computer Science, University of Houston, United States of America
³German Research School for Simulation Sciences, Aachen, Germany
doi:10.4203/ccp.95.54

Full Bibliographic Reference for this paper
K. Benkert, E. Gabriel, S. Roller, "Timing Collective Communications in an Empirical Optimization Framework", in , (Editors), "Proceedings of the Second International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering", Civil-Comp Press, Stirlingshire, UK, Paper 54, 2011. doi:10.4203/ccp.95.54