Performance analysis of the ABySS genome sequence assembler (ABYSS-P) executing on the K computer with up to 8192 compute nodes is described which identified issues that limited scalability to less than 1024 compute nodes and required prohibitive message buffer memory with 16384 or more compute nodes. The open-source Scalasca toolset was employed to analyse executions, revealing the impact of massive amounts of MPI point-to-point communication used particularly for master/worker process coordination, and inefficient parallel file operations that manifest as waiting time at later MPI collective synchronisations and communications. Initial remediation via use of collective communication operations and alternate strategies for parallel file handling show large performance and scalability improvements, with partial executions validated on the full 82,944 compute nodes of the K computer.
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
Tel.: +1 703 830 6300
Fax: +1 703 830 2300 firstname.lastname@example.org
(Corporate matters and books only) IOS Press c/o Accucoms US, Inc.
For North America Sales and Customer Service
West Point Commons
Lansdale PA 19446
Tel.: +1 866 855 8967
Fax: +1 215 660 5042 email@example.com