This paper outlines the parallelisation and vectorisation methods we have used to port an LU decomposition library to the Xeon Phi co-processor. We ported an LU factorisation algorithm, which uses the Gaussian elimination method to perform the decomposition, using Intel LEO directives, OpenMP 4.0 directives, Intel's Cilk array notation, and vectorisation directives. We compare the performance achieved with these different methods, investigate the impact of data transfer cost on the overall time to solution, and analyse the impact of these optimisation and parallelisation techniques on code running on the host processors as well. The results show that performance can be improved on the Xeon Phi by optimising the memory operations, and that Cilk array notation can benefit this benchmark on standard processors but does not have the same impact on the Xeon Phi co-processor. We also demonstrate cases where the Xeon Phi runs our implementations faster than a node of an HPC system does, and show that our implementations are not as efficient as the LU factorisation implemented in the MKL library.
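The abstract does not reproduce any code, so the following is only a minimal sketch of the kind of kernel it describes: an in-place LU factorisation by Gaussian elimination, with the row loop shared between threads and the inner row update marked for vectorisation using standard OpenMP directives. The function name `lu_factorise`, the row-major storage, and the absence of pivoting are assumptions made for illustration; the sketch does not use the LEO offload directives or Cilk array notation, and it should not be read as the authors' implementation.

```c
#include <assert.h>
#include <stddef.h>

/* In-place LU factorisation of a row-major n x n matrix by Gaussian
 * elimination without pivoting (Doolittle form: L has a unit diagonal).
 * On return, the strict lower triangle of `a` holds the multipliers of L
 * and the upper triangle (diagonal included) holds U.
 * Returns 0 on success, -1 if a zero pivot is met.
 * `lu_factorise` is a hypothetical name, not taken from the paper. */
int lu_factorise(double *a, size_t n)
{
    assert(a != NULL && n > 0);

    for (size_t k = 0; k < n; ++k) {
        const double pivot = a[k * n + k];
        if (pivot == 0.0)
            return -1; /* assumption: no pivoting in this sketch */

        /* For a fixed k, the rows below the pivot row are updated
         * independently, so the row loop can be shared between threads. */
        #pragma omp parallel for schedule(static)
        for (size_t i = k + 1; i < n; ++i) {
            const double m = a[i * n + k] / pivot;
            a[i * n + k] = m; /* store the multiplier, i.e. L[i][k] */

            /* Contiguous update of row i: a candidate for vectorisation. */
            #pragma omp simd
            for (size_t j = k + 1; j < n; ++j)
                a[i * n + j] -= m * a[k * n + j];
        }
    }
    return 0;
}
```

Compiled without OpenMP support, the pragmas are ignored and the routine runs serially, which makes it easy to compare the directive-based version against a scalar baseline. For the 2 x 2 matrix with rows (4, 3) and (6, 3), the routine leaves rows (4, 3) and (1.5, -1.5) in place: the multiplier 1.5 is the L entry and the remaining values form U.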