High Performance Big Data Clustering

Agrawal, Ankit; Patwary, Md. Mostofa Ali; Hendrix, William; Liao, Wei-keng; Choudhary, Alok

doi:10.3233/978-1-61499-322-3-192

Abstract

Scientific advances are collectively exploding the amount, diversity, and complexity of data becoming available. Our ability to collect huge amounts of data has greatly surpassed our analytical capacity to make sense of it. Efficient use of high performance computing techniques is critical to the success of the data-driven paradigm of scientific discovery. Data clustering is one of the fundamental analytics tasks heavily relied upon in many application domains, such as astrophysics, climate science, and bioinformatics. In this book chapter, we illustrate the challenges and opportunities in mining big data using two recently developed scalable parallel clustering algorithms. We also present experimental results on millions of high-dimensional data points clustered in parallel on thousands of processor cores.
