Clustering algorithms have been used to divide genes into groups according to the degree of their expression similarity. Such a grouping may suggest that the respective genes are correlated and/or co-regulated, and subsequently indicates that the genes could possibly share a common biological role. In this paper, four clustering algorithms are investigated: k-means, cut-clustering, spectral and expectation-maximization. The algorithms are benchmarked against each other. The performance of the four clustering algorithms is studied on time series expression data using Dynamic TimeWarping distance in order to measure similarity between gene expression profiles. Four different cluster validation measures are used to evaluate the clustering algorithms: Connectivity and Silhouette Index for estimating the quality of clusters, Jaccard Index for evaluating the stability of a cluster method and Rand Index for assessing the accuracy. The obtained results are analyzed by Friedman's test and the Nemenyi post-hoc test. K-means is demonstrated to be significantly better than the spectral clustering algorithm under the Silhouette and Rand validation indices.
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
Tel.: +1 703 830 6300
Fax: +1 703 830 2300 firstname.lastname@example.org
(Corporate matters and books only) IOS Press c/o Accucoms US, Inc.
For North America Sales and Customer Service
West Point Commons
Lansdale PA 19446
Tel.: +1 866 855 8967
Fax: +1 215 660 5042 email@example.com