OHDOCLUS – Online and Hierarchical Document Clustering

Encarnação, Rui; Oliveira, Hugo Gonçalo

doi:10.3233/978-1-61499-682-8-51

Abstract

Clustering algorithms usually assume that a document collection is static and process it as a whole. However, in contexts where data is constantly being produced (e.g. the Web), systems that receive and process documents incrementally are becoming increasingly important. We propose OHDOCLUS, an online and hierarchical algorithm for document clustering. OHDOCLUS builds a tree of clusters in which each document is classified as soon as it is received. It is based on COBWEB and CLASSIT, two well-known data clustering algorithms that create hierarchies of probabilistic concepts and have seldom been applied to text data. An experimental evaluation was conducted with categorized corpora, and the preliminary results confirm the validity of the proposed method.
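
To make the idea of online, hierarchical clustering concrete, the following is a minimal sketch and not the OHDOCLUS algorithm itself. It assumes bag-of-words term counts and substitutes plain cosine similarity against cluster term totals for the category-utility criterion that COBWEB and CLASSIT use. All names (Node, insert, leaves, the 0.3 threshold) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


class Node:
    """A tree node: a leaf holds one document, an inner node is a cluster."""

    def __init__(self, doc_id=None):
        self.doc_id = doc_id      # set only on leaves
        self.total = Counter()    # summed term counts of documents below
        self.count = 0            # number of documents below
        self.children = []

    def absorb(self, vec):
        self.total.update(vec)
        self.count += 1


def make_leaf(doc_id, vec):
    leaf = Node(doc_id)
    leaf.absorb(vec)
    return leaf


def insert(root, doc_id, text, threshold=0.3):
    """Place one document in the tree as soon as it arrives.

    Assumption: cosine similarity with a fixed threshold stands in for
    the category-utility operators of COBWEB/CLASSIT.
    """
    vec = Counter(text.lower().split())
    node = root
    while True:
        node.absorb(vec)
        best = max(node.children,
                   key=lambda c: cosine(vec, c.total), default=None)
        if best is None or cosine(vec, best.total) < threshold:
            # Nothing similar enough here: start a new leaf at this level.
            node.children.append(make_leaf(doc_id, vec))
            return
        if best.doc_id is not None:
            # Most similar child is a single document: turn it into a
            # cluster holding the old document and the new one.
            old = make_leaf(best.doc_id, best.total)
            best.doc_id = None
            best.children = [old, make_leaf(doc_id, vec)]
            best.absorb(vec)
            return
        node = best  # descend into the most similar cluster


def leaves(node):
    """Return the document ids stored below a node."""
    if node.doc_id is not None:
        return [node.doc_id]
    return [d for child in node.children for d in leaves(child)]
```

Starting from an empty root (root = Node()), each call to insert updates the tree in place, so no document is revisited once it has been placed. Unlike COBWEB, this sketch never merges or splits existing clusters.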
