The aim of document clustering is to produce coherent clusters of similar documents. Although most document clustering algorithms perform well in specific knowledge domains, processing cross-domain document repositories is still a challenge. This difficulty can be attributed to word ambiguity and explained by the observation that monosemic words are more domain-oriented than polysemic ones. Document clustering algorithms normally employ text normalization techniques, such as the Porter stemming algorithm. This paper describes a semantically enhanced text normalization algorithm developed for the purpose of improved document clustering. Corpus consistency achieved by the proposed algorithm is compared with the consistency produced by the Porter stemmer. The experimental evidence shows that semantic disambiguation improves clustering performance compared to traditional normalization methods.
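The effect of word ambiguity on normalization can be illustrated with a minimal sketch. It is not the algorithm described in the paper: `toy_stem` is a crude suffix stripper standing in for a real stemmer (it is not the Porter algorithm), and `SENSE_CUES` is a hypothetical hand-made lexicon, assumed here only to show how sense tagging might keep documents from different domains apart.

```python
from collections import Counter
from math import sqrt

# Toy suffix stripper: a stand-in for a real stemmer, NOT the Porter algorithm.
SUFFIXES = ("ing", "es", "s")

def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

# Hypothetical sense lexicon (an assumption for illustration only):
# ambiguous stem -> {context cue: sense-tagged term}.
SENSE_CUES = {
    "bank": {"river": "bank#geo", "loan": "bank#fin"},
}

def normalize(text):
    """Stem-only normalization: lowercase, split on whitespace, stem."""
    return [toy_stem(token) for token in text.lower().split()]

def normalize_with_senses(text):
    """Stem, then replace ambiguous stems with a sense tag chosen from context."""
    stems = normalize(text)
    context = set(stems)
    result = []
    for stem in stems:
        cues = SENSE_CUES.get(stem, {})
        # Use the first sense whose cue word occurs in the same document;
        # fall back to the plain stem when no cue is present.
        tagged = next((sense for cue, sense in cues.items() if cue in context), stem)
        result.append(tagged)
    return result

def cosine(terms_a, terms_b):
    """Cosine similarity between two bag-of-terms term-frequency vectors."""
    count_a, count_b = Counter(terms_a), Counter(terms_b)
    dot = sum(count_a[term] * count_b[term] for term in count_a)
    norm = sqrt(sum(v * v for v in count_a.values())) * sqrt(
        sum(v * v for v in count_b.values())
    )
    return dot / norm if norm else 0.0

finance_doc = "banks approve loans"
geography_doc = "river banks flooding"

stem_similarity = cosine(normalize(finance_doc), normalize(geography_doc))
sense_similarity = cosine(
    normalize_with_senses(finance_doc), normalize_with_senses(geography_doc)
)
```

With stem-only normalization the two documents share the term `bank` and appear partly similar. Once the stem is tagged with a sense, the shared term disappears and the similarity drops to zero, which is the kind of separation a clustering step could exploit across domains.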