In this work we implement and evaluate a methodology for classifying multi-labeled web documents into large-scale taxonomies using their text content. Multi-label hierarchical classification over large-scale taxonomies is a hard task: training data are scarce in many nodes of the hierarchy, content overlaps between classes, and decision surfaces are complex. We propose a novel feature extraction model called Multilayered Class Discrimination (MCD), which reduces the dimensionality of the text-content features of web documents along the different levels of the hierarchy. This helps to discriminate each class from the other classes at the same level and mitigates the problems mentioned above. Results from categorizing web documents from the DMOZ directory show that our model improves categorization accuracy compared with using word features, and that the results are competitive with those presented in the Second LSHTC Challenge.
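The abstract does not specify how MCD computes its features, so the following is only an illustrative sketch of the general idea of level-wise dimensionality reduction, not the authors' method. It assumes a stand-in technique: each document is represented, at every level of the hierarchy, by its cosine similarity to a bag-of-words centroid of each class at that level, so the feature count per level equals the number of classes rather than the vocabulary size. The function names (`level_centroids`, `level_features`), the toy documents, and the label paths are all hypothetical.

```python
from collections import Counter, defaultdict
import math


def tf_vector(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def level_centroids(docs, depth):
    """Sum word counts per class at one level of the hierarchy.

    docs: list of (text, label_paths); each label path is a tuple such as
    ("Science", "Biology"). A document may carry several paths (multi-label).
    """
    centroids = defaultdict(Counter)
    for text, paths in docs:
        vec = tf_vector(text)
        # Count each document once per distinct class prefix at this depth.
        for prefix in {p[:depth] for p in paths if len(p) >= depth}:
            centroids[prefix].update(vec)
    return dict(centroids)


def level_features(text, centroids_by_level):
    """One reduced feature map per level: similarity to each class centroid."""
    vec = tf_vector(text)
    return [
        {cls: cosine(vec, centroid) for cls, centroid in sorted(level.items())}
        for level in centroids_by_level
    ]


# Hypothetical toy corpus; the last document has two labels.
docs = [
    ("quantum particle energy", [("Science", "Physics")]),
    ("cell protein gene", [("Science", "Biology")]),
    ("guitar chord melody", [("Arts", "Music")]),
    ("gene therapy painting exhibition",
     [("Science", "Biology"), ("Arts", "Painting")]),
]

centroids_by_level = [level_centroids(docs, depth) for depth in (1, 2)]
feats = level_features("gene protein sequence", centroids_by_level)

for depth, level in enumerate(feats, start=1):
    print(depth, level)
```

In this sketch a downstream classifier would consume the per-level similarity maps instead of raw word counts. Whether the actual MCD model uses centroids, cosine similarity, or some other discriminative transform cannot be determined from the abstract alone.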