We present a weakly supervised approach to automatic ontology population from text and compare it with two unsupervised approaches. In our experiments we populate part of our ontology of Named Entities. We consider two high-level categories, geographical locations and person names, with ten sub-classes for each category. For each sub-class we automatically learn a syntactic model from a list of training examples and a parsed corpus. A novel syntactic indexing method allows us to use large quantities of syntactically annotated data. The syntactic model for each Named Entity sub-class is a set of weighted syntactic features, i.e. words that typically co-occur with members of the class in the corpus. The method is weakly supervised, since no manually annotated corpus is used in the learning process. The syntactic models are used to classify the unknown Named Entities in the test set. The method achieves promising results (65% accuracy) and significantly outperforms the other two approaches.
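The idea of a per-class model of weighted co-occurrence features can be illustrated with a minimal sketch. The abstract does not specify the weighting formula, the corpus format, or the indexing method, so everything below is an assumption made for illustration: the corpus is reduced to hypothetical `(entity, feature)` pairs, the weight of a feature is simply the share of its training occurrences that fall in a class, and the function names `learn_models` and `classify` are invented here.

```python
from collections import Counter, defaultdict


def learn_models(training_examples, parsed_corpus):
    """Build one weighted feature model per class.

    training_examples: {class_name: set of entity names} (seed lists).
    parsed_corpus: list of (entity, feature) pairs; this flat format is a
    hypothetical stand-in for a syntactically parsed corpus.
    """
    entity_to_class = {
        entity: cls
        for cls, entities in training_examples.items()
        for entity in entities
    }
    counts = defaultdict(Counter)
    for entity, feature in parsed_corpus:
        cls = entity_to_class.get(entity)
        if cls is not None:  # only training entities contribute to the models
            counts[cls][feature] += 1

    # Total occurrences of each feature across all classes.
    totals = Counter()
    for class_counts in counts.values():
        totals.update(class_counts)

    # Assumed weighting: fraction of a feature's occurrences seen in the class.
    return {
        cls: {f: n / totals[f] for f, n in class_counts.items()}
        for cls, class_counts in counts.items()
    }


def classify(entity, parsed_corpus, models):
    """Assign the class whose model best matches the entity's features."""
    observed = Counter(f for e, f in parsed_corpus if e == entity)
    scores = {
        cls: sum(weights.get(f, 0.0) * n for f, n in observed.items())
        for cls, weights in models.items()
    }
    return max(scores, key=scores.get)


# Toy data, invented for this example only.
training = {"river": {"Danube", "Nile"}, "city": {"Paris", "Rome"}}
corpus = [
    ("Danube", "flow:subj"), ("Nile", "flow:subj"), ("Nile", "bank:of"),
    ("Paris", "mayor:of"), ("Rome", "mayor:of"), ("Rome", "visit:obj"),
    ("Danube", "visit:obj"),
    ("Rhine", "flow:subj"), ("Rhine", "bank:of"), ("Milan", "mayor:of"),
]
models = learn_models(training, corpus)
print(classify("Rhine", corpus, models))
print(classify("Milan", corpus, models))
```

In this toy setting the only supervision is the seed list of entity names per class; no sentence in the corpus is hand-labelled, which is the sense in which such a setup is weakly supervised.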