News datasets are among the most abundant sources of data recording the events happening around people. For news event detection, people usually have to collect related news articles and explore major events manually. Exploring major events in large news datasets is difficult, both because the amount of data grows quickly with the rapid development of the Web and because news articles consist of unstructured data. How to discover events from such largely unstructured articles has therefore become an important problem. In this paper, we propose an event detection algorithm based on five-dimensional named-entity features and random walk with restart to detect events in unstructured news articles. The first part of the algorithm categorizes news terms into five predefined named-entity types by exploring Wikipedia pages, in order to generate more distinctive features for each news article. The second part aggregates news articles according to their pairwise similarity using a random-walk-with-restart clustering algorithm. The experimental results show that the proposed algorithm is effective. In particular, they demonstrate that the algorithm provides better event detection quality than other approaches in terms of its ability to handle multi-event news articles.
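The abstract does not give the paper's exact formulation, so the following is only a minimal sketch of generic random walk with restart on an article-similarity graph, not the authors' implementation. The function name `random_walk_with_restart`, the `restart` probability of 0.15, and the toy similarity matrix are all illustrative assumptions. The idea it shows: a walker starting at a seed article repeatedly moves to neighbouring articles in proportion to their similarity, and at each step jumps back to the seed with a fixed probability, so the steady-state scores measure how closely every article is tied to the seed.

```python
def random_walk_with_restart(similarity, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `seed`.

    `similarity` is a square list of lists of non-negative article similarities
    (hypothetical input; the paper derives it from named-entity features).
    """
    n = len(similarity)
    # Column-normalize: column j holds the transition probabilities out of node j.
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [
            similarity[i][j] / col_sums[j] if col_sums[j] > 0
            else (1.0 if i == j else 0.0)  # isolated node: walker stays in place
            for j in range(n)
        ]
        for i in range(n)
    ]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(max_iter):
        # p_next = (1 - restart) * W p + restart * e_seed
        p_next = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(p_next, p))
        p = p_next
        if delta < tol:
            break
    return p


# Toy similarity matrix (assumed values): articles 0-2 are strongly related to
# each other, articles 3-4 are strongly related to each other, and the two
# groups are only weakly linked.
sim = [
    [0.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.1],
    [0.8, 0.7, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.9],
    [0.0, 0.1, 0.0, 0.9, 0.0],
]
scores = random_walk_with_restart(sim, seed=0)
# Articles ranked by relevance to the seed article, most relevant first.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In a clustering setting, scores like these could be computed for each article in turn and articles with high mutual scores grouped into the same event; how the paper actually forms clusters from the walk is not stated in the abstract.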