Twitter has become one of the most popular Location-Based Social Networks (LBSNs), bridging the physical and virtual worlds. Tweets, the 140-character messages published on Twitter, are meant to answer the question "What's happening?". Real-life occurrences and events are usually reported through geo-located tweets by users on site. Separating event-related tweets from the rest is a challenging problem that requires exploiting several tweet features. With that in mind, we propose Tweet-SCAN, a novel event discovery technique based on the density-based clustering algorithm DBSCAN. Tweet-SCAN takes into account four main features of a tweet, namely content, time, location and user, to group event-related tweets into homogeneous clusters. The technique models textual content through a probabilistic topic model, the Hierarchical Dirichlet Process, and introduces the Jensen-Shannon distance for neighborhood identification in the textual dimension. We evaluate Tweet-SCAN on a real data set of geo-located tweets posted during Barcelona's local festivities in 2014, for which some of the events were known beforehand. With this data set, we assess Tweet-SCAN's ability to discover events, justify the use of a textual component, and highlight the effects of several parameters.
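To make the idea of neighborhood identification across several dimensions concrete, the following is a minimal sketch and not the authors' implementation. It assumes that two tweets are neighbors when they are close in space, time and topic distribution at once, with the textual closeness measured by the Jensen-Shannon distance. The thresholds `eps_space_m`, `eps_time_s` and `eps_text`, the dictionary layout of a tweet, and the example topic vectors are all hypothetical; the user feature is not modeled here.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (base-2 logs).

    p and q are discrete probability distributions of equal length,
    e.g. per-tweet topic proportions. The result lies in [0, 1].
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def is_neighbor(t1, t2, eps_space_m, eps_time_s, eps_text):
    """Assumed neighborhood predicate: close in space, time AND topics."""
    close_in_space = haversine_m(t1["lat"], t1["lon"], t2["lat"], t2["lon"]) <= eps_space_m
    close_in_time = abs(t1["time"] - t2["time"]) <= eps_time_s
    close_in_text = js_distance(t1["topics"], t2["topics"]) <= eps_text
    return close_in_space and close_in_time and close_in_text


# Hypothetical tweets: timestamps in seconds, topics as 3-topic proportions.
a = {"lat": 41.3851, "lon": 2.1734, "time": 0, "topics": [0.7, 0.2, 0.1]}
b = {"lat": 41.3855, "lon": 2.1740, "time": 600, "topics": [0.6, 0.3, 0.1]}
c = {"lat": 41.3851, "lon": 2.1734, "time": 0, "topics": [0.05, 0.05, 0.9]}

print(is_neighbor(a, b, eps_space_m=250, eps_time_s=3600, eps_text=0.3))  # True
print(is_neighbor(a, c, eps_space_m=250, eps_time_s=3600, eps_text=0.3))  # False
```

In this sketch, tweets `a` and `c` share the same place and time but differ strongly in topic, so only the textual threshold keeps them apart. A DBSCAN-style algorithm would then count such neighbors to decide which tweets are core points of a cluster.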