Graph-based approaches have been shown to be efficient in information extraction, especially in text mining. Compared to methods such as vector space models, a graph representation of a document loses less information during feature extraction. However, constructing graph models is more CPU and memory intensive, so HPC solutions seem inevitable in this case. This paper proposes a pipeline method for constructing a graph model that allows an arbitrary level of parallel processing and distributed computing. The method also enables a wide range of data visualization opportunities. It is shown that big data hardware and software infrastructures can be used without any algorithmic limitation. Results show a significant decrease in runtime.
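As an illustration only, the general idea of a parallelizable graph-construction pipeline can be sketched as a per-document extraction stage followed by a merge stage. The abstract does not specify the graph model, so this sketch assumes a simple word co-occurrence graph; the names `document_edges`, `build_graph`, `window`, and `workers` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def document_edges(text, window=2):
    """Extraction stage: count co-occurrence edges within one document.

    Assumption: an edge links two distinct words that appear at most
    `window` tokens apart. The paper's actual graph model may differ.
    """
    tokens = text.lower().split()
    edges = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                # Sort the pair so the edge is undirected.
                edges[tuple(sorted((left, right)))] += 1
    return edges


def build_graph(documents, workers=4):
    """Merge stage: sum per-document edge counts into one weighted graph."""
    graph = Counter()
    # Each document is processed independently, so this stage can be
    # spread over any number of workers. Threads keep the sketch simple;
    # a process pool or a cluster framework would be needed for real
    # CPU-bound speedups.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for edges in pool.map(document_edges, documents):
            graph.update(edges)
    return graph


if __name__ == "__main__":
    docs = ["graph text mining", "text mining graph"]
    for edge, weight in sorted(build_graph(docs).items()):
        print(edge, weight)
```

Because each document contributes its edges independently and the merge is a commutative sum, the number of workers does not change the resulting graph. That property is what would let such a pipeline scale across processes or machines.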