The Web is one of the richest sources of information and knowledge. Unfortunately, the current structure of Web pages makes it difficult for users to retrieve that information or knowledge systematically. In this paper, we propose a tree-based personal Web information/knowledge retrieval system that extracts structured parts from Web pages. First, we obtain the layout pattern of a typical Web page in each target site, together with the paths to the parts to be extracted. Then, we use the recorded layout pattern and paths to extract the structured parts from the remaining Web pages of the target sites. We demonstrate the usefulness of our approach with the results of extracting structured parts from notable Web pages.
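The two-phase procedure described above (record on one typical page, then replay on the rest of the site) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system. It assumes pages are well-formed XHTML that Python's `xml.etree.ElementTree` can parse. It represents a path as a list of (tag, same-tag sibling index) steps. It also uses a hypothetical `layout_pattern` signature (the nested tag structure down to a fixed depth) as a stand-in for the paper's layout pattern. All function names here are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# One path step: (tag name, position among siblings that share that tag).
Step = Tuple[str, int]


def record_path(root: ET.Element, target: ET.Element) -> Optional[List[Step]]:
    """Phase 1: find the root-to-target path in a typical (sample) page."""
    def walk(node: ET.Element, path: List[Step]) -> Optional[List[Step]]:
        if node is target:
            return path
        counts: Dict[str, int] = {}
        for child in node:
            idx = counts.get(child.tag, 0)
            counts[child.tag] = idx + 1
            found = walk(child, path + [(child.tag, idx)])
            if found is not None:
                return found
        return None
    return walk(root, [])


def layout_pattern(root: ET.Element, depth: int = 2) -> tuple:
    """Hypothetical layout signature: nested tag structure down to `depth`."""
    if depth == 0:
        return (root.tag,)
    return (root.tag, tuple(layout_pattern(c, depth - 1) for c in root))


def follow_path(root: ET.Element, path: List[Step]) -> Optional[ET.Element]:
    """Phase 2 helper: replay a recorded path on another page."""
    node = root
    for tag, idx in path:
        matches = [c for c in node if c.tag == tag]
        if idx >= len(matches):
            return None  # the path does not exist in this page
        node = matches[idx]
    return node


def extract(page: ET.Element, pattern: tuple,
            paths: Dict[str, List[Step]]) -> Optional[Dict[str, str]]:
    """Extract named parts only if the page matches the recorded layout."""
    if layout_pattern(page) != pattern:
        return None
    result: Dict[str, str] = {}
    for name, path in paths.items():
        node = follow_path(page, path)
        if node is not None:
            result[name] = "".join(node.itertext()).strip()
    return result


# Phase 1: record the layout pattern and a path on a typical page.
sample = ET.fromstring(
    "<html><body><div>nav</div><div><p>Title A</p><p>Price 10</p></div></body></html>")
target = sample[0][1][1]  # the second <p> inside the second <div>
path = record_path(sample, target)
pattern = layout_pattern(sample)

# Phase 2: apply the recorded pattern and path to another page of the same site.
other = ET.fromstring(
    "<html><body><div>menu</div><div><p>Title B</p><p>Price 25</p></div></body></html>")
print(extract(other, pattern, {"price": path}))  # {'price': 'Price 25'}
```

Indexing each step by position among same-tag siblings, rather than by absolute child position, keeps a recorded path valid when sibling elements with other tags are added to a page. Real-world pages are rarely well-formed XML, so a practical version would need a tolerant HTML parser, which this sketch omits.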
IOS Press, Inc.