The Internet provides a huge amount of information and knowledge through Web pages. For personal and effective use of such resources, partial information extraction technology opens a new path, enabling end users to obtain only the information they need from various Web pages and integrate it into original compositions. However, the traditional XPath-only extraction method fails when Web sites use different templates to construct Web pages or change the layout of Web pages, which we call the stability problem. In this paper, we propose a novel hybrid extraction mechanism to stably extract partial information. We compare the original and changed Web pages to obtain the unchanged nodes as a stable-part list and use them to generate new paths. Since the list is re-ranked after new stable parts are found, the success rate of extraction can evolve by itself, and manual intervention is correspondingly reduced. We demonstrate the usefulness of our approach through experiments on real Web sites.
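The idea of comparing the original and changed pages to find stable parts, then deriving a new path from them, can be illustrated with a minimal sketch. It is not the mechanism proposed in the paper: the function names (`signature`, `stable_parts`, `learn_rule`, `apply_rule`) are hypothetical, a node is assumed to be "unchanged" when its tag and text are identical in both versions, the new path is assumed to be a simple sibling offset from a stable node, and the re-ranking of the stable-part list is left out. The sample pages are invented and must be well-formed markup because the sketch uses Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def signature(el):
    """Identify a node by tag and stripped text (assumed notion of 'unchanged')."""
    return (el.tag, (el.text or "").strip())


def parent_map(root):
    """ElementTree keeps no parent pointers, so build a child -> parent map."""
    return {child: parent for parent in root.iter() for child in parent}


def stable_parts(old_root, new_root):
    """Signatures of text-bearing nodes that occur in both page versions."""
    old = {signature(e) for e in old_root.iter() if (e.text or "").strip()}
    new = {signature(e) for e in new_root.iter() if (e.text or "").strip()}
    return old & new


def learn_rule(old_root, target, stable):
    """Return (anchor signature, sibling offset) locating target, or None."""
    parents = parent_map(old_root)
    if target not in parents:
        return None
    siblings = list(parents[target])
    t = siblings.index(target)
    for i, sib in enumerate(siblings):
        if sib is not target and signature(sib) in stable:
            return signature(sib), t - i
    return None


def apply_rule(new_root, rule):
    """Find the anchor in the changed page and step to the target by offset."""
    sig, offset = rule
    anchor = next((e for e in new_root.iter() if signature(e) == sig), None)
    parents = parent_map(new_root)
    if anchor is None or anchor not in parents:
        return None
    siblings = list(parents[anchor])
    j = siblings.index(anchor) + offset
    return siblings[j] if 0 <= j < len(siblings) else None


# Invented sample pages: the layout changes and the price value changes.
OLD = "<html><body><div><span>Price</span><span>10 USD</span></div></body></html>"
NEW = ("<html><body><p>Ad</p><section><div><span>Price</span>"
       "<span>12 USD</span></div></section></body></html>")

old_root, new_root = ET.fromstring(OLD), ET.fromstring(NEW)
fixed_path = "./body/div/span[2]"           # path recorded on the original page
target = old_root.find(fixed_path)
broken = new_root.find(fixed_path)          # the fixed path no longer matches

stable = stable_parts(old_root, new_root)   # the "Price" label survives
rule = learn_rule(old_root, target, stable)
recovered = apply_rule(new_root, rule)
```

In this example the recorded path finds nothing on the changed page, while the rule anchored on the unchanged "Price" label still reaches the neighbouring value. A fuller version would keep several candidate anchors and order them by how often they succeed, which is where a re-ranked stable-part list would come in.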