Web mining applies data mining techniques to extract information from the Web for a variety of purposes. The usual data sources are the log files of WWW or proxy servers. This paper examines the possibility of using the local browser buffer (cache) for that purpose and compares the data that can be extracted from both sources. It turns out that, despite its limitations, the browser cache is a rich source of unique data about the user's navigational habits and about the properties of the fragment of the WWW that the user visits. The cache contains both the full body of each WWW object and the header control data sent by the server. In addition, the cache holds some basic information about the usage pattern of each object. It is therefore possible to study the susceptibility of objects to caching, measured by the cacheability factor (CF), and to study the word diversity of the Internet texts seen by the user. The CF provides an objective measure of a web site's caching potential and thus makes it possible to draw inferences about the latency of the web site. The word diversity study tests the compliance of Internet texts with the well-known Zipf's and Heaps' laws, which are valid for all natural languages. This part of the study could be used for the optimization of indexing engines or for the recommendation of pages potentially interesting to the user.
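As an illustration of the word diversity study, the following minimal sketch shows the two quantities that Zipf's and Heaps' laws describe. Zipf's law states that the frequency of a word is roughly inversely proportional to a power of its frequency rank, and Heaps' law states that the vocabulary size grows roughly as a power of the number of tokens read. The function names (`tokenize`, `zipf_table`, `heaps_curve`), the simple ASCII tokenizer, and the sample sentence are assumptions made for this example only. They are not taken from the paper, and the sketch does not cover how texts are read from a browser cache.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens.

    Assumption: a simple ASCII tokenizer is enough for illustration.
    """
    return re.findall(r"[a-z]+", text.lower())


def zipf_table(tokens):
    """Return (rank, word, frequency) triples, most frequent word first.

    Ties in frequency are broken alphabetically so the result is stable.
    """
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]


def heaps_curve(tokens):
    """Return the number of distinct words seen after each token."""
    seen = set()
    sizes = []
    for token in tokens:
        seen.add(token)
        sizes.append(len(seen))
    return sizes


# Hypothetical sample text standing in for a page body from the cache.
sample = "the cache stores the page and the cache stores the header"
tokens = tokenize(sample)
table = zipf_table(tokens)   # rank-frequency data for a Zipf plot
curve = heaps_curve(tokens)  # vocabulary growth data for a Heaps plot
```

On a real corpus, plotting frequency against rank and vocabulary size against token count, both on logarithmic axes, should give approximately straight lines if the texts follow the two laws.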
IOS Press, Inc.