XML (eXtensible Markup Language) has in recent years become the standard for data representation and exchange on the Web. This has created a strong need for data-cleaning techniques that can identify outlying data. In this paper, we present an outlier detection technique that singles out anomalies with respect to a relevant group of objects. We encode XML documents as signals of fixed frequency, which can then be transformed using the Fourier transform, and we identify outliers simply by inspecting the resulting signal spectra. The results show the effectiveness of our approach.
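The abstract describes the pipeline only at a high level (encode, transform, inspect spectra). The sketch below shows one possible shape of spectrum-based outlier detection on XML. It is not the authors' encoding or detection rule. Every concrete choice is a labeled assumption. Each start tag maps to +code and each end tag to -code, and signals are zero-padded to 32 samples. The DFT is a direct O(N²) sum. The group profile is the component-wise median spectrum. A document is flagged when its distance from that profile exceeds `factor` times the mean distance. The names `encode`, `magnitude_spectrum`, and `find_outliers` are hypothetical.

```python
import cmath
import math
import xml.etree.ElementTree as ET
from statistics import mean, median


def encode(xml_text, tag_codes, length=32):
    """Hypothetical encoding: walk the tree in document order, emit +code for
    each start tag and -code for the matching end tag, then zero-pad or
    truncate to a fixed number of samples."""
    root = ET.fromstring(xml_text)
    signal = []

    def walk(elem):
        # Tag codes are shared across the group, so equal tags get equal values.
        code = tag_codes.setdefault(elem.tag, len(tag_codes) + 1)
        signal.append(code)
        for child in elem:
            walk(child)
        signal.append(-code)

    walk(root)
    signal = signal[:length]
    return signal + [0] * (length - len(signal))


def magnitude_spectrum(signal):
    """Magnitudes |X_k| of a direct O(N^2) DFT for k = 0..N/2 (real input)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def find_outliers(xml_docs, factor=2.0, length=32):
    """Return indices of documents whose spectrum lies far from the group's
    component-wise median spectrum (hypothetical threshold rule)."""
    tag_codes = {}
    spectra = [magnitude_spectrum(encode(d, tag_codes, length)) for d in xml_docs]
    profile = [median(column) for column in zip(*spectra)]
    dists = [math.dist(s, profile) for s in spectra]
    threshold = factor * mean(dists)
    return [i for i, d in enumerate(dists) if d > threshold]


# Four book-like documents (one with reordered children) and one invoice.
docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/></book>",
    "<book><title/><author/></book>",
    "<book><author/><title/></book>",
    "<invoice><item/><item/><item/><total/></invoice>",
]
print(find_outliers(docs))  # prints [4]
```

In this sketch, the median is used as the group profile instead of the mean so that a single anomalous document does not pull the reference spectrum toward itself. For large collections, an FFT would replace the quadratic DFT loop.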
IOS Press, Inc.