Outliers receive considerable attention in statistics and engineering because they can lead to misleading identification results. However, outlier detection is subject to significant uncertainty when an outlying observation lies close to the boundary between outliers and regular data, or when observations are sparse. This uncertainty stems mostly from the statistical uncertainty of parameters such as the mean and standard deviation, yet how that statistical uncertainty influences outlier detection is unknown. This paper compares two outlier detection methods to study the influence of statistical uncertainty on probabilistic outlier detection. The first is based on the Mahalanobis distance (MD) and uses the total probability theorem in combination with the half-means method (RHM). The second is the RHM method with Bayesian machine learning (BML), which can account for the statistical uncertainty of the parameters in the MD. Simulated datasets containing outliers are used for the comparative study, covering different dimensions and various numbers of observations and outliers. The outliers are simulated from a double-mode triangular distribution. The results show that statistical uncertainty must be considered for sparse multivariate observations.
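To make the role of parameter uncertainty concrete, the following is a minimal sketch and not the paper's algorithm. It assumes two-dimensional data, plain random half-samples, and a chi-square cutoff at the 0.975 level; all function names are hypothetical. The mean and covariance are re-estimated on each half-sample, the squared MD of every observation is computed from those estimates, and the fraction of half-samples in which an observation exceeds the cutoff serves as a rough stand-in for an outlier probability. The total probability theorem and the BML step described in the abstract are not implemented here.

```python
import math
import random


def mean_and_cov_2d(points):
    """Sample mean and unbiased 2x2 sample covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_sq_2d(point, mean, cov):
    """Squared Mahalanobis distance, with the 2x2 inverse written out."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def outlier_frequency(points, n_resamples=500, level=0.975, seed=0):
    """Fraction of half-samples in which each point exceeds the cutoff.

    Assumption: the cutoff is the chi-square quantile with 2 degrees of
    freedom, which has the closed form -2 * ln(1 - level).
    """
    rng = random.Random(seed)
    threshold = -2.0 * math.log(1.0 - level)
    half = len(points) // 2
    counts = [0] * len(points)
    for _ in range(n_resamples):
        # Parameters estimated from only half of the data vary between
        # draws; that variation is the statistical uncertainty of interest.
        mean, cov = mean_and_cov_2d(rng.sample(points, half))
        for i, p in enumerate(points):
            if mahalanobis_sq_2d(p, mean, cov) > threshold:
                counts[i] += 1
    return [c / n_resamples for c in counts]


# Illustrative data (an assumption, not the paper's simulation design):
# 30 standard-normal observations plus one distant point.
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(30)]
data.append((8.0, 8.0))
freqs = outlier_frequency(data)
print(freqs[-1], sorted(freqs[:-1])[len(freqs[:-1]) // 2])
```

An observation near the boundary would be flagged in some half-samples and not in others, so its frequency falls between 0 and 1. With fewer observations the estimates fluctuate more from draw to draw, which is the sparse-data situation the abstract highlights.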