In this paper we present a Parallel Spatial Data Warehouse (PSDW) system that we use for the aggregation and analysis of huge amounts of spatial data. The data is generated by utility meters communicating via radio. The PSDW system is based on a data model called the cascaded star model. To provide satisfactory interactivity in the PSDW system, we use parallel computing supported by a special indexing structure called an aggregation tree. Balancing the PSDW system workload is essential to minimize the response time of submitted tasks. We have implemented two data partitioning schemes, which use Hilbert and Peano curves for space ordering. The presented balancing algorithm iteratively calculates the optimal size of the partitions loaded into each node by executing a series of aggregations on a test data set. We provide a collection of system test results and their analysis, which confirm that the balancing algorithm can be realized in the proposed way. During the ETL (Extraction, Transformation and Loading) process, large amounts of data are transformed and loaded into the PSDW. ETL processes are sometimes interrupted by a failure. In such a case, one of the interrupted-extraction resumption algorithms is usually used. In this paper we also analyze the influence of the data balancing used in the PSDW on the efficiency of the extraction and resumption processes.
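The partitioning and balancing idea summarized above can be illustrated with a minimal Python sketch. It is not the implementation described in the paper: the function names (`hilbert_index`, `partition_by_curve`, `rebalance`), the square grid of side `n` (a power of two), and the update rule that rescales each node's share by its measured throughput are all assumptions made for illustration only.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along a Hilbert curve (standard bit-twiddling conversion)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the curve orientation stays consistent.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_by_curve(points, fractions, n):
    """Order points along the curve, then cut the ordering into contiguous
    chunks whose sizes follow `fractions` (one fraction per node)."""
    ordered = sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))
    total = len(ordered)
    parts, start, acc = [], 0, 0.0
    for i, f in enumerate(fractions):
        acc += f
        # The last node takes whatever remains, guarding against rounding.
        end = total if i == len(fractions) - 1 else round(acc * total)
        parts.append(ordered[start:end])
        start = end
    return parts


def rebalance(fractions, times):
    """Hypothetical update step: give each node a share proportional to its
    measured throughput (share / aggregation time) in the last test run."""
    speeds = [f / t for f, t in zip(fractions, times)]
    total = sum(speeds)
    return [s / total for s in speeds]
```

In an iterative scheme of this kind, one would start from equal fractions, run the test aggregations on each node, feed the measured times into `rebalance`, and repeat until the times are close to equal. Because each partition is a contiguous run along the curve, spatially close meters tend to end up on the same node.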
IOS Press, Inc.