Many scientific domains generate large data repositories. Finding interesting and useful information in those repositories requires efficient data mining techniques. Fields such as astronomy, biology, medicine, chemistry and earth science all benefit from data mining analysis. Applying data mining techniques in science helps scientists form hypotheses and supports their scientific practice by exploiting the knowledge that can be extracted from large data sources. Data mining tasks are often distributed, since they involve data and tools located in geographically distributed environments such as the Grid. It is therefore essential to adopt effective paradigms, such as services and workflows, to model data mining tasks that are both multi-staged and distributed. This chapter discusses data mining services and workflows for analyzing scientific data in high-performance distributed environments such as Grids and Clouds. It also presents a workflow formalism and a service-oriented programming framework, named DIS3GNO, for designing and running distributed data mining tasks in the Knowledge Grid.
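To make the idea of a multi-staged, distributed data mining workflow more concrete, the following is a minimal sketch, assuming a workflow is a directed acyclic graph whose nodes are services and whose edges carry data between them. Everything in it (the `Node` class, `run_workflow`, the services and the toy transactions) is hypothetical and does not reproduce DIS3GNO's formalism or the Knowledge Grid API. The example partitions a dataset, counts items locally on each partition, merges the local counts and keeps the frequent items. Execution here is sequential in one process; in a Grid or Cloud setting the independent branches (`count0`, `count1`) would be the natural candidates to run on different machines.

```python
from collections import Counter
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable


@dataclass
class Node:
    """Hypothetical workflow node: a service plus the upstream nodes it reads from."""
    name: str
    service: Callable[..., Any]
    inputs: list[str] = field(default_factory=list)


def run_workflow(nodes: list[Node]) -> dict[str, Any]:
    """Run each node once all of its predecessors have produced their outputs."""
    by_name = {n.name: n for n in nodes}
    graph = {n.name: set(n.inputs) for n in nodes}
    results: dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        node = by_name[name]
        # Feed the outputs of upstream nodes, in declared order, to this service.
        results[name] = node.service(*(results[dep] for dep in node.inputs))
    return results


# Toy transactional dataset (illustrative only).
TRANSACTIONS = [["milk", "bread"], ["bread", "butter"],
                ["milk", "bread", "butter"], ["milk"]]


def load() -> list[list[str]]:
    return TRANSACTIONS


def partition(i: int, k: int) -> Callable[[list], list]:
    # Round-robin split: partition i takes every k-th transaction starting at i.
    return lambda data: data[i::k]


def count_items(part: list[list[str]]) -> Counter:
    # Local mining step: item counts within one partition.
    return Counter(item for t in part for item in t)


def merge(*counts: Counter) -> Counter:
    # Global step: combine the local counts into global counts.
    total: Counter = Counter()
    for c in counts:
        total.update(c)
    return total


def frequent_items(total: Counter, min_support: int = 3) -> list[str]:
    return sorted(item for item, c in total.items() if c >= min_support)


workflow = [
    Node("load", load),
    Node("part0", partition(0, 2), ["load"]),
    Node("part1", partition(1, 2), ["load"]),
    Node("count0", count_items, ["part0"]),
    Node("count1", count_items, ["part1"]),
    Node("merge", merge, ["count0", "count1"]),
    Node("frequent", frequent_items, ["merge"]),
]

results = run_workflow(workflow)
print(results["frequent"])  # ['bread', 'milk']
```

Declaring each stage's inputs explicitly, instead of hard-coding the call sequence, is what lets an executor discover which stages are independent and could be dispatched in parallel.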
IOS Press, Inc.