

Large data repositories are generated in several scientific domains. Extracting interesting and useful information from them requires efficient data mining techniques. Many scientific fields, such as astronomy, biology, medicine, chemistry and earth science, benefit from data mining analysis. Applying data mining techniques in science helps scientists form hypotheses and supports their scientific practice by exploiting the knowledge that can be extracted from large data sources. Data mining tasks are often distributed, since they involve data and tools located across geographically distributed environments such as the Grid. It is therefore essential to exploit effective paradigms, such as services and workflows, to model data mining tasks that are both multi-staged and distributed. This chapter discusses data mining services and workflows for analyzing scientific data in high-performance distributed environments such as Grids and Clouds. It also presents a workflow formalism and a service-oriented programming framework, named DIS3GNO, for designing and running distributed data mining tasks in the Knowledge Grid.