Data analysis applications often involve large datasets and complex software systems in which multiple data processing tools are executed in a coordinated way. Data analysis workflows are an effective way to express this task coordination, and they can be designed through visual or script-based programming paradigms. The Data Mining Cloud Framework (DMCF) supports the design and scalable execution of data analysis applications on Cloud platforms. A workflow in DMCF can be developed using either a visual or a script-based language. The visual language, VL4Cloud, targets high-level users, e.g., domain-expert analysts with limited knowledge of programming paradigms. The script-based language, JS4Cloud, offers a flexible programming paradigm for skilled users who prefer to code their workflows as scripts. Both languages implement data-driven task parallelism, which dispatches ready-to-run tasks to Cloud resources. In addition, they exploit implicit parallelism, which frees users from duties such as workload partitioning, synchronization, and communication. In this chapter, we present DMCF and discuss how its workflow paradigm has been integrated with the MapReduce model. In particular, we describe how VL4Cloud/JS4Cloud workflows can include MapReduce tools, and how these workflows are executed in parallel on DMCF, enabling scalable data processing on Clouds.
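As a rough illustration of the data-driven task parallelism mentioned above, the following minimal Python sketch starts each task as soon as all of its input data items are available. It is not DMCF's scheduler and does not use VL4Cloud or JS4Cloud syntax: the names `run_workflow` and `TASKS` are hypothetical, and a local thread pool stands in for Cloud resources.

```python
import operator
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def run_workflow(tasks, initial_data, max_workers=4):
    """Run tasks in data-driven order (illustrative sketch, not DMCF code).

    tasks: dict mapping task name -> (function, input names, output name).
    initial_data: dict mapping data item name -> value.
    Returns the final data dict and the order in which tasks completed.
    """
    data = dict(initial_data)
    pending = dict(tasks)
    running = {}  # future -> (task name, output name)
    order = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # A task is ready to run when every one of its inputs exists.
            ready = [name for name, (_, inputs, _) in pending.items()
                     if all(i in data for i in inputs)]
            for name in ready:
                func, inputs, output = pending.pop(name)
                future = pool.submit(func, *[data[i] for i in inputs])
                running[future] = (name, output)
            if not running:
                # Nothing is running and nothing is ready: inputs never arrive.
                raise ValueError("unsatisfiable inputs: "
                                 + ", ".join(sorted(pending)))
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name, output = running.pop(future)
                data[output] = future.result()  # may unlock further tasks
                order.append(name)
    return data, order


# Hypothetical workflow: two independent tasks, then a task that needs both.
TASKS = {
    "sum_a": (sum, ["part_a"], "sa"),
    "sum_b": (sum, ["part_b"], "sb"),
    "merge": (operator.add, ["sa", "sb"], "total"),
}

results, order = run_workflow(TASKS, {"part_a": [1, 2, 3], "part_b": [4, 5, 6]})
```

Here `sum_a` and `sum_b` have no dependency on each other and can run concurrently, while `merge` is submitted only after both of its inputs have been produced. The user never writes explicit synchronization; the ordering follows from the data dependencies alone.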
IOS Press, Inc.