End-to-end scientific workflows running on leadership-class systems present significant data management challenges due to the increasing volume of data being produced. Emerging storage architectures (e.g., deep memory hierarchies and burst buffers) and the extreme heterogeneity of these systems introduce further data management challenges. Together, these data-related challenges significantly impact the effective execution of coupled simulations and in-situ workflows on these systems. Increasing system scales are also expected to increase the frequency of node failures and silent data corruptions, which adds to these challenges. Data staging techniques are being used to address these data-related challenges and support extreme-scale in-situ workflows.
In this paper, we investigate how data staging solutions can leverage deep memory hierarchies through intelligent prefetching, data movement, and efficient data placement techniques. Specifically, we present an autonomic data-management framework that leverages system information, data locality, machine-learning-based approaches, and user hints to capture the data access and movement patterns between components of staging-based in-situ application workflows. The framework then uses this knowledge to build a more robust data staging platform that can provide high-performance, resilient, and error-free data exchange for in-situ workflows. We also present an overview of the data management techniques used by the DataSpaces data staging service, which leverage autonomic data management to deliver the right data at the right time to the right application.