End-to-end scientific workflows running on leadership-class systems present significant data management challenges due to the increasing volume of data they produce. Emerging storage architectures (e.g., deep memory hierarchies and burst buffers) and the extreme heterogeneity of these systems introduce further challenges. Together, these data-related challenges significantly impact the effective execution of coupled simulations and in-situ workflows on such systems. Increasing system scales are also expected to result in more frequent node failures and silent data corruption, compounding these challenges. Data staging techniques are being used to address these challenges and to support extreme-scale in-situ workflows.
In this paper, we investigate how data staging solutions can leverage deep memory hierarchies through intelligent prefetching, data movement, and efficient data placement. Specifically, we present an autonomic data-management framework that combines system information, data locality, machine-learning-based approaches, and user hints to capture the data access and movement patterns between components of staging-based in-situ application workflows. The framework then uses this knowledge to build a more robust data staging platform that provides high-performance and resilient, error-free data exchange for in-situ workflows. We also present an overview of the data management techniques used by the DataSpaces data staging service, which leverage autonomic data management to deliver the right data at the right time to the right application.
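To make the idea of learning access patterns for prefetching concrete, the following is a minimal illustrative sketch, not the DataSpaces API: it assumes a hypothetical staging component that records which data regions a consumer accesses, learns first-order transitions between them, and predicts the next region to prefetch into a faster memory tier. All class and region names here are invented for illustration.

```python
from collections import defaultdict

class AccessPatternPrefetcher:
    """Hypothetical sketch: learn first-order transitions between accessed
    data regions and prefetch the most likely successor of the current one."""

    def __init__(self):
        # transitions[a][b] counts how often region b was accessed right after a.
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.last = None

    def record(self, region):
        # Update transition counts from the previously accessed region.
        if self.last is not None:
            self.transitions[self.last][region] += 1
        self.last = region

    def predict_next(self):
        # Return the most frequent successor of the last access, if any;
        # a staging service would then stage that region into fast memory.
        successors = self.transitions.get(self.last)
        if not successors:
            return None
        return max(successors, key=successors.get)

# Example: an alternating access pattern between two regions.
pf = AccessPatternPrefetcher()
for r in ["blockA", "blockB", "blockA", "blockB", "blockA"]:
    pf.record(r)
print(pf.predict_next())  # history suggests "blockB" follows "blockA"
```

A production framework, as described above, would combine such learned patterns with system information, data locality, and user hints rather than relying on access history alone.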