Data streaming frameworks like Stratosphere are designed to run in the cloud on a large number of nodes working in parallel. The growing number of nodes, together with the expected long running times of data processing tasks, increases the probability of failure. Fault tolerance therefore becomes an important issue in these systems. Existing fault tolerance strategies for data streaming systems usually either accept full restarts or operate in a blocking manner.
In this paper we introduce ephemeral materialization points, a non-blocking materialization strategy for data streaming systems. The strategy selects materialization points at run time without global coordination. Whether to materialize is decided based on the current resource usage and the structure of the execution graph, with the aim of minimizing the expected recovery time in case of a failure. We show how and when this decision can be reached, and which information can influence it.
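The abstract does not give the concrete decision rule, so the following Python sketch is only an illustration under a hypothetical cost model, not the paper's algorithm or Stratosphere's API. It assumes that each task knows its own compute cost and output size, a per-task failure probability, a disk write throughput and the current disk utilization. A task skips materialization whenever the disk is busy, so it never waits for resources. Otherwise it materializes when the expected recovery time it saves exceeds the time needed to write its output. All names (`Vertex`, `recovery_cost`, `should_materialize`, `plan_materialization`, `failure_prob`, the 0.8 utilization threshold) are hypothetical.

```python
# Hypothetical cost model for illustration only; not Stratosphere's API
# and not the decision rule used in the paper.
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List, Set


@dataclass
class Vertex:
    """One task in a hypothetical execution graph."""
    name: str
    compute_cost: float      # seconds of work to (re)produce this task's output
    output_size_mb: float    # size of the output that would be materialized
    inputs: List[str] = field(default_factory=list)  # upstream task names


def recovery_cost(graph: Dict[str, Vertex], name: str, materialized: Set[str]) -> float:
    """Seconds of work needed to rebuild `name` after a failure: walk upstream and
    sum the cost of every distinct task until a materialized task or a source is hit."""
    seen: Set[str] = set()
    stack = [name]
    total = 0.0
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        vertex = graph[current]
        total += vertex.compute_cost
        for parent in vertex.inputs:
            if parent not in materialized:  # materialized outputs are reread, not recomputed
                stack.append(parent)
    return total


def should_materialize(graph: Dict[str, Vertex], name: str, materialized: Set[str],
                       failure_prob: float, write_mb_per_s: float,
                       disk_utilization: float, max_utilization: float = 0.8) -> bool:
    """Local, non-blocking decision for a single task."""
    if disk_utilization >= max_utilization:
        return False  # resources are busy: continue without materializing instead of waiting
    write_cost = graph[name].output_size_mb / write_mb_per_s
    # Recomputation avoided for downstream tasks if a later failure occurs.
    expected_saving = failure_prob * recovery_cost(graph, name, materialized)
    return expected_saving > write_cost


def plan_materialization(graph: Dict[str, Vertex], failure_prob: float,
                         write_mb_per_s: float, utilization: Dict[str, float]) -> Set[str]:
    """Visit tasks in topological order; each decision only looks at upstream choices."""
    materialized: Set[str] = set()
    order = TopologicalSorter({n: v.inputs for n, v in graph.items()}).static_order()
    for name in order:
        if should_materialize(graph, name, materialized, failure_prob,
                              write_mb_per_s, utilization[name]):
            materialized.add(name)
    return materialized
```

For a hypothetical pipeline `read → parse → aggregate` with compute costs of 10, 50 and 100 seconds, output sizes of 1000, 200 and 10 MB, `failure_prob=0.1`, a write throughput of 100 MB/s and a disk utilization of 0.5 everywhere, this sketch materializes `parse` and `aggregate` but not `read`, whose large output is too expensive to write relative to the little work it protects. Each task decides using only information about its own upstream lineage, which loosely mirrors the uncoordinated, run-time selection described above.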
IOS Press, Inc.