This paper explores how a skeleton based approach can be used to perform big data analysis. We introduce a restricted storage system based on blocks with a fixed maximum size. The storage design removes the residual data problem commonly found in storage systems, and enables processing on individual blocks. We then introduce a stream-oriented query system that can be used on top of the distributed storage system. The query system is built on a limited number of core operations. Each of the perform a specified function, such as filtering elements, but are skeleton operations where the programmer needs to fill in how to perform the operation. The operations are designed to allow splitting across the blocks in the storage system, giving concurrent execution while maintaining a completely sequential program description. To assist in understanding the data flow, we also introduce a graphical representation for each of the methods, enabling a visual expression of an algorithm. To evaluate the query system we implement a number of classic Big-Data queries and show how to implement them with code, and how the queries can be visualized with the graphical representation.
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
Tel.: +1 703 830 6300
Fax: +1 703 830 2300 firstname.lastname@example.org
(Corporate matters and books only) IOS Press c/o Accucoms US, Inc.
For North America Sales and Customer Service
West Point Commons
Lansdale PA 19446
Tel.: +1 866 855 8967
Fax: +1 215 660 5042 email@example.com