Recent advances in big data have spurred attempts to analyze huge dumps of readily available transactional data in order to predict patterns and trends. The Hadoop framework was developed based on MapReduce to exploit parallelism to the fullest, and it has indeed made computing mechanisms more robust, flexible, scalable, and efficient. At the same time, it has exposed new limitations of existing databases and computational algorithms, such as the trade-off between processing speed and waiting time and the parallelizability of a query. In this chapter, we focus on the need for Spark SQL, its features, and its applications. The chapter also includes Spark SQL code snippets to strengthen readers' coding skills.
IOS Press, Inc.