Spark SQL with Hive Context or SQL Context

Sunami, Karuna Yogesh; Sunami, Yogesh

doi:10.3233/978-1-61499-814-3-292

Abstract

Recent advances in big data have spurred attempts to analyze huge volumes of readily available transactional data in order to predict patterns and trends. The Hadoop framework was developed around MapReduce to exploit parallelism to the fullest, and it has indeed made computing mechanisms more robust, flexible, scalable, and efficient. At the same time, it has exposed new limitations of existing databases and computational algorithms, such as the trade-off between processing speed and waiting time and the parallelizability of a query. In this chapter, we focus on the need for Spark SQL, its features, and its applications. The chapter also includes Spark SQL code snippets to help readers strengthen their coding skills.
