The tremendous rise in the volume, velocity and variety of structured (e.g. RDBMS), semi-structured (e.g. XML, JSON, NoSQL) and unstructured (e.g. sensor, social media, audio, video and mobile) data produced by industries, governments and other organizations has led to the emergence of a new paradigm called “Big Data”. This data is collected from many sectors, including open government data, healthcare, defence, scientific experiments (e.g. the Large Hadron Collider, the Sloan Digital Sky Survey, the Human Genome Project and the Square Kilometre Array), media (data journalism and mining), academia, information technology, manufacturing, sports, entertainment, social media and the Internet of Things.
Big data relies on cloud-based distributed storage rather than local storage because of unpredictable data sizes, unstructured formats and a variety of other reasons. Several cloud platforms and services are currently available for the storage, analysis and processing of big data, such as Google Cloud services, App Engine, BigQuery, Azure, S3, DynamoDB, MapReduce YARN and Apache Spark, offered by technology companies like Google, Microsoft, Amazon and Cloudera. Because of security and privacy concerns about the stored data, designing appropriate cloud computing platforms remains a major challenge for researchers. Domains like defence and healthcare do not share their data on cloud platforms because there are no legal frameworks that could ensure the ethics, quality, integrity, security and confidentiality of the data. This chapter examines the security and privacy issues, threats and concerns in big data storage and analysis that need to be addressed immediately, and investigates their implications using real examples of security breaches and concerns worldwide. It discusses various application domains and the security threats specific to each. It also reviews several Big Data security solutions from the literature that address these issues and safeguard the data through privacy preservation and encryption, such as the Expectation-Maximization algorithm, Portable Data Binding and the privacy-preserving cost-reducing heuristic algorithm. Big Data Security Analytics tools are then compared according to five essential factors. Finally, the chapter outlines future prospects and research directions that can contribute to addressing big data security issues.