Indian Institute of Information Technology, Allahabad

Department of Information Technology

Course Syllabus

1. Name of the Course: Big Data Analytics

2. LTP structure of the course (Lecture-Tutorial-Practical): 2-1-1

3. Objective of the course: This course covers the concepts of big data analytics, along with its algorithms, applications, and frameworks.

4. Outcome of the course: Students will study big data analytics in detail and will be able to apply it to practical problems.

5. Course Plan:

Component 1

Unit 1: Introduction to Big Data and its importance, the 3 Vs (Volume, Velocity, Variety) and beyond, big data analytics, big data applications, Hadoop & the Hadoop Ecosystem, Moving Data in and out of Hadoop, Inputs and Outputs of MapReduce, Hadoop Architecture, HDFS, Common Hadoop Shell Commands, NameNode, Secondary NameNode, and DataNode.

Unit 2: Hadoop MapReduce paradigm, Map and Reduce tasks, JobTracker and TaskTracker, Algorithms using MapReduce, Examples of MapReduce (Word Count problem, Matrix-Vector Multiplication), YARN & ZooKeeper, Hadoop Cluster Setup & Hadoop Configuration, HDFS Administration: Monitoring & Maintenance.
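To illustrate the two MapReduce examples named in Unit 2 (Word Count and Matrix-Vector Multiplication), here is a minimal Python sketch that simulates one MapReduce job in memory: a map phase emits (key, value) pairs, a shuffle phase groups values by key, and a reduce phase combines each group. This is not Hadoop's Java API. The helper names (`run_mapreduce`, `wc_mapper`, `make_mv_mapper`, etc.) are hypothetical and were chosen for illustration. On a real cluster, Hadoop would run map tasks in parallel over HDFS blocks and shuffle intermediate pairs across the network.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    # Map phase: each input record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: collect all values emitted for the same key.
            groups[key].append(value)
    # Reduce phase: combine the grouped values for each key, visiting keys in sorted order.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


# Word count: the mapper emits (word, 1) for every word; the reducer sums the 1s.
def wc_mapper(line):
    for word in line.lower().split():
        yield word, 1


def wc_reducer(word, counts):
    return sum(counts)


# Matrix-vector multiplication y = M v, with M stored as sparse (i, j, m_ij) entries.
# The mapper emits (i, m_ij * v_j); the reducer sums the products for row i.
def make_mv_mapper(v):
    def mv_mapper(entry):
        i, j, m_ij = entry
        yield i, m_ij * v[j]
    return mv_mapper


def mv_reducer(i, products):
    return sum(products)


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_mapreduce(lines, wc_mapper, wc_reducer))
    # prints {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}

    matrix = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]  # M = [[1, 2], [3, 4]]
    print(run_mapreduce(matrix, make_mv_mapper([5, 6]), mv_reducer))
    # prints {0: 17, 1: 39}
```

The same `run_mapreduce` driver serves both problems. Only the mapper and reducer change, which reflects the central idea of the paradigm. The sketch assumes that the vector `v` fits in memory and is available to every mapper, as in the simplest textbook formulation.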

Component 2

Unit 3: Hive Architecture, Comparison with Traditional Databases, HiveQL: Querying Data, Sorting and Aggregating, MapReduce Scripts, Joins & Subqueries, HBase Concepts, Advanced Usage, Schema Design & Indexing, Pig, ZooKeeper.

Unit 4: Spark (RDDs in Spark, DataFrames & Spark SQL, Spark Streaming), MongoDB, NoSQL.

6. Text Books:

1. Chris Eaton, Dirk Deroos et al., “Understanding Big Data”, McGraw Hill, 2012.

2. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, “Professional Hadoop Solutions”, Wiley, ISBN: 9788126551071, 2015.

3. Tom White, “Hadoop: The Definitive Guide”, O'Reilly, 2012.

4. Jeffrey Aven, “Data Analytics with Spark Using Python”, First Edition, Pearson, November 2018.