Indian Institute of Information Technology, Allahabad
Department of Information Technology
Course Syllabus
1. Name of the Course: Big Data Analytics
2. LTP structure of the course: 2-1-1
3. Objective of the course: This course covers the concepts, algorithms, applications, and frameworks of big data analytics.
4. Outcome of the course: Students will study big data analytics in detail and be able to apply it to practical problems.
5. Course Plan:
Component | Unit | Topics for Coverage
Component 1 | Unit 1 | Introduction to Big Data and its importance, 3 Vs and more, Big Data analytics, Big Data applications, Hadoop & Hadoop Ecosystem, Moving Data in and out of Hadoop, Inputs and Outputs of MapReduce, Hadoop Architecture, HDFS, Common Hadoop Shell Commands, NameNode, Secondary NameNode, and DataNode
Component 1 | Unit 2 | Hadoop MapReduce paradigm, Map and Reduce tasks, Job and Task trackers, Algorithms using MapReduce, Examples of MapReduce (Word Count problem, Matrix-Vector Multiplication), YARN & ZooKeeper, Hadoop Cluster Setup & Hadoop Configuration, HDFS Administration: Monitoring & Maintenance
Component 2 | Unit 3 | Hive Architecture, Comparison with Traditional Databases, HiveQL: Querying Data, Sorting and Aggregating, MapReduce Scripts, Joins & Subqueries, HBase concepts, Advanced Usage, Schema Design & Indexing, Pig, ZooKeeper
Component 2 | Unit 4 | Spark: RDDs in Spark, DataFrames & Spark SQL, Spark Streaming, MongoDB, NoSQL
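Illustrative sketch for Unit 2: the word count problem listed under MapReduce examples can be outlined in plain Python to show the map, shuffle, and reduce phases. This is not Hadoop code and is not taken from the course material. The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for this sketch, and the whole job runs in a single process rather than on a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key (a real framework does this step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Two tiny input "splits", assumed here purely for illustration.
    print(word_count(["big data big analytics", "data analytics"]))
```

In an actual Hadoop job the map and reduce functions would run as distributed tasks and the grouping step would be handled by the framework; the sketch only mirrors the data flow.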
6. Text Books:
1. Chris Eaton, Dirk deRoos et al., “Understanding Big Data”, McGraw Hill, 2012.
2. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, “Professional Hadoop Solutions”, Wiley, ISBN: 9788126551071, 2015.
3. Tom White, “Hadoop: The Definitive Guide”, O'Reilly, 2012.
4. Jeffrey Aven, “Data Analytics with Spark Using Python”, First Edition, Pearson, November 2018.