HADOOP

Course Description

Trainer's Profile

Professionally qualified, with more than 12 years of experience, including extensive Big Data experience leading diverse teams spread across different geographical locations.

Training Approach:

The training approach is built on the following:

1. Deep explanation of each concept to lay a strong foundation.
2. Application of each concept in a close-to-real-world environment, with examples of real-world use cases.
3. Explanation of likely certification and interview questions.

HADOOP Course Contents

Big Data & Hadoop:

1. What is Big Data
2. Sources of Big Data
3. IBM Definition for Big Data
4. Definition of Hadoop
5. History of Hadoop
6. Features of Hadoop
7. Hadoop Eco-System
8. Other Hadoop-related Apache Products

Hadoop Distributed File System:

1. Distributed File System
2. Definition of HDFS
3. Where not to use HDFS
4. HDFS Concepts
5. Hadoop Architecture
6. NameNode, DataNode & Secondary NameNode (SNN)
7. HDFS Federation
8. HDFS High Availability
9. Hadoop I/O Operations (Read & Write)
10. HDFS Rack Awareness
11. Hadoop Modes
12. Hadoop Configuration
13. Linux & Hadoop Commands

Java:

1. OOP and Java
2. Object Oriented Concepts
3. Language Fundamentals
4. Inheritance
5. Polymorphism
6. Interfaces
7. Collections
8. Exceptions
9. Multithreading

MapReduce:

1. What is MapReduce & Key Value Concepts
2. Traditional Solution
3. MapReduce Solution
4. Input & Output of M/R
5. MapReduce Phases
6. Anatomy of MapReduce
7. WordCount FlowChart
8. Advantages of MapReduce
9. Input Split in M/R
10. Box Classes (Writable Wrapper Types) in Hadoop
11. Execution of WordCount Program
12. Combiner
13. Partitioner
14. MapReduce Joins
15. Distributed Cache
16. Counters
17. MapReduce Formats (Input & Output)

YARN:

1. Challenges in Hadoop 1.x
2. Hadoop 2.x Features
3. Apache YARN
4. Hadoop 2.x Eco-system
5. Hadoop 2.x High Availability
6. Anatomy of YARN Application Run
7. Run a MapReduce application on YARN

Hive:

1. Applications of Hive
2. Advantages & Disadvantages of Hive
3. Hive Installation & Invoking
4. Hive Metastore
5. Hive Architecture
6. Hive Concepts
7. Hive Data Types
8. Demonstration of Database Commands
9. Hive Tables
10. Demonstration of Create, Rename, Alter & Drop
11. Partitions in Hive
12. Bucketing in Hive
13. Hive Joins
14. Complex Data Types
15. Demonstration of External Table
16. SubQueries
17. Views
18. User Defined Functions (UDFs)

Pig:

1. Need for Pig
2. Pig versus MapReduce
3. Where to use Pig
4. Where NOT to use Pig
5. What is Pig
6. Applications of Pig
7. Pig Installation
8. Execution Types
9. Running Pig Programs
10. Pig Data Types
11. RDBMS Vs Pig
12. Comments in Pig
13. Case Sensitivity in Pig
14. Logical and Physical Plan
15. Pig Operators
16. Pig Built-in Functions
17. Diagnostic Operators in Pig
18. Special Joins in Pig
19. Parameter Substitution in Pig
20. Pig UDFs
21. Pig Best Practices

HBase:

1. What is HBase & NoSQL
2. History
3. Installation
4. Invoke HBase
5. HBase Vs RDBMS
6. Uses of HBase
7. Where Not to Use HBase
8. HBase Write Path
9. HBase Read Path
10. HBase Terminology
11. Row Vs Column Oriented DB
12. HBase Architecture
13. Data Loading Techniques in HBase
14. HBase Shell Commands
15. Demonstration of HBase Shell Commands

Sqoop:

1. Introduction and Installation
2. Sqoop Tools
3. Sqoop Connectors
4. Creating a DB and Table in MySQL
5. Loading the MySQL DB
6. Sqoop Import Process
7. Import Hive Data
8. Sqoop Export Process

Flume:

1. Introduction
2. Applications of Flume
3. Advantages of Flume
4. Features of Flume
5. Data Transfer in Hadoop
6. Apache Flume - Architecture
7. Components of Flume
8. Installation of Flume
9. Fetching Data Using Flume

Spark:

1. MapReduce Vs Spark
2. Apache Spark - By Definition
3. Features of Spark
4. Spark Deployment
5. Spark Core & Components
6. Spark Context & Invoking
7. Prerequisites for Spark
8. Resilient Distributed Datasets (RDDs)
9. RDD Operations
10. RDD Persistence
11. Lazy Evaluation & Lineage Graph
12. Spark SQL
13. Spark SQL Capabilities
14. SchemaRDD, DataFrame & Datasets
15. Linking with SparkSQL
16. Initializing Spark SQL
17. SQL Method
18. Creating a DataFrame
19. Transformations, Actions, Laziness
20. Spark Streaming
21. Input Sources and Batches of Input
22. Discretized Streams
23. Initializing StreamingContext
24. Transformation on DStreams
25. Output Operations on DStreams
26. Machine Learning
27. Spark MLlib
28. Spark MLlib Packages
29. MLlib Data Types
30. Spark MLlib Algorithms
31. Spark GraphX
32. Graph - Getting Started
33. Graph Operators
34. Graph Builder
35. Graph Algorithms

Scala:

1. Why Scala
2. Functional programming & First Scala Program
3. Data Types
4. Variables
5. Classes
6. Objects
7. Access Modifiers
8. Operators & Conditional Statements
9. Functions
10. Closures

About Instructor

KudVenkat

Software Architect, Trainer, Author and Speaker at Pragim Technologies.