HADOOP

Trainer's Profile

Professionally qualified, with more than 12 years of experience, including extensive Big Data experience leading diverse teams spread across different geographical locations.

Training Approach:

The training approach covers the following:

  • 1. In-depth explanation of each concept to lay a strong foundation.
  • 2. Application of each concept in a close-to-real-world environment, with examples from real use cases.
  • 3. Explanation of likely certification and interview questions.

HADOOP Course Contents

Big Data & Hadoop:

  • 1. What is Big Data
  • 2. Sources of Big Data
  • 3. IBM Definition for Big Data
  • 4. Definition of Hadoop
  • 5. History of Hadoop
  • 6. Features of Hadoop
  • 7. Hadoop Eco-System
  • 8. Other Hadoop-related Apache products

Hadoop Distributed File System:

  • 1. Distributed File System
  • 2. Definition of HDFS
  • 3. Where not to use HDFS
  • 4. HDFS Concepts
  • 5. Hadoop Architecture
  • 6. NameNode, DataNode & Secondary NameNode (SNN)
  • 7. HDFS Federation
  • 8. HDFS High Availability
  • 9. Hadoop IO Operations (Read & Write)
  • 10. HDFS Rack Awareness
  • 11. Hadoop Modes
  • 12. Hadoop Configuration
  • 13. Linux & Hadoop Commands

Java:

  • 1. OOP and Java
  • 2. Object Oriented Concepts
  • 3. Language Fundamentals
  • 4. Inheritance
  • 5. Polymorphism
  • 6. Interface
  • 7. Collections
  • 8. Exceptions
  • 9. Multithreading

MapReduce:

  • 1. What is MapReduce & Key Value Concepts
  • 2. Traditional Solution
  • 3. MapReduce Solution
  • 4. Input & Output of M/R
  • 5. MapReduce Phases
  • 6. Anatomy of MapReduce
  • 7. WordCount FlowChart
  • 8. Advantages of MapReduce
  • 9. Input Split in M/R
  • 10. Box Classes in Hadoop
  • 11. Execution of WordCount Program
  • 12. Combiner
  • 13. Partitioner
  • 14. MapReduce Joins
  • 15. Distributed Cache
  • 16. Counters
  • 17. MapReduce Formats (Input & Output)
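
As a minimal sketch of the map, shuffle and reduce phases behind the WordCount topics above, the following plain-Python example mimics the flow without a Hadoop cluster. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are illustrative assumptions, not Hadoop APIs; a real job would typically be written against Hadoop's Java Mapper and Reducer classes.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) key-value pair for every word in the line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all values that share the same key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: collapse the list of counts for one word into a total.
    return key, sum(values)


def word_count(lines):
    # Run the three phases in sequence over an in-memory list of lines.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

In this sketch everything runs in one process; in Hadoop the map and reduce calls are distributed across nodes and the shuffle moves data over the network.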

YARN:

  • 1. Challenges in Hadoop 1.x
  • 2. Hadoop 2.x Features
  • 3. Apache YARN
  • 4. Hadoop 2.x Eco-system
  • 5. Hadoop 2.x High Availability
  • 6. Anatomy of YARN Application Run
  • 7. Run a MapReduce application on YARN

Hive:

  • 1. Applications of Hive
  • 2. Advantages & Disadvantages of Hive
  • 3. Hive Installation & Invoking
  • 4. Hive Metastore
  • 5. Hive Architecture
  • 6. Hive Concepts
  • 7. Hive Data Types
  • 8. Demonstration of Database Commands
  • 9. Hive Tables
  • 10. Demonstration of Create, Rename, Alter & Drop
  • 11. Partitions in Hive
  • 12. Bucketing in Hive
  • 13. Hive Joins
  • 14. Complex Data Types
  • 15. Demonstration of External Table
  • 16. SubQueries
  • 17. Views
  • 18. User Defined Functions (UDFs)

PIG:

  • 1. Need for PIG
  • 2. PIG versus MapReduce
  • 3. Where to use PIG
  • 4. Where NOT to use PIG
  • 5. What is PIG
  • 6. Applications of PIG
  • 7. PIG Installation
  • 8. Execution Types
  • 9. Running PIG programs
  • 10. PIG data types
  • 11. RDBMS Vs Pig
  • 12. Comments in Pig
  • 13. Case Sensitivity in Pig
  • 14. Logical and Physical Plan
  • 15. Pig Operators
  • 16. Pig Built in Functions
  • 17. Diagnostic Operators in PIG
  • 18. Special Joins in PIG
  • 19. Parameter Substitution in PIG
  • 20. PIG UDFs
  • 21. Pig Best Practices

HBASE:

  • 1. What is HBase & NoSQL
  • 2. History
  • 3. Installation
  • 4. Invoke HBase
  • 5. HBase Vs RDBMS
  • 6. Uses of HBase
  • 7. Where Not to Use HBase
  • 8. HBase Write Path
  • 9. HBase Read Path
  • 10. HBase Terminology
  • 11. Row Vs Column Oriented DB
  • 12. HBase Architecture
  • 13. Data loading Techniques in HBase
  • 14. HBase Shell Commands
  • 15. Demonstration of HBase shell Commands

Sqoop:

  • 1. Introduction and Installation
  • 2. Sqoop Tools
  • 3. Sqoop Connectors
  • 4. Creating a DB and table in MySQL
  • 5. Loading the MySQL DB
  • 6. Sqoop Import Process
  • 7. Import Hive Data
  • 8. Sqoop Export Process

Flume:

  • 1. Introduction
  • 2. Applications of Flume
  • 3. Advantages of Flume
  • 4. Features of Flume
  • 5. Data Transfer in Hadoop
  • 6. Apache Flume - Architecture
  • 7. Components of Flume
  • 8. Installation of Flume
  • 9. Fetching Data using Flume

Spark:

  • 1. MapReduce Vs Spark
  • 2. Apache Spark - By Definition
  • 3. Features of Spark
  • 4. Spark Deployment
  • 5. Spark Core & Components
  • 6. Spark Context & Invoking
  • 7. Prerequisites for Spark
  • 8. Resilient Distributed Datasets (RDDs)
  • 9. RDD Operations
  • 10. RDD Persistence
  • 11. Lazy Evaluation & Lineage Graph
  • 12. Spark SQL
  • 13. Spark SQL Capabilities
  • 14. SchemaRDD, DataFrame & Datasets
  • 15. Linking with SparkSQL
  • 16. Initializing Spark SQL
  • 17. SQL Method
  • 18. Creating a DataFrame
  • 19. Transformations, Actions, Laziness
  • 20. Spark Streaming
  • 21. Input Sources and Batches of Input
  • 22. Discretized Streams
  • 23. Initializing StreamingContext
  • 24. Transformation on DStreams
  • 25. Output Operations on DStreams
  • 26. Machine Learning
  • 27. Spark MLlib
  • 28. Spark MLlib Packages
  • 29. MLlib Data Types
  • 30. Spark MLlib Algorithms
  • 31. Spark GraphX
  • 32. Graph - Getting Started
  • 33. Graph Operators
  • 34. Graph Builder
  • 35. Graph Algorithms

Scala:

  • 1. Why Scala
  • 2. Functional programming & First Scala Program
  • 3. Data Types
  • 4. Variables
  • 5. Classes
  • 6. Objects
  • 7. Access Modifiers
  • 8. Operators, Conditional Statements
  • 9. Functions
  • 10. Closures

About Instructor

KudVenkat

Software Architect, Trainer, Author and Speaker at Pragim Technologies.
