
Hadoop Training At Vsbtech


vmanubolu


HADOOP AT VSBTECH


HADOOP BASICS

·Problems with traditional large-scale systems

·Data storage literature survey

·Data processing literature survey

·Network Constraints

·Requirements for a new approach

 

Hadoop: Basic Concepts

·What is Hadoop?

·The Hadoop Distributed File System

·How Hadoop MapReduce works

·Anatomy of a Hadoop Cluster

·Master Daemons

·NameNode

·JobTracker

·Secondary NameNode

·Slave Daemons

·DataNode

·TaskTracker

 

HDFS (Hadoop Distributed File System)

·Blocks and Splits

·Input Splits

·HDFS Splits

·Data Replication

·Hadoop rack awareness

·High availability of data

·Cluster architecture and block placement

CASE STUDIES

 

Programming Practices & Performance Tuning

·Developing MapReduce programs in:

·Local Mode

·Running without HDFS

·Pseudo-distributed Mode

·Running all daemons on a single node

·Fully distributed mode

·Running daemons on dedicated nodes

·Installing an Apache Hadoop single-node cluster

·NameNode in safe mode


 

Writing a MapReduce Program

·Examining a Sample MapReduce Program

·With several examples

·Basic API Concepts

·The Driver Code

·The Mapper

·The Reducer

·Hadoop's Streaming API

 

Performing several Hadoop jobs

·The configure() and close() methods

·Sequence files

·RecordReader

·RecordWriter

·The role of the Reporter

·OutputCollector

·Counters

·Directly Accessing HDFS

·ToolRunner

·Using the distributed cache

·Killing a job

 

Several MapReduce jobs (in detail)

·MOST EFFECTIVE SEARCH USING MAPREDUCE

·GENERATING THE RECOMMENDATIONS USING MAPREDUCE

·PROCESSING THE LOG FILES USING MAPREDUCE

·IMAGE COUNTERS IN MAPREDUCE

·MRUNIT TESTING

·Identity Mapper

·Identity Reducer

·Exploring well-known problems using MapReduce applications

 

Debugging MapReduce Programs

·Testing with MRUnit

·Logging

·Other debugging strategies

 

Advanced MapReduce Programming

·The Secondary Sort

·Customized Input Formats and Output Formats

·Joins in MapReduce

·Compression

 

Monitoring and debugging on a production cluster

·Skipping Bad Records

·Running in local mode

 

Tuning for Performance in MapReduce

·Reducing network traffic with combiner

·Partitioners


·Reducing the amount of input data

·Speculative execution

·Other Performance Aspects

CASE STUDIES

 

 

CDH4 Enhancements

·NameNode high availability

·NameNode federation

·Fencing

·MapReduce version 2

 

HIVE

·Hive concepts

·Hive architecture

·Install and configure Hive on a cluster

·Different types of tables in Hive

·Hive library functions

·Buckets

·Partitions

·Joins in Hive

·Inner joins

·Outer joins

·Hive UDFs

·Hive SerDe

·Processing JSON in Hive

·Compression in Hive

 

PIG

·Pig basics

·Install and configure Pig on a cluster

·Pig library functions

·Pig vs. Hive

·Write sample Pig Latin scripts

·Modes of running Pig

·Running in the Grunt shell

·Designing Pig scripts

·Using PiggyBank

·Running as a Java program

·Pig UDFs

·Pig macros

·Debugging Pig

 

IMPALA

·Differences between Impala, Hive and Pig

·How Impala achieves good performance

·Exclusive features of Impala

·Impala Challenges

·Use cases of Impala

 

SQOOP


·Install and configure Sqoop on a cluster

·Connecting to an RDBMS

·Installing MySQL

·Import data from Oracle/MySQL into Hive

·Export data to Oracle/MySQL

·Internal mechanism of import/export

 

 

FLUME

·Architecture

·Ingesting streaming tweets

·HDFS as a sink

 

 

NoSQL

HBase

·HBase concepts

·HBase architecture

·Region server architecture

·File storage architecture

·HBase basics

·Column access

·Scans

·HBase use cases

·Install and configure HBase on a multi-node cluster

·Create a database; develop and run sample applications

 

OOZIE

·Oozie architecture

·XML file specifications

·Installing and configuring Oozie and Apache

·Specifying a workflow

·Action nodes

·Control nodes

·Oozie job coordinator

 

Hadoop Challenges

·Hadoop disaster recovery

·Use cases suited to Hadoop

 

ELASTICSEARCH

·Get and Put API

·Java approach

·Elasticsearch with Kibana

 

SPARK

·Basics of in-memory computation

·RDD in Spark

·Installation

·Spark with Scala example


·Spark Java API

·Spark MLlib

STORM BASICS

KAFKA BASICS
