Spark and Scala Online Training|Training-3092003848

June 20, 2017

21st Century Software Solutions Private Limited is the best online training provider worldwide, with real-time experts:

Course Outline:

Describe Features of Apache Spark

How Spark fits in Big Data ecosystem
Why Spark & Hadoop fit together

Define Spark Components

Driver Program
- Spark Context
Cluster Manager
Worker
- Executor
- Task
Spark RDD
- Spark Context
Spark Libraries

Load data into Spark

Different data sources and formats
- HDFS
- Amazon S3
- Local File System
- Text
- JSON
- CSV
- Sequence File

Create & Use RDDs and DataFrames

Apply dataset operations to Resilient Distributed Datasets

Transformations
Actions
Cache Intermediate RDDs
- Lineage Graph
- Lazy Evaluation
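The split between transformations and actions, and the role of lineage and lazy evaluation, can be pictured with a toy single-machine model. This is a minimal sketch in plain Scala, not Spark: MiniRDD, parallelize and sourceReads are hypothetical names chosen for illustration. Transformations only extend the recorded lineage; the source is read when an action runs, and cache() keeps the first result so later actions skip recomputation.

```scala
import scala.collection.mutable.ListBuffer

// Toy, single-machine model of RDD lineage and lazy evaluation.
// MiniRDD and parallelize are hypothetical names; this is NOT Spark's API.
object LazyLineage {
  // Logs each time the source data is actually read, so evaluation is observable
  val sourceReads: ListBuffer[String] = ListBuffer.empty

  final class MiniRDD[A](val lineage: List[String], thunk: () => Vector[A]) {
    private var cachedData: Option[Vector[A]] = None
    private var cacheRequested = false

    // Runs the recorded chain of steps, or reuses the cached result if present
    private def materialize(): Vector[A] = cachedData.getOrElse {
      val data = thunk()
      if (cacheRequested) cachedData = Some(data)
      data
    }

    // Transformations: return a new node that remembers its parent; nothing runs yet
    def map[B](f: A => B): MiniRDD[B] =
      new MiniRDD[B](lineage :+ "map", () => materialize().map(f))

    def filter(p: A => Boolean): MiniRDD[A] =
      new MiniRDD[A](lineage :+ "filter", () => materialize().filter(p))

    // Marks this node so the next action keeps its result in memory
    def cache(): MiniRDD[A] = { cacheRequested = true; this }

    // Actions: force evaluation of the whole lineage
    def collect(): Vector[A] = materialize()
    def count(): Int = materialize().size
  }

  // Source node: reading it is recorded in sourceReads
  def parallelize[A](data: Vector[A]): MiniRDD[A] =
    new MiniRDD[A](List("parallelize"), () => { sourceReads += "read"; data })
}
```

In real Spark the corresponding calls are sc.parallelize, map, filter, cache, collect and count on an RDD. Spark also uses the lineage to recompute lost partitions after a failure, which this toy does not model.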

Use Spark DataFrames for simple queries

Create a DataFrame
Spark Interactive shell (Scala & Python)
Spark SQL

Define different ways to run your application

Build and launch a standalone application

Spark Program Life Cycle
Function of Spark Context
Different Ways to Launch a Spark Application
- Local
- Standalone
- Hadoop YARN
- Apache Mesos

Launch a Spark Application
- spark-submit
- Monitor the Spark Job

Describe & Create Pair RDDs

Key-Value Pairs
Apache Spark vs Apache Hadoop MapReduce
Create a pair RDD from an existing non-pair RDD
Create a pair RDD by loading certain formats
Create a pair RDD from an in-memory collection of pairs

Apply Operations on Pair RDDs

groupByKey
reduceByKey
Other Transformations
- Joins
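The pair-RDD ideas above (building pairs from a non-pair collection, grouping versus reducing by key, and joining on keys) can be illustrated locally. This is a minimal sketch using plain Scala 2.13+ collections, not Spark; the input lines and the kinds lookup table are hypothetical sample data.

```scala
// Local analogue of pair-RDD operations using Scala 2.13+ collections (no Spark involved).
object PairOps {
  // Hypothetical input: a non-pair collection of text lines
  val lines: List[String] = List("spark scala spark", "scala spark hadoop")

  // Non-pair -> pair: emit (word, 1) for every word
  val pairs: List[(String, Int)] = lines.flatMap(_.split(" ")).map(word => (word, 1))

  // groupByKey-style: gather all values per key, then aggregate them
  val grouped: Map[String, List[Int]] = pairs.groupMap(_._1)(_._2)
  val countsViaGroup: Map[String, Int] = grouped.map { case (word, ones) => (word, ones.sum) }

  // reduceByKey-style: fold values per key with an associative function
  val countsViaReduce: Map[String, Int] = pairs.groupMapReduce(_._1)(_._2)(_ + _)

  // join-style: inner join on the key with a second keyed collection
  val kinds: Map[String, String] = Map("spark" -> "engine", "scala" -> "language")
  val joined: Map[String, (Int, String)] =
    countsViaReduce.toList.flatMap { case (word, n) =>
      kinds.get(word).map(kind => (word, (n, kind)))
    }.toMap
}
```

In Spark the same steps are written with RDD methods such as flatMap, map, groupByKey, reduceByKey and join. reduceByKey is usually preferred for aggregation because it combines values within each partition before the shuffle, whereas groupByKey sends every individual value across the network.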

Control partitioning across nodes

RDD Partitions
Types of Partitioning
- Hash Partitioning
- Range Partitioning

Benefits of Partitioning
Best Practices
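The two partitioning schemes can be sketched as plain functions that map a key to a partition index. This is a simplified illustration under stated assumptions: the hash variant follows the idea behind Spark's HashPartitioner (a non-negative key.hashCode modulo the number of partitions), while the range variant takes fixed, hypothetical upperBounds, whereas Spark's RangePartitioner derives its bounds by sampling the data.

```scala
// Hypothetical helper functions sketching how a key is assigned to a partition.
object Partitioning {
  // Modulo that never returns a negative index, even for negative hash codes
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Hash partitioning: the key's hashCode decides the partition; null keys go to 0
  def hashPartition(key: Any, numPartitions: Int): Int =
    if (key == null) 0 else nonNegativeMod(key.hashCode, numPartitions)

  // Range partitioning: upperBounds is sorted ascending; a key goes to the first
  // partition whose upper bound is >= the key, and keys above every bound
  // land in the final partition
  def rangePartition(key: Int, upperBounds: Vector[Int]): Int = {
    val idx = upperBounds.indexWhere(bound => key <= bound)
    if (idx == -1) upperBounds.length else idx
  }
}
```

Because equal keys always map to the same partition, two pair RDDs partitioned with the same partitioner keep matching keys together, which can reduce shuffling when they are joined repeatedly.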

More on DataFrames

Explore Data in DataFrames
Create UDFs (user-defined functions)
- UDF with Scala DSL
- UDF with SQL

Repartition DataFrames
Infer Schema by Reflection
DataFrame from a database table
DataFrame from JSON

Monitor Apache Spark Applications

Spark Execution Model
Debug and Tune Spark Applications

Identify Spark Unified Stack Components

Spark SQL
Spark Streaming
Spark MLlib
Spark GraphX

Benefits of Apache Spark over the Hadoop Ecosystem

Describe Spark Data pipeline Use Cases

Spark Streaming Architecture
DStream and a Spark Streaming application
- Define Use Case (Time Series Data)
- Basic Steps
- Save Data to HBase

Operations on DStreams
- Transformations
- DataFrame and SQL Operations
Define Windowed Operations
- Sliding Window
- Windowed Computation
- Window based Transformation
- Window Operations
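A sliding window is described by a window length and a slide interval, both multiples of the batch interval. The sketch below computes windowed sums over hypothetical per-batch event counts with plain Scala's sliding; it only illustrates the arithmetic of a windowed computation and is not the DStream API (in Spark Streaming the corresponding operations include window, countByWindow, reduceByWindow and reduceByKeyAndWindow).

```scala
// Local sketch of a windowed count over a sequence of micro-batches (no Spark Streaming involved).
object SlidingWindow {
  // Hypothetical event counts, one entry per batch interval
  val batchCounts: Vector[Int] = Vector(3, 1, 4, 1, 5, 9)

  // windowBatches = window length and slideBatches = slide interval, both measured in batch intervals.
  // Only full windows are reported, to keep the sketch simple.
  def windowedSums(counts: Vector[Int], windowBatches: Int, slideBatches: Int): Vector[Int] =
    counts
      .sliding(windowBatches, slideBatches)
      .filter(_.size == windowBatches)
      .map(_.sum)
      .toVector
}
```

With a window of 3 batches sliding by 1, consecutive windows overlap by two batches; sliding by 2 makes them overlap by one. A real stream also emits windows at its start before a full window of data has arrived, which this sketch skips.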

Fault tolerance of streaming applications
- Fault Tolerance in Spark Streaming
- Fault Tolerance in Spark RDD
- Checkpointing

Describe GraphX

Define Regular, Directed, and Property Graphs

Create a Property Graph

Perform Operations on Graphs
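A property graph attaches user-defined properties to both vertices and edges. The sketch below models a small directed property graph with ordinary Scala 2.13+ collections to show a triplet view, degree computations and an edge-filtered subgraph. It is an illustration under assumptions, not GraphX code: Edge here is a hypothetical local case class, and only the use of Long vertex ids follows GraphX's VertexId convention (GraphX itself offers Graph with triplets, inDegrees, outDegrees and subgraph).

```scala
// Local model of a property graph: vertices and edges both carry attributes.
// VertexId as Long follows GraphX's convention; Edge is a hypothetical local case class.
object PropertyGraph {
  type VertexId = Long
  final case class Edge[E](src: VertexId, dst: VertexId, attr: E)

  // Vertex property: a user name; edge property: the relationship
  val vertices: Map[VertexId, String] = Map(1L -> "alice", 2L -> "bob", 3L -> "carol")
  val edges: Vector[Edge[String]] = Vector(
    Edge(1L, 2L, "follows"),
    Edge(2L, 3L, "follows"),
    Edge(3L, 1L, "likes"),
    Edge(1L, 3L, "follows")
  )

  // Triplet view: each edge together with the properties of its two endpoints
  val triplets: Vector[(String, String, String)] =
    edges.map(e => (vertices(e.src), e.attr, vertices(e.dst)))

  // Degree operations on a directed graph
  val outDegrees: Map[VertexId, Int] = edges.groupMapReduce(_.src)(_ => 1)(_ + _)
  val inDegrees: Map[VertexId, Int] = edges.groupMapReduce(_.dst)(_ => 1)(_ + _)

  // Subgraph: keep only edges whose property satisfies a predicate
  def subgraph(p: String => Boolean): Vector[Edge[String]] = edges.filter(e => p(e.attr))
}
```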

Describe Apache Spark MLlib

Describe the Machine Learning Techniques

Classification
Clustering
Collaborative Filtering

Use Collaborative filtering to predict user choice
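Spark MLlib's collaborative filtering is based on alternating least squares (ALS) matrix factorization. To show the underlying idea of predicting a user's choice from similar users without needing a cluster, the sketch below swaps in a simpler technique: user-based nearest-neighbour filtering with cosine similarity, written in plain Scala. The ratings are hypothetical sample data.

```scala
// User-based collaborative filtering with cosine similarity (a swapped-in technique, not ALS).
object UserBasedCF {
  type Ratings = Map[String, Map[String, Double]]

  // Hypothetical ratings: user -> (item -> rating)
  val sample: Ratings = Map(
    "u1" -> Map("a" -> 5.0, "b" -> 3.0),
    "u2" -> Map("a" -> 4.0, "b" -> 3.0, "c" -> 4.0),
    "u3" -> Map("a" -> 1.0, "b" -> 5.0, "c" -> 2.0)
  )

  // Cosine similarity over the items both users rated; 0.0 when they share none
  def cosine(x: Map[String, Double], y: Map[String, Double]): Double = {
    val common = (x.keySet intersect y.keySet).toSeq
    val dot = common.map(i => x(i) * y(i)).sum
    val norms = math.sqrt(common.map(i => x(i) * x(i)).sum) * math.sqrt(common.map(i => y(i) * y(i)).sum)
    if (norms == 0.0) 0.0 else dot / norms
  }

  // Predicted rating = similarity-weighted average of other users' ratings for `item`.
  // Returns None when no positively similar user has rated the item.
  def predict(ratings: Ratings, user: String, item: String): Option[Double] = {
    val target = ratings.getOrElse(user, Map.empty[String, Double])
    val neighbours: Seq[(Double, Double)] = ratings.toSeq.collect {
      case (other, theirs) if other != user && theirs.contains(item) =>
        (cosine(target, theirs), theirs(item))
    }.filter { case (similarity, _) => similarity > 0 }
    val totalWeight = neighbours.map(_._1).sum
    if (totalWeight == 0.0) None
    else Some(neighbours.map { case (s, r) => s * r }.sum / totalWeight)
  }
}
```

Here u1's taste is much closer to u2's than to u3's, so the prediction for item c lands nearer u2's rating of 4 than u3's rating of 2. ALS scales better on large, sparse rating matrices, which is one reason MLlib uses it.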

Scala

Introduction
A First Example
Expressions and Simple Functions
First-Class Functions
Classes and Objects
Case Classes and Pattern Matching
Generic Types and Methods
Lists
For-Comprehensions
Mutable State
Computing with Streams
Lazy Values
Implicit Parameters and Conversions
Hindley/Milner Type Inference
Abstractions for Concurrency
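Several of the Scala topics listed above fit into one short, self-contained sketch. It assumes Scala 2.13 or later (LazyList replaced the older Stream in 2.13); the names ScalaTour, Expr, Separator and the helper methods are hypothetical, chosen only for illustration.

```scala
// A few of the Scala topics above in one place (Scala 2.13+ or Scala 3).
object ScalaTour {
  // Case classes and pattern matching: a small expression tree
  sealed trait Expr
  final case class Num(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class Mul(left: Expr, right: Expr) extends Expr

  def eval(e: Expr): Int = e match {
    case Num(v)    => v
    case Add(l, r) => eval(l) + eval(r)
    case Mul(l, r) => eval(l) * eval(r)
  }

  // Generic method taking a first-class function
  def applyTwice[A](x: A)(f: A => A): A = f(f(x))

  // For-comprehension: Pythagorean triples with all sides <= n
  def triples(n: Int): List[(Int, Int, Int)] =
    for {
      a <- (1 to n).toList
      b <- (a to n).toList
      c <- (b to n).toList
      if a * a + b * b == c * c
    } yield (a, b, c)

  // Computing with streams: LazyList evaluates elements only on demand
  val naturals: LazyList[Int] = LazyList.from(1)

  // Lazy value: the initializer runs once, on first access
  var initCount = 0
  lazy val expensive: Int = { initCount += 1; 42 }

  // Implicit parameter: filled in from an implicit value in the caller's scope
  final case class Separator(value: String)
  def joinAll(parts: List[String])(implicit sep: Separator): String = parts.mkString(sep.value)
}
```

For example, with `implicit val sep: ScalaTour.Separator = ScalaTour.Separator(", ")` in scope, `ScalaTour.joinAll(List("spark", "scala"))` yields `"spark, scala"` without passing the separator explicitly.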

davidjhon21
