Spark and Scala Course Content | Spark and Scala online training

July 24, 2018

Spark & Scala Course Contents

Describe Features of Apache Spark

· How Spark fits in Big Data ecosystem

· Why Spark & Hadoop fit together

Define Spark Components

· Driver Program

§ Spark Context

· Cluster Manager

· Worker

§ Executor

§ Task

· Spark RDD

§ Spark Context

· Spark Libraries

Load data into Spark

· Different data sources and formats

§ HDFS

§ Amazon S3

§ Local File System

§ Text

§ JSON

§ CSV

§ Sequence File

· Create & Use RDDs and DataFrames

Apply dataset operations to Resilient Distributed Datasets

· Transformations

· Actions

· Cache Intermediate RDD

§ Lineage Graph

§ Lazy Evaluation

Use Spark DataFrames for simple queries

· Create DataFrame

· Spark Interactive Shell (Scala & Python)

· Spark SQL

Define different ways to run your application

Build and launch a standalone application

· Spark Program Life Cycle

· Function of Spark Context

· Different Ways to Launch a Spark Application

§ Local

§ Standalone

§ Hadoop YARN

§ Apache Mesos

· Launch Spark Application

§ spark-submit

§ Monitor the Spark Job

Describe & Create pair RDD

· Key-Value pair

· Apache Spark vs Apache Hadoop MapReduce

· Create RDD from existing non-pair RDD

· Create pair RDD by loading certain formats

· Create pair RDD from in-memory collection of pairs

Apply Operations on pair RDD

· groupByKey

· reduceByKey

· Other Transformations

§ Joins

Control partitioning across nodes

· RDD Partition

· Types of Partitioning

§ Hash Partitioning

§ Range Partitioning

· Benefits of Partitioning

· Best Practices

More on DataFrames

· Explore Data in DataFrames

· Create UDFs (user-defined functions)

§ UDF with Scala DSL

§ UDF with SQL

· Repartition DataFrames

· Infer Schema by Reflection

· DataFrame from database table

· DataFrame from JSON

Monitor Apache Spark Applications

· Spark Execution Model

· Debug and Tune Spark Applications

Identify Spark Unified Stack Components

· Spark SQL

· Spark Streaming

· Spark MLlib

· Spark GraphX

Benefits of Apache Spark over Hadoop Ecosystem

Describe Spark Data pipeline Use Cases

· Spark Streaming Architecture

· DStream and a Spark Streaming application

§ Define Use Case (Time Series Data)

§ Basic Steps

§ Save Data to HBase

· Operations on DStream

§ Transformations

§ DataFrame and SQL Operations

· Define Windowed Operation

§ Sliding Window

§ Windowed Computation

§ Window based Transformation

§ Window Operations

· Fault tolerance of streaming applications

§ Fault Tolerance in Spark Streaming

§ Fault Tolerance in Spark RDD

§ Checkpointing

Describe GraphX

Define Regular, Directed, and Property Graphs

Create a Property Graph

Perform Operations on Graphs

Describe Apache Spark MLlib

Describe the Machine Learning Techniques

· Classification

· Clustering

· Collaborative Filtering

Use Collaborative filtering to predict user choice

Scala

· Introduction

· A first example

· Expressions and Simple Functions

· First-Class Functions

· Classes and Objects

· Case classes and Pattern matching

· Generic types and methods

· Lists

· For-Comprehensions

· Mutable State

· Computing with Streams

· Lazy Values

· Implicit Parameters and Conversions

· Hindley/Milner Type Inference

· Abstractions for Concurrency

Contact details: +1 416-834-6577 / +1 201-905-1656

WhatsApp : 9030990003/9000444287

Mail : [email protected]/[email protected]

July 24, 2018

Everyone here learns in a self-paced way, so joining will be a bit difficult. Good luck.


selfpacedtech123

Amrita
