JumpStart to Developing in Spark | Spark Programs, RDDs, NoSQL, Spark Machine Learning & More

Gain a thorough understanding of developing in Spark

Course Code : 1340

$2295

Overview

This course gives an overview of current technologies across the data science spectrum, with an emphasis on Spark and related tools. It is structured for developers who want to strengthen their skills and learn enterprise-grade Spark programming. Topics range from the core features of Spark to hands-on practice with the technologies listed in the outline.

Course Delivery

This course is available in the following formats:

Live Classroom
Duration: 5 days

Live Virtual Classroom
Duration: 5 days

What You'll Learn

  • Basics of Spark architecture and applications
  • Executing Spark programs
  • Creating and manipulating both RDDs (Resilient Distributed Datasets) and Spark 2 unified DataFrames
  • Saving and restoring DataFrames
  • Essential NoSQL data access
  • Integrating machine learning into Spark applications
  • Using Spark Streaming and Kafka to create streaming applications

 

Outline

  • Overview of Spark
  • Hadoop ecosystem
  • Hadoop YARN vs. Mesos
  • Spark vs. MapReduce
  • Spark: Lambda architecture
  • Spark in the enterprise data science architecture
  • Spark shell
  • RDDs: Resilient distributed datasets
  • DataFrames
  • Spark 2 unified DataFrames
  • Spark sessions
  • Functional programming
  • Spark SQL
  • MLlib
  • Structured streaming
  • SparkR
  • Spark and Python
  • Exercise: Hello, Spark
  • Coding with RDDs
  • Transformations
  • Actions
  • Lazy evaluation and optimization
  • RDDs in MapReduce
  • Exercise: Working with RDDs
  • RDDs vs. DataFrames
  • Unified DataFrames in Spark 2.x
  • Partitioning
  • Exercise: Working with unified DataFrames
  • RDD persistence
  • DataFrame and unified DataFrame persistence
  • Distributed persistence
  • Exercise: Saving and restoring DataFrames
  • Ingesting data
  • Relational databases and Sqoop
  • Interacting with Hive
  • Graph data
  • Accessing Cassandra data
  • Exercise: NoSQL data access
  • Spark SQL
  • SQL and DataFrames
  • Spark SQL and Hive
  • Spark SQL and JDBC
  • Exercise: Working with Spark SQL
  • MLlib
  • Mahout
  • Exercise: Hello, MLlib
  • Streaming overview
  • Streams
  • Structured streaming
  • Lambda streaming
  • Spark and Kafka
  • Exercise: Hello, Spark Streaming

Prerequisites

Participants must be proficient in Java programming fundamentals and have a solid grasp of basic Python programming and SQL.

Who Should Attend

This course is geared toward experienced developers, and architects with development experience, who want to become proficient in modern development with Apache Spark in an enterprise data environment.

This course is highly recommended for:

  • Hadoop/Spark developers
  • Data scientists
  • Data engineers
  • Big Data engineers
  • Java developers
  • Application developers
  • Full stack developers

 

