Course Outline

Introduction:

  • Apache Spark within the Hadoop Ecosystem
  • Overview of Python and Scala

Core Concepts (Theory):

  • Architecture
  • RDDs
  • Transformations and Actions
  • Stages, Tasks, and Dependencies
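As a minimal illustration of these core concepts, the sketch below builds an RDD, applies lazy transformations, and then triggers execution with actions; the local master, app name, and sample data are placeholders rather than course material.

    from pyspark.sql import SparkSession

    # Minimal sketch: transformations are lazy, actions trigger jobs
    # (which Spark breaks into stages and tasks).
    spark = SparkSession.builder.master("local[*]").appName("rdd-basics").getOrCreate()
    sc = spark.sparkContext

    numbers = sc.parallelize(range(1, 11))        # create an RDD

    squares = numbers.map(lambda x: x * x)        # transformation: nothing runs yet
    evens = squares.filter(lambda x: x % 2 == 0)  # transformation: still lazy

    print(evens.collect())                        # action: [4, 16, 36, 64, 100]
    print(evens.count())                          # action: 5

    spark.stop()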

Hands-on Workshop: Mastering Basics in Databricks:

  • Exercises with the RDD API
      • Basic functions for actions and transformations
      • PairRDDs
      • Join operations
      • Caching strategies
  • Exercises with the DataFrame API
      • Spark SQL
      • DataFrame operations: select, filter, group, and sort
      • User-Defined Functions (UDFs)
  • Exploring the Dataset API
  • Streaming capabilities
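To give a flavour of the RDD exercises, the sketch below joins two pair RDDs and caches an intermediate result; the keys and values are invented sample data.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("pair-rdds").getOrCreate()
    sc = spark.sparkContext

    # Pair RDDs are RDDs of (key, value) tuples; the data is illustrative only.
    orders = sc.parallelize([("alice", 30.0), ("bob", 12.5), ("alice", 7.5)])
    cities = sc.parallelize([("alice", "Berlin"), ("bob", "Madrid")])

    totals = orders.reduceByKey(lambda a, b: a + b).cache()  # cached for reuse
    joined = totals.join(cities)                             # inner join on the key

    print(totals.collect())  # [('alice', 37.5), ('bob', 12.5)]  (order may vary)
    print(joined.collect())  # [('alice', (37.5, 'Berlin')), ('bob', (12.5, 'Madrid'))]

The DataFrame exercises follow the same pattern; a compressed sketch of select/filter/group/sort, a user-defined function, and a Spark SQL query over the same kind of invented data might look like this:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.master("local[*]").appName("dataframe-basics").getOrCreate()

    df = spark.createDataFrame(
        [("alice", "DE", 37.5), ("bob", "ES", 12.5), ("carol", "DE", 20.0)],
        ["name", "country", "total"],
    )

    # select / filter / group / sort with built-in functions
    summary = (df.filter(F.col("total") > 15)
                 .groupBy("country")
                 .agg(F.sum("total").alias("revenue"))
                 .orderBy(F.desc("revenue")))
    summary.show()

    # A user-defined function: flexible, but slower than built-ins.
    shout = F.udf(lambda s: s.upper(), StringType())
    df.select(shout(F.col("name")).alias("name_upper")).show()

    # The same DataFrame queried through Spark SQL.
    df.createOrReplaceTempView("sales")
    spark.sql("SELECT country, COUNT(*) AS n FROM sales GROUP BY country").show()

    spark.stop()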

Hands-on Workshop: Deployment in AWS:

  • Essentials of AWS Glue
  • Differences between AWS EMR and AWS Glue
  • Sample jobs in both environments
  • Evaluation of advantages and disadvantages
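For orientation, a minimal AWS Glue (PySpark) job script follows the pattern sketched below; the S3 paths and the filter are placeholders, and the same transformation logic could be packaged as a plain spark-submit script for EMR, which is essentially the packaging difference the workshop weighs up.

    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # Standard Glue job boilerplate: Glue passes --JOB_NAME (plus any custom args).
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    sc = SparkContext()
    glue_context = GlueContext(sc)
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Placeholder locations: replace with real S3 paths or Glue Data Catalog tables.
    df = spark.read.json("s3://example-bucket/input/")
    df.filter(df["status"] == "ok").write.mode("overwrite").parquet("s3://example-bucket/output/")

    job.commit()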

Additional Topics:

  • Introduction to Apache Airflow orchestration
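As a taste of the orchestration topic, a minimal Airflow DAG that launches a Spark job once a day could look like the sketch below; the DAG id, schedule, and script path are invented, the `schedule` argument assumes Airflow 2.4 or later, and with the Apache Spark provider installed the BashOperator could be swapped for SparkSubmitOperator.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # A minimal daily DAG with a single task; all names and paths are illustrative.
    with DAG(
        dag_id="daily_spark_etl",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        run_spark_job = BashOperator(
            task_id="run_spark_job",
            bash_command="spark-submit --master yarn /opt/jobs/etl_job.py",
        )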

Requirements

Programming proficiency (preferably in Python and Scala)

Basic knowledge of SQL

Duration

21 Hours
