Dive right in with 20+ hands-on examples of analyzing large data sets with Apache Spark, on your desktop or on Hadoop! Includes 7 hours of on-demand video and a certificate of completion.
Also available at Udemy
Buy This Course
Lifetime access to all videos and materials for this course, with a one-time payment.
New! Updated for Spark 3, additional hands-on exercises, and a stronger focus on using DataFrames in place of RDDs.
“Big data” analysis is a hot and highly valuable skill – and this course will teach you the hottest technology in big data: Apache Spark. Employers including Amazon, eBay, NASA JPL, and Yahoo all use Spark to quickly extract meaning from massive data sets across a fault-tolerant Hadoop cluster. You’ll learn those same techniques, using your own Windows system right at home. It’s easier than you might think.
Through more than 20 hands-on examples, you’ll master the art of framing data analysis problems as Spark problems, and then scale them up to run on cloud computing services. You’ll be learning from a former engineer and senior manager at Amazon and IMDb.
- Learn the concepts of Spark’s DataFrames and Resilient Distributed Datasets (RDDs)
- Develop and run Spark jobs quickly using Python
- Translate complex analysis problems into iterative or multi-stage Spark scripts
- Scale up to larger data sets using Amazon’s Elastic MapReduce service
- Understand how Hadoop YARN distributes Spark across computing clusters
- Learn about other Spark technologies, like Spark SQL, Spark Streaming, and GraphX
By the end of this course, you’ll be running code that analyzes gigabytes worth of information – in the cloud – in a matter of minutes.
This course uses the familiar Python programming language; if you’d rather use Scala to get the best performance out of Spark, see my “Apache Spark with Scala – Hands On with Big Data” course instead.
We’ll have some fun along the way. You’ll get warmed up with some simple examples of using Spark to analyze movie ratings data and text in a book. Once you’ve got the basics under your belt, we’ll move to some more complex and interesting tasks. We’ll use a million movie ratings to find movies that are similar to each other, and you may even discover some new movies you like in the process! We’ll analyze a social graph of superheroes, and learn who the most “popular” superhero is – and develop a system to find “degrees of separation” between superheroes. Are all Marvel superheroes within a few degrees of being connected to The Incredible Hulk? You’ll find the answer.
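To give a feel for the “degrees of separation” problem, here is a rough plain-Python sketch of breadth-first search on a tiny graph. It is an illustration only: it does not use Spark, it is not the course’s own script, and the graph data, its connections, and the function name `degrees_of_separation` are all made up for this example.

```python
from collections import deque

# Hypothetical toy graph (invented for this sketch): each hero maps to the
# heroes they are directly connected to.
CONNECTIONS = {
    "Hulk": ["Iron Man", "Thor"],
    "Iron Man": ["Hulk", "Spider-Man"],
    "Thor": ["Hulk"],
    "Spider-Man": ["Iron Man", "Daredevil"],
    "Daredevil": ["Spider-Man"],
    "Howard the Duck": [],
}


def degrees_of_separation(graph, start, target):
    """Return the number of hops on the shortest path, or None if unreachable."""
    if start == target:
        return 0
    visited = {start}
    queue = deque([(start, 0)])  # (hero, distance from start)
    while queue:
        hero, distance = queue.popleft()
        for neighbor in graph.get(hero, []):
            if neighbor == target:
                return distance + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None  # the queue emptied without ever reaching the target


if __name__ == "__main__":
    print(degrees_of_separation(CONNECTIONS, "Hulk", "Daredevil"))  # prints 3
```

Because breadth-first search explores the graph one “hop” at a time, the first time it reaches the target it has found the shortest connection. Doing the same thing on a real social graph spread across a cluster takes more machinery, which is what the course’s Spark version is for.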
This course is very hands-on; you’ll spend most of your time following along with the instructor as we write, analyze, and run real code together – both on your own system, and in the cloud using Amazon’s Elastic MapReduce service. 7 hours of video content is included, with over 20 real examples of increasing complexity you can build, run and study yourself. Move through them at your own pace, on your own schedule. The course wraps up with an overview of other Spark-based technologies, including Spark SQL, Spark Streaming, and GraphX.
Wrangling big data with Apache Spark is an important skill in today’s technical world. Enroll now!
James N Gershfield
Awesome course on running big data jobs on Apache Spark using Python. As usual, Frank explains things very clearly and points out various items to watch out for and make sure you have set up correctly. There are many ways that a Spark job can fail or have issues, such as running out of memory, and Frank does a great job of pointing many of those out.
HansEv
Easy steps so even a beginner should be able to install Spark and run the examples right away. Good examples and fun to do. Giving a nice set of useful examples as a toolbox.
Amiri McCain
Great course to get you going with Apache Spark and Python! Frank’s delivery is very thorough yet unpretentious; his explanations for each new concept that he introduces are down to earth and easy to follow.
Frank Kane
Author
Our courses are led by Frank Kane, a former Amazon and IMDb developer with extensive experience in machine learning and data science. With 26 issued patents and 9 years of experience at the forefront of recommendation systems, Frank brings real-world expertise to his teaching. His ability to explain complex concepts in accessible terms has helped over one million students worldwide gain valuable skills in machine learning, data engineering, and AI development.
Getting Started with Spark
[Activity] Getting Set Up: Installing Python, a JDK, Spark, and its Dependencies.
[Activity] Installing the MovieLens Movie Rating Dataset
Spark Basics and the RDD Interface
The Resilient Distributed Dataset (RDD)
Ratings Histogram Walkthrough
Key/Value RDDs, and the Average Friends by Age Example
[Activity] Running the Average Friends by Age Example
Filtering RDDs, and the Minimum Temperature by Location Example
[Activity] Running the Minimum Temperature Example, and Modifying it for Maximums
[Activity] Running the Maximum Temperature by Location Example
[Activity] Counting Word Occurrences using flatMap()
[Activity] Improving the Word Count Script with Regular Expressions
[Activity] Sorting the Word Count Results
Assignment: Tally up amount spent by customer using Spark
Assignment: Sort your results by amount spent per customer
Spark SQL, DataFrames, and Datasets
Using DataFrames instead of RDDs
[Exercise] Implement Friends by Age with DataFrames
Exercise Solution: Friends by Age, with DataFrames
Word Count, with DataFrames
Minimum Temperature, with DataFrames
[Exercise] Implement Total Amount Spent with DataFrames
Exercise Solution: Total Amount Spent with DataFrames
Advanced Examples of Spark Programs
[Activity] Find the Most Popular Movie
[Activity] Use Broadcast Variables to Display Movie Names Instead of ID Numbers
Find the Most Popular Superhero in a Social Graph
[Activity] Run the Script – Discover Who the Most Popular Superhero is!
[Exercise] Find the Most Obscure Superheroes
Superhero Degrees of Separation: Introducing Breadth-First Search
Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark
[Activity] Superhero Degrees of Separation: Review the Code and Run it
Item-Based Collaborative Filtering in Spark, cache(), and persist()
[Exercise] Improve the Quality of Similar Movies
Running Spark on a Cluster
Introducing Elastic MapReduce
[Activity] Setting up your AWS / Elastic MapReduce Account and Setting Up PuTTY
Create Similar Movies from One Million Ratings – Part 1
[Activity] Create Similar Movies from One Million Ratings – Part 2
Create Similar Movies from One Million Ratings – Part 3
Troubleshooting Spark on a Cluster
More Troubleshooting, and Managing Dependencies
Machine Learning with Spark ML
Analyzing the ALS Recommendations Results
[Activity] Linear Regression with Spark ML
[Exercise] Using Decision Trees to Predict Real Estate Prices
Spark Streaming, Structured Streaming, and GraphX
[Activity] Structured Streaming in Python
[Exercise] Using Windowed Operations with Structured Streaming
You Made It! Where to Go from Here.
Learning More about Spark and Data Science
Continue your Learning Journey
I bought this course on Udemy. Unfortunately I can’t seem to find the link for downloading the course materials. This is impairing my learning experience as I need access to critical code materials. Please can you send me the link. Cheers!
Generally for all of our courses, there is a setup lecture early in the course that walks you through where to download any materials from. You just have to watch it.
Also, any questions about our Udemy courses should be posted in Udemy’s Q&A feature for the course.
My bad really. I have now figured out how to get the course materials. What a brilliant course indeed!
Good evening,
Does this course have Portuguese subtitles?
I think you are asking if Portuguese closed captions are available for our courses; I’m afraid they are not.
Exactly. Does this course have subtitles in Portuguese?
No, it does not.
Hello,
What’s the difference between the course here and on Udemy?
It is the same course. You’re just buying direct from the instructor here.