Video description
You're a software developer somewhat familiar with Apache Spark and how it's used to analyze Big Data. You've been tasked with a Big Data analysis job and you want to rent space on a cluster to do it. But where to begin?
This is a hands-on course where Amazon Web Services pro Frank Kane shows you how to rent Amazon's Elastic MapReduce service (EMR) at minimal cost and use it to run Spark scripts on top of a real Hadoop cluster. Kane's approach is fun: You'll learn a Big Data analysis process by actually deploying Spark on EMR to build a working movie recommendation engine using real movie ratings data.
- Learn Amazon EMR's undocumented "gotchas" so they don't take you by surprise
- Save money on EMR costs by learning to stage scripts, data, and actions ahead of time
- Understand how to provision an EMR cluster configured for Apache Spark
- Explore two different ways to run Spark scripts on EMR
- Learn how to set up security and monitor a Spark cluster through a web UI
- Understand how to interactively develop Spark code on EMR with Apache Zeppelin
- Gain experience with Spark and AWS, two skills that are highly valued by employers
Table of contents
Introduction
- Welcome To The Course 00:02:23
- About The Author 00:01:50
Overview Of Spark On AWS
- What Is Spark? 00:06:06
- Elastic MapReduce And Spark 00:04:42
- Setting Up An AWS Account 00:02:29
Preparing Your Spark Script
- Overview Of Our Spark Script 00:09:19
- Packaging Your Script With SBT 00:07:27
- Uploading To S3 00:06:41
Launching Your EMR Cluster
- Provisioning Your Cluster 00:05:21
- Connecting To The Master 00:04:52
- Running Your Spark Script Manually 00:03:20
- Running Your Spark Script As A Step In EMR 00:06:55
- Overriding Spark Configuration Settings 00:07:04
Interacting With Your EMR Cluster
- Setting Up An SSH Tunnel 00:05:44
- Using Zeppelin With Spark On EMR 00:05:57
Conclusion
- Wrap Up And Thank You 00:02:48
Product information
- Title: Analyzing Big Data with Spark and Amazon EMR
- Author(s): Frank Kane
- Release date: March 2017
- Publisher(s): Infinite Skills
- ISBN: 9781491985113
You might also like
video
Analyzing Big Data with Hadoop, AWS, and EMR
Hadoop is today's most pervasive technology used in Big Data for distributing the processing of massive …
video
Mastering Big Data Analytics with PySpark
PySpark helps you perform data analysis at-scale; it enables you to build more scalable analyses and …
book
Simplify Big Data Analytics with Amazon EMR
Design scalable big data solutions using Hadoop, Spark, and AWS cloud-native services. Key Features: Build …
video
Apache Spark with Scala – Hands-On with Big Data!
“Big data” analysis is a hot and highly valuable skill—and this course will teach you the …