Machine Learning and Big Data Training with Frank Kane

Overview of the Hadoop Ecosystem

  • November 3, 2019

Back to: The Ultimate Hands-On Hadoop: Tame your Big Data!

Previous Lesson
Hadoop Overview and History
Next Lesson
HDFS: What it is, and how it works

Lessons

  • Learn all the buzzwords! And install the Hortonworks Data Platform Sandbox.
    • If you have trouble downloading Hortonworks Data Platform…
    • Installing Hadoop
    • Hadoop Overview and History
    • Overview of the Hadoop Ecosystem
  • Using Hadoop’s Core: HDFS and MapReduce
    • HDFS: What it is, and how it works
    • Installing the MovieLens Dataset
    • [Activity] Install the MovieLens dataset into HDFS using the command line
    • MapReduce: What it is, and how it works
    • How MapReduce distributes processing
    • MapReduce example: Break down movie ratings by rating score
    • [Activity] Installing Python, MRJob, and nano
    • [Activity] Code up the ratings histogram MapReduce job and run it
    • [Exercise] Rank movies by their popularity
    • [Activity] Check your results against mine!
  • Programming Hadoop with Pig
    • Introducing Ambari
    • Introducing Pig
    • Example: Find the oldest movie with a 5-star rating using Pig
    • [Activity] Find old 5-star movies with Pig
    • More Pig Latin
    • [Exercise] Find the most-rated one-star movie
    • Pig Challenge: Compare Your Results to Mine!
  • Programming Hadoop with Spark
    • Why Spark?
    • The Resilient Distributed Dataset (RDD)
    • [Activity] Find the movie with the lowest average rating – with RDDs
    • Datasets and Spark 2.0
    • [Activity] Find the movie with the lowest average rating – with DataFrames
    • [Activity] Movie recommendations with MLlib
    • [Exercise] Filter the lowest-rated movies by number of ratings
    • [Activity] Check your results against mine!
  • Using relational data stores with Hadoop
    • What is Hive?
    • [Activity] Use Hive to find the most popular movie
    • How Hive works
    • [Exercise] Use Hive to find the movie with the highest average rating
    • Compare your solution to mine.
    • Integrating MySQL with Hadoop
    • [Activity] Install MySQL and import our movie data
    • [Activity] Use Sqoop to import data from MySQL to HDFS/Hive
    • [Activity] Use Sqoop to export data from Hadoop to MySQL
  • Using non-relational data stores with Hadoop
    • Why NoSQL?
    • What is HBase?
    • [Activity] Import movie ratings into HBase
    • [Activity] Use HBase with Pig to import data at scale.
    • Cassandra overview
    • If you have trouble installing Cassandra…
    • [Activity] Installing Cassandra
    • [Activity] Write Spark output into Cassandra
    • MongoDB overview
    • [Activity] Install MongoDB, and integrate Spark with MongoDB
    • [Activity] Using the MongoDB shell
    • Choosing a database technology
    • [Exercise] Choose a database for a given problem
  • Querying your Data Interactively
    • Overview of Drill
    • [Activity] Setting up Drill
    • [Activity] Querying across multiple databases with Drill
    • Overview of Phoenix
    • [Activity] Install Phoenix and query HBase with it
    • [Activity] Integrate Phoenix with Pig
    • Overview of Presto
    • [Activity] Install Presto, and query Hive with it.
    • [Activity] Query both Cassandra and Hive using Presto.
  • Managing your Cluster
    • YARN explained
    • Tez explained
    • [Activity] Use Hive on Tez and measure the performance benefit
    • Mesos explained
    • ZooKeeper explained
    • [Activity] Simulating a failing master with ZooKeeper
    • Oozie explained
    • Important setup step for Oozie on HDP 2.6.5
    • [Activity] Set up a simple Oozie workflow
    • Zeppelin overview
    • [Activity] Use Zeppelin to analyze movie ratings, part 1
    • [Activity] Use Zeppelin to analyze movie ratings, part 2
    • Hue overview
    • Other technologies worth mentioning
  • Feeding Data to your Cluster
    • Kafka explained
    • [Activity] Setting up Kafka, and publishing some data.
    • [Activity] Publishing web logs with Kafka
    • Flume explained
    • [Activity] Set up Flume and publish logs with it.
    • [Activity] Set up Flume to monitor a directory and store its data in HDFS
  • Analyzing Streams of Data
    • Spark Streaming: Introduction
    • [Activity] Analyze web logs published with Flume using Spark Streaming
    • [Exercise] Monitor Flume-published logs for errors in real time
    • Exercise solution: Aggregating HTTP access codes with Spark Streaming
    • Apache Storm: Introduction
    • [Activity] Count words with Storm
    • Flink: An Overview
    • [Activity] Counting words with Flink
  • Designing Real-World Systems
    • The Best of the Rest
    • Review: How the pieces fit together
    • Understanding your requirements
    • Sample application: consume webserver logs and keep track of top-sellers
    • Sample application: serving movie recommendations to a website
    • [Exercise] Design a system to report web sessions per day
    • Exercise solution: Design a system to count daily sessions
  • Learning More
    • Books and online resources
    • Continue your Learning Journey!
https://sundog-education.com/ Website and all course content © Copyright 2020 Sundog Software LLC DBA Sundog Education. All rights reserved worldwide. "Sundog" is a registered trademark of Sundog Software, LLC. Read our privacy policy and terms of service.