[Activity] Stream Live Tweets with Spark Streaming!

9 thoughts on “[Activity] Stream Live Tweets with Spark Streaming!”

Ahmed Khalil says:

July 15, 2020 at 10:24 pm

Hello Frank,
First, thank you so much for the deep and insightful information throughout all your courses. They are really great, especially compared to other courses out there.

Second, I have a question about the solution architecture for a system I’m building with my team. We are collecting financial data from various sources (mostly REST APIs), caching it in Elasticsearch, and then building visualizations and dashboards with some modern UI components.

My questions are about the tool we will use to ingest the data from the different sources, as we need to perform some operations on the data while importing it into Elastic. For example:
– Transformations: splitting one record from the source system into multiple documents in Elastic based on some logic.
– Creating aggregated documents: this is a separate scheduler we are looking to build that computes some aggregations and saves them as new documents in Elastic. I know that Elastic aggregations are great, and we will use them; these documents are just for a special purpose.

We tried Logstash, and it was too limiting for what we want to do in terms of transformations. Now we are comparing Spark streaming and Kafka streaming in order to have the most flexibility possible. We have a Java background and are open to learning any language as well.

Looking forward to your feedback, and sorry for the long question.

Best,
Ahmed

Frank Kane says:

July 16, 2020 at 8:38 am

My gut reaction would be to start with Spark Streaming, if you’re looking to have the most flexibility in transforming and aggregating the data as it is ingested.

Tools such as Logstash and Kafka tend to be better suited to ingesting data produced by large numbers of individual hosts and funneling that data somewhere. Your use case is a bit different, as you’re just hitting REST APIs for your data and not trying to solve the problem of reliably transmitting data from a large number of systems to a single data repository.
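
As a rough illustration of the two operations described in the question (splitting one source record into several documents, then aggregating them), here is a minimal sketch in plain Java with no Spark dependency, so it runs standalone. The `Quote` and `PriceDoc` types, the field names, and the sample values are all hypothetical and not taken from the course material. In Spark Streaming, the analogous steps would typically be a `flatMap` followed by a keyed reduction such as `reduceByKey` on a DStream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class IngestSketch {
    // Hypothetical source record: one API row carrying several prices.
    static final class Quote {
        final String symbol;
        final List<Double> prices;
        Quote(String symbol, List<Double> prices) {
            this.symbol = symbol;
            this.prices = prices;
        }
    }

    // Hypothetical target document: one price per document.
    static final class PriceDoc {
        final String symbol;
        final double price;
        PriceDoc(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }
    }

    // Transformation: split one source record into multiple documents.
    static List<PriceDoc> split(Quote q) {
        return q.prices.stream()
                .map(p -> new PriceDoc(q.symbol, p))
                .collect(Collectors.toList());
    }

    // Aggregation: total price per symbol, sorted by symbol.
    static Map<String, Double> totals(List<PriceDoc> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                d -> d.symbol, TreeMap::new,
                Collectors.summingDouble(d -> d.price)));
    }

    public static void main(String[] args) {
        List<Quote> batch = Arrays.asList(
                new Quote("AAA", Arrays.asList(1.0, 2.0)),
                new Quote("BBB", Arrays.asList(5.0)));
        // flatMap turns each record into zero or more documents.
        List<PriceDoc> docs = batch.stream()
                .flatMap(q -> split(q).stream())
                .collect(Collectors.toList());
        System.out.println(docs.size() + " documents"); // prints "3 documents"
        System.out.println(totals(docs));               // prints "{AAA=3.0, BBB=5.0}"
    }
}
```

Keeping the split and the aggregation as separate functions mirrors the two separate jobs described in the question: the per-record transformation can run as data arrives, while the aggregation can run on its own schedule.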

Ahmed Khalil says:

July 17, 2020 at 3:33 am

Thanks a lot, Frank. This is pretty much the way we chose: to use Spark Streaming. In the future, when the system grows, we might add Spark itself between Spark Streaming and Elastic to carry the datasets as-is from the source and keep only the aggregate data in Elastic for optimum performance. Later we might also add Kafka to handle the real-time data streams for sources like Google Analytics. This is getting interesting 😀.

Dhaneshwar Jha says:

February 9, 2021 at 6:57 am

Hi Mark,
I am trying to do this project in IntelliJ Community Edition and am facing a java.lang error.
I can’t use the Scala IDE because it fails, saying “java was started but returned exit code=1”.

Please help.
Thanks and regards

1. Frank Kane says:
  
  February 9, 2021 at 7:38 am
  
  Sorry but I can’t provide support for IntelliJ with this course; I’d recommend installing Anaconda so you can follow along with the instructions.
  
  There isn’t enough information to go on with that error message, anyhow. If there is a stack trace or any further information that was output it might lead you toward the issue.
  
Dhaneshwar Jha says:

February 10, 2021 at 1:34 am

Hi Frank, thanks for your quick response. I have resolved the issue. It was due to a version mismatch in the external JAR files that we imported from the course material.

Roger Yu says:

January 28, 2023 at 5:52 pm

I’m getting the following error:
```
-------------------------------------------
Time: 1674946211000 ms
-------------------------------------------

23/01/29 09:50:11 ERROR ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Error receiving tweets - https://stream.twitter.com/1.1/statuses/sample.json?stall_warnings=true
Relevant discussions can be found on the Internet at:
http://www.google.co.jp/search?q=4d13a6ae or
http://www.google.co.jp/search?q=b7701a00
TwitterException{exceptionCode=[4d13a6ae-b7701a00 fc29eb32-369321d6 fc29eb32-369321c7], statusCode=-1, message=null, code=-1, retryAfter=-1, rateLimitStatus=null, version=4.0.4}
```

1. Frank Kane says:
  
  January 29, 2023 at 8:12 am
  
  I’m afraid that with all the turmoil at Twitter, the API this course uses hasn’t been working. There is another API they offer, but there is no Scala client for it I can find. You’ll have to just watch the videos that involve Twitter instead of following along; what you’ll learn about Spark is still relevant, it’s just the Twitter piece that’s broken.
  
  1. Roger Yu says:
    
    January 29, 2023 at 3:18 pm
    
    Ah, I see. Thanks for the confirmation, Frank. I was going out of my mind, thinking I may have missed a step.
    


(C) Copyright 2021-2025 Sundog Software LLC. All rights reserved worldwide.
