6 thoughts on “[Activity] Stream Live Tweets with Spark Streaming!”

  1. Ahmed Khalil says:

    Hello Frank,
    First, thank you so much for the deep and insightful information throughout all your courses. These are really great, especially when compared to other courses out there.

    Second, I have a question regarding the solution architecture for a system I’m building now with my team. We are collecting financial data from various sources (mostly REST APIs), caching the data in Elasticsearch, then building visualizations and dashboards with some modern UI components.

    My questions are about the tool we will use to ingest the data from the different sources, as we need to do some operations on the data while importing it into Elastic, for example:
    – Transformations: splitting one record from the source system into multiple documents in Elastic based on some logic.
    – Aggregated documents: we are looking to build a separate scheduler that computes some aggregations and saves them as new documents in Elastic. I know that Elastic aggregations are great and we will use them; these documents are just for a special purpose.

    We tried Logstash and it was too limiting for what we are looking to do in terms of transformations. Now we are comparing Spark streaming and Kafka streaming in order to have the most flexibility possible. We have a Java background and are open to learning any language as well.

    Looking forward to your feedback, and sorry for the long question.

    Best,
    Ahmed

  2. Frank Kane says:

    My gut reaction would be to start with Spark Streaming, if you’re looking to have the most flexibility in transforming and aggregating the data as it is ingested.

    Tools such as Logstash and Kafka tend to be better suited to ingesting data produced by large numbers of individual hosts and funneling that data somewhere. Your use case is a bit different, as you’re just hitting REST APIs for your data and not trying to solve the problem of reliably transmitting data from a large number of systems to a single data repository.
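
    For readers following the thread, the two operations described in the question (splitting one source record into several documents, then building aggregated documents) can be sketched in plain Python, independent of Spark or Elastic. This is only an illustration under assumptions: the record shape, the field names (symbol, quotes, close), and both function names are hypothetical and do not come from the thread or the course.

```python
from collections import defaultdict


def split_record(record):
    """Split one source record into one document per quote (hypothetical shape)."""
    return [
        {"symbol": record["symbol"], "date": q["date"], "close": q["close"]}
        for q in record["quotes"]
    ]


def aggregate(documents):
    """Build one summary document per symbol: count and average close."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for doc in documents:
        t = totals[doc["symbol"]]
        t["count"] += 1
        t["sum"] += doc["close"]
    return [
        {"symbol": s, "count": t["count"], "avg_close": t["sum"] / t["count"]}
        for s, t in totals.items()
    ]


# Made-up sample record, standing in for one REST API response.
record = {
    "symbol": "ABC",
    "quotes": [
        {"date": "2020-01-01", "close": 10.0},
        {"date": "2020-01-02", "close": 12.0},
    ],
}

docs = split_record(record)   # one input record becomes two documents
summary = aggregate(docs)     # one aggregated document per symbol
```

    A function shaped like split_record (one record in, a list of documents out) is the kind of thing that could be passed to a flatMap-style operation in a streaming job, while the aggregation step could run as a separate scheduled job.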

  3. Ahmed Khalil says:

    Thanks a lot, Frank, this is kind of why we chose to use Spark Streaming. In the future, when the system grows, we might add Spark itself between Spark Streaming and Elastic to carry the datasets as-is from the source and keep only the aggregate data in Elastic for optimum performance. Later we might also add Kafka to handle the real-time data streams from sources like Google Analytics. This is getting interesting 😀 .

  4. Dhaneshwar Jha says:

    Hi Mark,
    I am trying to do this project in IntelliJ Community Edition and am facing a java.lang error.
    I can’t use the Scala IDE because it fails, saying “java was started but returned exit code=1”.

    Please help.
    Thanks and regards

    1. Frank Kane says:

      Sorry but I can’t provide support for IntelliJ with this course; I’d recommend installing Anaconda so you can follow along with the instructions.

      There isn’t enough information to go on with that error message, anyhow. If there is a stack trace or any further output, it might lead you toward the issue.

  5. Dhaneshwar Jha says:

    Hi Frank, thanks for your quick response. I have resolved the issue. It was due to a version mismatch in the external JAR files that we imported from the course material.
