spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: Spark Streaming - Most popular Twitter Hashtags
Date Tue, 04 Nov 2014 08:18:53 GMT
This might help:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/TwitterPopularTags.scala

Thanks
Best Regards

On Tue, Nov 4, 2014 at 6:03 AM, Harold Nguyen <harold@nexgate.com> wrote:

> Hi all,
>
> I was just reading this nice documentation here:
>
> http://ampcamp.berkeley.edu/3/exercises/realtime-processing-with-spark-streaming.html
>
> And got to the end of it, which says:
>
> "Note that there are more efficient ways to get the top 10 hashtags. For
> example, instead of sorting the entire of 5-minute-counts (thereby,
> incurring the cost of a data shuffle), one can get the top 10 hashtags in
> each partition, collect them together at the driver and then find the top
> 10 hashtags among them. We leave this as an exercise for the reader to try."
>
> I was just wondering if anyone had managed to do this, and was willing to
> share as an example :) This seems to be the exact use case that will help
> me!
>
> Thanks!
>
> Harold
>
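
The approach in the quoted note (top 10 per partition, then merge at the driver) can be sketched without Spark at all. The snippet below is only an illustration under stated assumptions: partitions are modelled as plain Python lists, the counts are assumed to be already reduced by key (so each hashtag appears in exactly one partition), and the function names and sample data are hypothetical, not taken from the AMP Camp exercise or the linked example.

```python
import heapq
from typing import Iterable, List, Tuple

Count = Tuple[str, int]  # (hashtag, count), assumed already reduced by key


def top_k_in_partition(partition: Iterable[Count], k: int) -> List[Count]:
    """Keep only the k highest-count pairs of one partition (no full sort)."""
    return heapq.nlargest(k, partition, key=lambda kv: kv[1])


def top_k_overall(partitions: Iterable[Iterable[Count]], k: int = 10) -> List[Count]:
    """Gather each partition's candidates at the 'driver', then pick the final top k."""
    candidates: List[Count] = []
    for partition in partitions:
        # At most k pairs per partition ever reach the driver.
        candidates.extend(top_k_in_partition(partition, k))
    return heapq.nlargest(k, candidates, key=lambda kv: kv[1])


# Hypothetical sample data: two partitions of per-window hashtag counts.
partitions = [
    [("#spark", 42), ("#scala", 17), ("#java", 5)],
    [("#bigdata", 30), ("#hadoop", 8), ("#python", 21)],
]

print(top_k_overall(partitions, k=3))
```

Only k pairs per partition are moved, so no shuffle of the full count set is needed. In Spark itself, `RDD.top` with an ordering (or key function) on the count appears to follow the same per-partition-then-merge pattern, so it may be worth trying on each windowed RDD before writing this by hand.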
