spark-user mailing list archives

From YaoPau <>
Subject Read from file and broadcast before every Spark Streaming bucket?
Date Fri, 30 Jan 2015 04:05:38 GMT
I'm creating a real-time visualization of counts of ads shown on my website,
using data pushed through by Spark Streaming.

To avoid clutter, the visualization only looks good with 4 or 5 lines shown at
once (corresponding to 4 or 5 different ads), but 50+ different ads run on my
site.

What I'd like to do is quickly change which ads get pumped through Spark
Streaming, without having to rebuild the .jar and push it to my edge node.
Ideally I'd keep a .csv file on my edge node listing 4 ad names; every time a
stream RDD is created, the job would read that tiny file, create a broadcast
variable, and use that variable as a filter.  That way I could just edit the
.csv file, save it, and the stream would filter correctly.

I keep getting errors when I try this.  Has anyone had success with a
broadcast variable that updates with each new stream RDD?
