spark-user mailing list archives

From spr
Subject Re: Streaming: which code is (not) executed at every batch interval?
Date Tue, 04 Nov 2014 20:02:28 GMT
Good, thanks for the clarification.  It would be great if this were precisely
stated somewhere in the docs.  :)

To state this another way, it seems there is no way to straddle the
streaming world and the non-streaming world, that is, to get input from
both a (vanilla, Linux) file and a stream. Is that true?

If so, it seems I need to turn my (vanilla file) data into a second stream.

sowen wrote
> Yes, code is just local Scala code unless it's invoking Spark APIs.
> The "non-Spark-streaming" block appears to just be normal program code
> executed in your driver, which ultimately starts the streaming
> machinery later. It executes once; there is nothing about that code
> connected to Spark. It's not magic.
> To execute code against every RDD you use operations like foreachRDD
> on DStream to write a function that is executed at each batch interval
> on an RDD.
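
The distinction described in the quote above can be sketched roughly as
follows. This is only an illustration, not code from the thread: the app
name, the local master setting, the 10-second batch interval, and the
socket source on localhost:9999 are all assumptions made for the example.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BatchIntervalSketch {
  def main(args: Array[String]): Unit = {
    // Assumed settings for a local run; adjust for a real deployment.
    val conf = new SparkConf()
      .setAppName("BatchIntervalSketch")
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Ordinary driver code: executes exactly once, while the job is being
    // set up, before the streaming machinery starts.
    println("setup: runs once in the driver")

    // Hypothetical input: lines of text read from a socket.
    val lines = ssc.socketTextStream("localhost", 9999)

    // The function passed to foreachRDD is invoked at each batch interval,
    // with the RDD holding that interval's data.
    lines.foreachRDD { rdd =>
      println(s"batch contained ${rdd.count()} lines")
    }

    // Nothing streams until start() is called; awaitTermination() blocks
    // the driver while batches keep being processed.
    ssc.start()
    ssc.awaitTermination()
  }
}
```

In this sketch the first println appears once, while the message inside
foreachRDD is printed again for every batch.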
