spark-user mailing list archives

From: spr <...@yarcdata.com>
Subject: Re: Streaming: which code is (not) executed at every batch interval?
Date: Tue, 04 Nov 2014 20:02:28 GMT
Good, thanks for the clarification.  It would be great if this were precisely
stated somewhere in the docs.  :)

To state this another way, it seems there's no way to straddle the
streaming world and the non-streaming world, that is, to get input from
both a (vanilla, Linux) file and a stream.  Is that true?

If so, it seems I need to turn my (vanilla file) data into a second stream.
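
Something like the following sketch might do it (the path, port, and
batch interval here are made up), using queueStream to replay an RDD
read from the file as a one-batch DStream:

    import scala.collection.mutable.Queue
    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("FilePlusStream")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Read the vanilla file once, in the driver, as an ordinary RDD.
    val fileRdd: RDD[String] = ssc.sparkContext.textFile("/path/to/file")

    // Wrap it in a queue so queueStream can serve it as a DStream;
    // with oneAtATime = true it is emitted in a single batch.
    val fileStream = ssc.queueStream(Queue(fileRdd), oneAtATime = true)

    // It can then be combined with a real stream, e.g. by union.
    val liveStream = ssc.socketTextStream("localhost", 9999)
    val combined = fileStream.union(liveStream)

Alternatively, DStream.transform exposes each batch's RDD, so the file
RDD could be joined against it directly, without a second stream.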



sowen wrote
> Yes, code is just local Scala code unless it's invoking Spark APIs.
> The "non-Spark-streaming" block appears to just be normal program code
> executed in your driver, which ultimately starts the streaming
> machinery later. It executes once; there is nothing about that code
> connected to Spark. It's not magic.
> 
> To execute code against every RDD, you use operations like foreachRDD
> on DStream, which let you write a function that is executed at each
> batch interval against that interval's RDD.
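
For concreteness, a minimal sketch of the foreachRDD pattern described
above (the DStream name wordCounts is made up):

    wordCounts.foreachRDD { rdd =>
      // This function runs at every batch interval, with that
      // batch's data as an ordinary RDD.
      println(s"records in this batch: ${rdd.count()}")
    }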





