spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Streaming: which code is (not) executed at every batch interval?
Date Tue, 04 Nov 2014 19:36:16 GMT
Yes, code is just local Scala code unless it's invoking Spark APIs.
The "non-Spark-streaming" block appears to just be normal program code
executed in your driver, which ultimately starts the streaming
machinery later. It executes once; there is nothing about that code
connected to Spark. It's not magic.

To execute code against every batch, use operations like foreachRDD on
a DStream: you pass a function, and it is invoked at each batch
interval with that interval's RDD.
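A sketch of that pattern (untested, and assuming Spark Streaming is on the classpath; `whiteArg` and the `lines` DStream stand in for the names from the quoted program below):

```scala
import org.apache.spark.streaming.dstream.DStream

// The function passed to foreachRDD runs in the driver at every batch
// interval, so a file-modification check placed inside it is
// re-evaluated per batch, unlike code that runs once before ssc.start().
def watchWhitelist(lines: DStream[String], whiteArg: String): Unit = {
  lines.foreachRDD { rdd =>
    val whiteFd = new java.io.File(whiteArg)
    if (whiteFd.lastModified > System.currentTimeMillis - 10000L) {
      // re-read the whitelist file here; this block executes once per batch
    }
    // rdd is this batch's data and supports the normal RDD operations
  }
}
```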

On Tue, Nov 4, 2014 at 5:43 PM, spr <spr@yarcdata.com> wrote:
> The use case I'm working on has a main data stream in which a human needs to
> modify what to look for.  I'm thinking to implement the main data stream
> with Spark Streaming and the things to look for with Spark.
> (Better approaches welcome.)
>
> To do this, I have intermixed Spark and Spark Streaming code, and it appears
> that the Spark code is not being executed every batch interval.  With
> details elided, it looks like
>
>     val sc = new SparkContext(conf)
>     val ssc = new StreamingContext(conf, Seconds(10))
>     ssc.checkpoint(".")
>
>     var lines = ssc.textFileStream(dirArg)                    // ====Spark Streaming code
>     var linesArray = lines.map( line => (line.split("\t")))
>
>     val whiteFd = (new java.io.File(whiteArg))                // ====non-Spark-Streaming code
>     if (whiteFd.lastModified > System.currentTimeMillis-(timeSliceArg*1000)) {
>       // read the file into a var
>     }
>
>     //   ====Spark Streaming code
>     var SvrCum = newState.updateStateByKey[(Int, Time, Time)](updateMyState)
>
> It appears the non-Spark-Streaming code gets executed once at program
> start-up but not repeatedly. So, two questions:
>
> 1)  Is it correct that Spark code does not get executed per batch interval?
>
> 2)  Is there a definition somewhere of what code will and will not get
> executed per batch interval?  (I didn't find it in either the Spark or Spark
> Streaming programming guides.)
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Streaming-which-code-is-not-executed-at-every-batch-interval-tp18071.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
