See the inline response.

On Wed, Sep 24, 2014 at 4:05 PM, Reddy Raja <areddyraja@gmail.com> wrote:

Given this program, I have the following queries:

    val sparkConf = new SparkConf().setAppName("NetworkWordCount")
    sparkConf.set("spark.master", "local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()


Q1) How do I know which part of the program is executing every 10 sec?

   My requirement is that I want to execute a method and insert data into Cassandra every time a set of messages comes in.

==> The operations on the DStreams (the lines, words, and wordCounts definitions and the wordCounts.print()) are the part that is executed every 10 sec.

Basically, whatever operations you are doing on lines will be executed every 10 secs. So to solve your problem you need to have a map function on lines which will do your data insertion into Cassandra.
Eg:

    val dumdum = lines.map { x =>
      // whatever you want to do with x (like insert into Cassandra)
      x
    }
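
One caveat on the example above: map is a lazy transformation, so the insertion code only runs if some output operation consumes dumdum. A rough sketch of two ways to make sure it actually runs every batch (insertIntoCassandra is a hypothetical stand-in for your real Cassandra insert code):

    // Option 1: keep the map above, but add an output operation that consumes
    // dumdum; without one, the lazy map is never executed.
    dumdum.count().print()

    // Option 2: use foreachRDD, which is itself an output operation. Its body
    // runs once per 10-second batch, with that batch's data as an RDD.
    lines.foreachRDD { rdd =>
      rdd.foreach { record =>
        insertIntoCassandra(record)   // hypothetical helper: your Cassandra insert goes here
      }
    }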

Q2) Is there a function I can pass so that it gets executed when the next set of messages comes in?

==> Hope the first answer covers it.
 

Q3) If I have a method in between the following lines:

    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    my_method(stream rdd)..
    ssc.start()

==> No! my_method will only execute one time, when the program is first set up, not once per batch.

   The method is not getting executed..
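
==> If you want something like my_method to run on every batch, one sketch (given that my_method takes an RDD, as in your snippet) is to register it through foreachRDD before ssc.start():

    // foreachRDD is an output operation: after the context starts, this body
    // runs once per 10-second batch, with that batch's data passed in as an RDD.
    wordCounts.foreachRDD { rdd =>
      my_method(rdd)
    }

    ssc.start()
    ssc.awaitTermination()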


Can someone answer these questions?

--Reddy