spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: Spark Streaming
Date Wed, 24 Sep 2014 14:43:10 GMT
See the inline response.

On Wed, Sep 24, 2014 at 4:05 PM, Reddy Raja <areddyraja@gmail.com> wrote:

> Given this program.. I have the following queries..
>
> val sparkConf = new SparkConf().setAppName("NetworkWordCount")
>
>     sparkConf.set("spark.master", "local[2]")
>
>     val ssc = new StreamingContext(sparkConf, Seconds(10))
>
>     val lines = ssc.socketTextStream(args(0), args(1).toInt,
> StorageLevel.MEMORY_AND_DISK_SER)
>
>     val words = lines.flatMap(_.split(" "))
>
>     val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
>
>     wordCounts.print()
>
>     ssc.start()
>
>     ssc.awaitTermination()
>
>
> Q1) How do I know which part of the program is executing every 10 sec?
>
>    My requirement is that I want to execute a method and insert data
> into Cassandra every time a set of messages comes in.
>
==> The operations on *lines* (the flatMap, the map/reduceByKey, and the
print) are what execute every 10 sec.

Basically, whatever operations you are doing on *lines* will be executed
every 10 secs. So to solve your problem you need to have a map function
on the lines which does your data insertion into Cassandra.
Eg:

> val dumdum = lines.map(x => { whatever you want to do with x (like
> insert into Cassandra) })
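One caveat worth spelling out: map is lazy, so Spark only runs that closure
once an output operation (such as print() or foreachRDD) is registered on the
result. Here is a minimal pure-Scala sketch of the per-record side-effect
pattern, runnable without Spark; insertIntoCassandra and the batch contents
are hypothetical stand-ins, not a real Cassandra client call:

```scala
// Pure-Scala sketch of the lines.map(...) insert pattern (not real Spark).
object InsertSketch {
  val inserted = scala.collection.mutable.ListBuffer[String]()

  // Hypothetical stand-in for a real Cassandra client write.
  def insertIntoCassandra(row: String): Unit = inserted += row

  // The closure you would hand to lines.map(...): insert, then pass the record on.
  def processRecord(x: String): String = {
    insertIntoCassandra(x)
    x
  }

  def main(args: Array[String]): Unit = {
    val batch = Seq("hello world", "hello spark") // one 10-second micro-batch
    val dumdum = batch.map(processRecord)         // mirrors lines.map(...)
    println(inserted.size)                        // one insert per record
  }
}
```

In real Spark the same closure is serialized and shipped to executors, so the
Cassandra connection itself should be created inside the closure (or per
partition), not captured from the driver.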

Q2) Is there a function I can pass so that it gets executed when the
> next set of messages comes in?
>
==> Hope the first answer covers it.


> Q3) If I have a method in between the following lines
>
>   val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
>
>     wordCounts.print()
>
>     my_method(stream rdd)
>
>     ssc.start()
>
==> No, my_method will only execute one time.
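To see why, here is a minimal pure-Scala simulation of the micro-batch
execution model (not real Spark; the batch contents are made up): code placed
between the DStream setup and ssc.start() runs exactly once while the job
graph is being built, whereas a function registered on the stream, e.g. via
wordCounts.foreachRDD { rdd => my_method(rdd) }, is invoked by the scheduler
once per batch.

```scala
// Pure-Scala simulation of Spark Streaming's execution model (not real Spark).
object BatchModelSketch {
  var setupRuns = 0
  var perBatchRuns = 0

  def main(args: Array[String]): Unit = {
    val registered = scala.collection.mutable.ListBuffer[Seq[String] => Unit]()

    // Analogue of wordCounts.foreachRDD { rdd => my_method(rdd) }:
    registered += { _ => perBatchRuns += 1 }

    // Analogue of calling my_method(...) directly before ssc.start():
    setupRuns += 1

    // Analogue of ssc.start(): the scheduler invokes every registered
    // function once for each micro-batch that arrives.
    val batches = Seq(Seq("a b"), Seq("c"), Seq("d e f"))
    batches.foreach(b => registered.foreach(f => f(b)))

    println(s"setup ran $setupRuns time(s), callback ran $perBatchRuns time(s)")
  }
}
```

So to run my_method on every batch, register it through foreachRDD (a real
DStream output operation) instead of calling it inline.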

>    The method is not getting executed.
>
> Can someone answer these questions?
>
> --Reddy
>
