spark-dev mailing list archives

From "Haopu Wang" <HW...@qilinsoft.com>
Subject RE: Should I avoid "state" in a Spark application?
Date Mon, 13 Jun 2016 01:11:45 GMT
Can someone look at my questions? Thanks again!

 

________________________________

From: Haopu Wang 
Sent: June 12, 2016 16:40
To: user@spark.apache.org
Subject: Should I avoid "state" in a Spark application?

 

I have a Spark application with the following structure:

 

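    // Shared mutable state that lives on the driver.
    var ts: Long = 0L

    dstream1.foreachRDD {
        (x, time) => {
            // "time" is a streaming Time, so convert it to a Long first.
            ts = time.milliseconds
            x.do_something()...
        }
    }

    ......

    process_data(dstream2, ts, ......)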

 

I assume the foreachRDD function can update the "ts" variable, which is then used in the
Spark tasks launched by the "process_data" function.

 

From my tests on a standalone Spark cluster, this works. But should I be concerned
if I switch to YARN?

 

I have also seen articles recommending that state be avoided in Scala programming. Without
the state variable, how could this be done?
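
To make the question concrete, a state-free shape might look like the sketch below. It is only an illustration under assumptions: Record, Stamped, and stampBatch are hypothetical names that are neither in the application above nor in Spark. It also assumes both streams belong to the same StreamingContext, so that the batch time seen for dstream2 matches the one recorded from dstream1. The idea is that the batch time is passed in as an argument for each batch instead of being read from a shared "var".

```scala
// Sketch only: Record, Stamped, and stampBatch are hypothetical names,
// not part of the original application or of Spark.
final case class Record(payload: String)
final case class Stamped(batchTimeMs: Long, payload: String)

object StatelessBatch {
  // Pure function: the batch time arrives as an argument instead of being
  // read from a shared var, so nothing is mutated between batches.
  def stampBatch(records: Seq[Record], batchTimeMs: Long): Seq[Stamped] =
    records.map(r => Stamped(batchTimeMs, r.payload))

  // In Spark Streaming the same shape could be wired in roughly like this
  // (not exercised here, since it needs a running StreamingContext):
  //
  //   dstream2.transform { (rdd, time) =>
  //     val ms = time.milliseconds   // plain Long, obtained per batch
  //     rdd.map(r => Stamped(ms, r.payload))
  //   }

  def main(args: Array[String]): Unit = {
    val out = stampBatch(Seq(Record("a"), Record("b")), 1465780305000L)
    out.foreach(println)
  }
}
```

Because the time is supplied per batch by the two-argument form of transform, the result does not depend on when a driver-side variable happened to be updated, which is the part that could plausibly behave differently across cluster managers.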

 

Any comments or suggestions are appreciated.

 

Thanks,

Haopu

