spark-user mailing list archives

From qihong <qc...@pivotal.io>
Subject Re: How to initialize StateDStream
Date Sat, 13 Sep 2014 19:17:23 GMT
I'm not sure what you mean by "previous run". Do you mean the previous
batch, or the previous run of spark-submit?

If you mean the previous batch (Spark Streaming creates a batch every batch
interval), then there's nothing to do: the state is carried over from one
batch to the next automatically.

If you mean the previous run of spark-submit (assuming you were able to save
the result somewhere), then I can think of two possible ways to do it:

1. Read the saved result as an RDD (just do this once), and join that RDD
with each RDD of the state stream.

2. Add extra logic to the update function: when the previous state is None
(one of the two Option values), look up the saved state for the given key in
the saved result, then apply your original logic to create the new state
object from Seq[V] and the previous state. Note that you need to use this
version of the update function: "updateFunc: (Iterator[(K, Seq[V],
Option[S])]) => Iterator[(K, S)]", which makes the key available to the
update function.
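
A minimal sketch of option 2 in plain Scala, without a Spark dependency, so
that the fallback logic can be read and tested on its own. The names
SeededUpdate and savedState, the choice of K = String and V = S = Long, and
the "running sum" logic are all assumptions made for illustration; in a real
job the saved result would be loaded from wherever the previous run wrote it.

```scala
object SeededUpdate {
  // Hypothetical stand-in for the result saved by the previous spark-submit run.
  val savedState: Map[String, Long] = Map("a" -> 10L, "b" -> 5L)

  // Same shape as the updateFunc version quoted above,
  // with K = String, V = Long, S = Long (assumed types).
  val updateFunc: Iterator[(String, Seq[Long], Option[Long])] => Iterator[(String, Long)] =
    iter => iter.map { case (key, newValues, prevState) =>
      // When the previous state is None, fall back to the saved value for
      // this key; if there is no saved value either, start from 0.
      val base = prevState.orElse(savedState.get(key)).getOrElse(0L)
      // "Original logic" placeholder: a running sum of the new values.
      (key, base + newValues.sum)
    }

  def main(args: Array[String]): Unit = {
    val batch: Iterator[(String, Seq[Long], Option[Long])] = Iterator(
      ("a", Seq(1L, 2L), None),      // no state yet, saved value 10 is used
      ("b", Seq(3L), Some(100L)),    // existing state wins over saved value
      ("c", Seq(4L), None)           // no state and no saved value
    )
    updateFunc(batch).foreach(println)
    // prints (a,13), (b,103), (c,4)
  }
}
```

A function of this shape should be usable with the updateStateByKey overload
that also takes a Partitioner and a rememberPartitioner flag. In this sketch
a saved key is only picked up once that key appears in the function's input.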






