spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tathagata Das <t...@databricks.com>
Subject Re: reduceByKeyAndWindow with initial state
Date Sat, 11 Jul 2015 00:50:30 GMT
Are you talking about reduceByKeyAndWindow with or without inverse reduce?

TD

On Fri, Jul 10, 2015 at 2:07 AM, Imran Alam <imran@newscred.com> wrote:

> We have a streaming job that makes use of reduceByKeyAndWindow
> <https://github.com/apache/spark/blob/v1.4.0/streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala#L334-L341>.
> We want this to work with an initial state. The idea is to avoid losing
> state if the streaming job is restarted, also to take historical data into
> account for the windows. But reduceByKeyAndWindow doesn't accept any
> "initialRDD" parameter like updateStateByKey
> <https://github.com/apache/spark/blob/v1.4.0/streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala#L435-L445>
> does.
>
> The plan is to extend reduceByKeyAndWindow to accept an "initalRDDs"
> parameter, so that the DStream starts with those RDDs as initial value of
> "generatedRDD" rather than an empty map. But the "generatedRDD" is a
> private variable, so I'm bit confused on how to proceed with the plan.
>
>

Mime
View raw message