spark-user mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: another updateStateByKey question
Date Fri, 02 May 2014 19:09:41 GMT
Could be a bug. Can you share code and data that I can use to reproduce
this?

TD
On May 2, 2014 9:49 AM, "Adrian Mocanu" <amocanu@verticalscope.com> wrote:

> Has anyone else noticed that *sometimes* the same tuple triggers the
> update state function twice?
>
> I have 2 tuples with the same key in one RDD of the DStream:
> RDD[ (a,1), (a,2) ]
>
> When the update function is called the first time Seq[V] has data: 1, 2
> which is correct: StateClass(3,2, ArrayBuffer(1, 2))
>
> Then right away (in my output I see this) the same key is used and the
> function is called again but this time Seq is empty: StateClass(3,2,
> ArrayBuffer( ))
>
>
>
> In the update function I also save Seq[V] to state so I can see it in the
> RDD. I also show a count and sum of the values.
>
> StateClass(sum, count, Seq[V])
>
>
>
> Why is the update function called with empty Seq[V] on the same key when
> all values for that key have been already taken care of in a previous
> update?
>
>
>
> -Adrian
>
>
>
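
For anyone trying to reproduce the report above, here is a minimal sketch of
an update function with the shape described in the question. It runs without
Spark, and everything in it is an assumption for illustration: the field
names of StateClass, the Int value type, and the object name
UpdateStateSketch do not come from the original code. It shows one possible
benign explanation, not a diagnosis: updateStateByKey applies the update
function in every batch to every key that already has state, so a key with
no new data in a later batch is passed an empty Seq. Whether that is what
happened here, or whether both calls fell in the same batch (which would
point to a bug), cannot be told from the output alone.

```scala
// Hypothetical state type mirroring StateClass(sum, count, Seq[V]) from the question.
case class StateClass(sum: Int, count: Int, lastValues: Seq[Int])

object UpdateStateSketch {
  // Same shape as the function passed to updateStateByKey:
  // (new values for the key in this batch, previous state) => new state.
  def update(values: Seq[Int], state: Option[StateClass]): Option[StateClass] = {
    val prev = state.getOrElse(StateClass(0, 0, Seq.empty))
    // Keep the running sum and count, and record the values seen in this call.
    Some(StateClass(prev.sum + values.sum, prev.count + values.size, values))
  }

  def main(args: Array[String]): Unit = {
    // First call: key "a" arrives with values 1 and 2 and has no prior state.
    val afterBatch1 = update(Seq(1, 2), None)
    println(afterBatch1) // Some(StateClass(3,2,List(1, 2)))

    // Later call with no new data for "a": the key still has state,
    // so the function runs again with an empty Seq.
    val afterBatch2 = update(Seq.empty, afterBatch1)
    println(afterBatch2) // Some(StateClass(3,2,List()))
  }
}
```

The second result keeps sum 3 and count 2 with an empty sequence, which has
the same shape as the StateClass(3,2, ArrayBuffer( )) line in the report.
Logging the batch time alongside each call would show whether the two
invocations belong to the same batch or to consecutive ones.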
