spark-user mailing list archives

From Adrian Mocanu <amoc...@verticalscope.com>
Subject RE: another updateStateByKey question
Date Fri, 02 May 2014 19:20:38 GMT
Unfortunately, I've only been able to make this happen once: the first time I ran my test.
Subsequent tests never showed it again.
I will test some more, and if it happens again I will try to get more details.

Thanks!
-A

From: Tathagata Das [mailto:tathagata.das1565@gmail.com]
Sent: May-02-14 3:10 PM
To: user@spark.apache.org
Cc: user@spark.incubator.apache.org
Subject: Re: another updateStateByKey question


Could be a bug. Can you share the code and data so that I can reproduce this?

TD
On May 2, 2014 9:49 AM, "Adrian Mocanu" <amocanu@verticalscope.com> wrote:
Has anyone else noticed that sometimes the update state function is called twice for the same key?
I have 2 tuples with the same key in 1 RDD of the DStream: RDD[ (a,1), (a,2) ]
The first time the update function is called, Seq[V] has the data 1, 2, which is correct:
StateClass(3,2, ArrayBuffer(1, 2))
Then right away (I see this in my output) the function is called again for the same key,
but this time Seq is empty: StateClass(3,2, ArrayBuffer( ))

In the update function I also save Seq[V] into the state so I can see it in the RDD, along with
a count and sum of the values:
StateClass(sum, count, Seq[V])

Why is the update function called with an empty Seq[V] for the same key when all values for that
key have already been handled in a previous update?

-Adrian
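
[Editor's note] For readers trying to reproduce the setup described above, here is a minimal
sketch of an updateStateByKey job with a state class holding (sum, count, Seq[V]). The class and
field names, the socket source, and the checkpoint path are assumptions for illustration; they are
not taken from the original code. Note that updateStateByKey invokes the update function for every
key that already has state, even in batches where that key receives no new values, so the Seq can
legitimately be empty.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical state holder mirroring StateClass(sum, count, Seq[V]) from the message.
case class StateClass(sum: Int, count: Int, values: Seq[Int])

object UpdateStateByKeySketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("updateStateByKey-sketch")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/spark-checkpoint") // updateStateByKey requires a checkpoint directory

    // Assumed source: lines like "a,1" and "a,2" arriving on a local socket.
    val pairs = ssc.socketTextStream("localhost", 9999)
      .map(_.split(","))
      .map(arr => (arr(0), arr(1).toInt))

    // newValues holds this batch's values for the key; state holds the previous StateClass.
    // newValues can be empty for keys that have state but received nothing in this batch.
    def updateFunc(newValues: Seq[Int], state: Option[StateClass]): Option[StateClass] = {
      val prev = state.getOrElse(StateClass(0, 0, Seq.empty))
      Some(StateClass(prev.sum + newValues.sum, prev.count + newValues.size, newValues))
    }

    val stateDStream = pairs.updateStateByKey(updateFunc)
    stateDStream.print()

    ssc.start()
    ssc.awaitTermination()
  }
}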
