spark-dev mailing list archives

From: Shixiong Zhu <zsxw...@gmail.com>
Subject: Re: Get only updated RDDs from or after updateStateBykey
Date: Thu, 24 Sep 2015 08:27:20 GMT
Where do you save the data that is not updated? Or do you only want to
avoid accessing the database for keys that are not updated?

Besides, the community is working on optimizing the performance of
"updateStateByKey". I hope it will be delivered soon.

Best Regards,
Shixiong Zhu

2015-09-24 13:45 GMT+08:00 Bin Wang <wbin00@gmail.com>:

> I've read the source code and it seems to be impossible, but I'd like to
> confirm it.
>
> It would be a very useful feature. For example, I need to store the state
> of a DStream in my database so that I can recover it after the next
> redeploy. But I only need to save the updated keys; saving all keys to the
> database is very wasteful.
>
> From the source code, I think it could be added easily: StateDStream has
> access to prevStateRDD, so it can compute a diff. Is there any chance of
> adding this as an API of StateDStream? If so, I can work on this feature.
>
> If not, is there any workaround or hack I could use to do this myself?
>
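A minimal sketch of the diff idea in the quoted message, assuming "updated" means "the state value differs from the previous batch". Plain Scala Maps stand in for prevStateRDD and the new state RDD so the logic runs without a cluster; the names StateDiff and changedEntries are hypothetical, not part of Spark.

```scala
// Hypothetical sketch: the "diff" between two batches of state, modeled with
// in-memory Maps instead of RDDs so it can run standalone.
object StateDiff {
  /** Entries of `curr` that are new or whose state differs from `prev`. */
  def changedEntries[K, S](prev: Map[K, S], curr: Map[K, S]): Map[K, S] =
    curr.filter { case (k, s) => !prev.get(k).contains(s) }
}
```

On real RDDs the same filter could be written as curr.leftOuterJoin(prev).filter { case (_, (s, old)) => !old.contains(s) }.mapValues(_._1), at the cost of a join on every batch. Note that this definition differs slightly from "received new values": a key that got new data but whose state did not change is not reported, and keys whose state was removed (the update function returned None) do not appear in curr at all.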

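As for the workaround question in the quoted message, one common approach, sketched here under assumptions, is to carry an "updated" flag inside the state itself so downstream code can filter on it. The update function has the (Seq[V], Option[S]) => Option[S] shape that updateStateByKey accepts; the object UpdatedFlag, the state type Tracked and its running count are hypothetical placeholders for whatever state an application actually keeps.

```scala
// Hypothetical workaround sketch: store an "updated" flag in the state so only
// keys touched in the current batch need to be written to the database.
object UpdatedFlag {
  // Hypothetical state: a running count plus a flag for the current batch.
  final case class Tracked(count: Long, updated: Boolean)

  // Same shape as the function passed to updateStateByKey:
  // (Seq[V], Option[S]) => Option[S], here with V = Long and S = Tracked.
  def update(newValues: Seq[Long], prev: Option[Tracked]): Option[Tracked] = {
    val oldCount = prev.map(_.count).getOrElse(0L)
    // The flag is true only when this batch brought new values for the key.
    Some(Tracked(oldCount + newValues.sum, updated = newValues.nonEmpty))
  }
}
```

In a streaming job this might be wired up as pairs.updateStateByKey(UpdatedFlag.update _), where pairs is a hypothetical DStream[(String, Long)], followed by .filter(_._2.updated).foreachRDD { rdd => ... } to write only the flagged keys. This avoids database writes for untouched keys, but updateStateByKey still visits every key in the state on each batch, so it does not reduce the cost of the state update itself.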