spark-dev mailing list archives

From: Bin Wang
Subject: Get only updated RDDs from or after updateStateByKey
Date: Thu, 24 Sep 2015 05:45:01 GMT

I've read the source code, and it seems impossible to get only the updated
entries from or after updateStateByKey, but I'd like to confirm that.

This would be a very useful feature. For example, I need to store the state
of a DStream in my database so that I can recover it on the next redeploy.
But I only need to save the keys that were updated. Saving all keys to the
database on every batch is a lot of waste.

From reading the source code, I think it could be added easily: StateDStream
can get prevStateRDD, so it could compute a diff against the new state. Is
there any chance of adding this as an API of StateDStream? If so, I can work
on this feature.

If that is not possible, is there any workaround or hack to do this myself?
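
One possible workaround, sketched here as an illustration and not as an existing Spark API: store a "changed" flag next to each state value inside the update function, then filter the state stream on that flag before saving. The sketch is in Python and assumes an update function of the form (new_values, previous_state), which is the form PySpark's updateStateByKey expects. It also assumes the state is a running integer sum. The names update_with_flag, changed_only and save_to_db are hypothetical.

```python
from typing import List, Optional, Tuple

# State is stored as (value, changed_in_this_batch).
State = Tuple[int, bool]


def update_with_flag(new_values: List[int], prev: Optional[State]) -> Optional[State]:
    """Keep a running sum per key and record whether this batch touched it."""
    prev_value = prev[0] if prev is not None else 0
    if not new_values:
        # No input for this key in this batch: keep the value, clear the flag.
        # A key that has neither state nor input stays absent (None).
        return (prev_value, False) if prev is not None else None
    # The key received input in this batch, so mark it as changed.
    return (prev_value + sum(new_values), True)


def changed_only(record) -> bool:
    """Filter predicate for (key, (value, changed)) records."""
    _key, (_value, changed) = record
    return changed


# Assumed usage with a DStream of (key, int) pairs named `pairs`
# (save_to_db is a hypothetical function that writes one RDD to the database):
#
#   flagged = pairs.updateStateByKey(update_with_flag)
#   flagged.filter(changed_only) \
#          .mapValues(lambda state: state[0]) \
#          .foreachRDD(save_to_db)
```

Note that the flag means "this key received input in this batch", not "the value differs from the previous one". The full state is still carried through every batch, so this only reduces what is written to the database, not the cost of updateStateByKey itself.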
