spark-user mailing list archives

From Bryan Jeffrey <bryan.jeff...@gmail.com>
Subject Data Source - State (SPARK-28190)
Date Mon, 30 Mar 2020 19:50:32 GMT
Hi, Jungtaek.

We've been investigating the use of Spark Structured Streaming to replace
our Spark Streaming operations.  We have several cases where we're using
mapWithState to maintain state across batches, often with high volumes of
data.  We took a look at Structured Streaming's stateful processing.  It
looks great, but has some shortcomings:
1. State can only be hydrated from a checkpoint, which means that
modifying the state is not possible.
2. You cannot clean up or normalize state data after it has been processed.

These shortcomings appear to be addressed by your ticket SPARK-28190,
"Data Source - State".  I see little activity on this ticket.  Can you
help me understand where this feature currently stands?

Thank you,

Bryan Jeffrey
