spark-user mailing list archives

From M Singh <>
Subject Apache Spark - Structured Streaming State Management With Watermark
Date Wed, 28 Mar 2018 22:15:28 GMT
I am using Apache Spark Structured Streaming (2.2.1) to implement custom sessionization for
events. The processing has two steps:
1. flatMapGroupsWithState (keyed by user id), which stores the state of each user and emits
events every minute until an expire event is received.
2. An aggregation (group by count).

I am using outputMode - Update.
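
For reference, the two steps described above might look roughly like the following sketch. The case classes, the socket source, and the "userId,eventType" line format are hypothetical placeholders, not taken from the original message, and the once-a-minute emission is left out for brevity. It also assumes that "outputMode - Update" refers to the query's output mode: the function passed to flatMapGroupsWithState is given Append mode here, because Spark does not allow an aggregation after flatMapGroupsWithState when the function itself runs in Update mode.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical types: the message does not show the real event or state classes.
case class Event(userId: String, eventType: String)
case class SessionState(eventCount: Long)
case class SessionUpdate(userId: String, eventCount: Long, expired: Boolean)

object SessionizationSketch {

  // Step 1: per-user state function. Keeps a running event count and removes
  // the state once an "expire" event arrives for that user.
  def updateSession(
      userId: String,
      events: Iterator[Event],
      state: GroupState[SessionState]): Iterator[SessionUpdate] = {
    val batch = events.toList
    val seenSoFar = state.getOption.map(_.eventCount).getOrElse(0L)
    val total = seenSoFar + batch.size
    val expired = batch.exists(_.eventType == "expire")
    if (expired) state.remove() else state.update(SessionState(total))
    Iterator(SessionUpdate(userId, total, expired))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("SessionizationSketch").getOrCreate()
    import spark.implicits._

    // Assumed input: text lines of the form "userId,eventType" from a local socket.
    val events = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
      .as[String]
      .flatMap { line =>
        line.split(",") match {
          case Array(user, kind) => Seq(Event(user.trim, kind.trim))
          case _                 => Seq.empty[Event] // skip malformed lines
        }
      }

    // No watermark and no timeout: state is only dropped by state.remove().
    val sessionUpdates = events
      .groupByKey(_.userId)
      .flatMapGroupsWithState[SessionState, SessionUpdate](
        OutputMode.Append(), GroupStateTimeout.NoTimeout())(updateSession _)

    // Step 2: aggregation (group by count) over the emitted events.
    val counts = sessionUpdates.groupBy("userId").count()

    counts.writeStream
      .outputMode("update")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```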

I have a few questions:
1. If I don't use a watermark at all:
     (a) Is the state for flatMapGroupsWithState stored forever?
     (b) Is the state for the groupBy count stored forever?
2. Is a watermark applicable only for cleaning up groupBy aggregates?
3. Can we use a watermark to manage state in flatMapGroupsWithState? If so, how?
4. Can a watermark be used for other state cleanup? Are there any examples of that?
