spark-user mailing list archives

From Jungtaek Lim <kabhwan.opensou...@gmail.com>
Subject Re: Arbitrary stateful aggregation: updating state without setting timeout
Date Mon, 05 Oct 2020 12:17:15 GMT
Hi,

That isn't explained in the Structured Streaming guide, but it is explained in the Scala API doc:
http://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/GroupState.html

The following statement, quoted from the Scala API doc, answers your question:

> The timeout is reset every time the function is called on a group, that is,
> when the group has new data, or the group has timed out. So the user has to
> set the timeout duration every time the function is called, otherwise there
> will not be any timeout set.


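Simply put, you'd want to set the timeout on every call unless you remove the
state for the group (key). In your example, the second batch updates the state
without setting a timeout, so no timeout is set and the state will not time out
in the third batch.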
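
As a minimal sketch of that pattern (not taken from the API doc), a state function for flatMapGroupsWithState could look like the one below. The Event and RunningTotal types, the TimeoutSketch object, and the 10-minute timeout are hypothetical names and values chosen only for illustration; GroupState, hasTimedOut, getOption, update, remove, and setTimeoutDuration are the calls described in the Scala API doc linked above. It assumes the query is configured with GroupStateTimeout.ProcessingTimeTimeout.

```scala
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

// Hypothetical input and state types, for illustration only.
case class Event(key: String, value: Long)
case class RunningTotal(sum: Long)

object TimeoutSketch {
  // Assumed timeout of 10 minutes; choose a value that suits your workload.
  val TimeoutMs: Long = 10 * 60 * 1000L

  def updateGroup(
      key: String,
      events: Iterator[Event],
      state: GroupState[RunningTotal]): Iterator[(String, Long)] = {
    if (state.hasTimedOut) {
      // Called because the timeout fired, not because new data arrived:
      // emit the last total (if any) and remove the state for this key.
      val last = state.getOption
      state.remove()
      last.map(total => (key, total.sum)).iterator
    } else {
      val previous = state.getOption.getOrElse(RunningTotal(0L))
      val updated = RunningTotal(previous.sum + events.map(_.value).sum)
      state.update(updated)
      // Set the timeout again on every call. The timeout is reset each time
      // the function runs, so skipping this leaves the group with no timeout.
      state.setTimeoutDuration(TimeoutMs)
      Iterator.empty
    }
  }
}
```

With a Dataset[Event] named events (also an assumption) and spark.implicits._ in scope, this would be wired in roughly as `events.groupByKey(_.key).flatMapGroupsWithState(OutputMode.Append(), GroupStateTimeout.ProcessingTimeTimeout)(TimeoutSketch.updateGroup)`.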

Hope this helps.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Mon, Oct 5, 2020 at 6:16 PM Yuri Oleynikov (יורי אולייניקוב) <
yurkao@gmail.com> wrote:

> Hi all, I have the following question:
>
> What happens to the state (in terms of expiration) if I update the
> state without setting a timeout?
>
>
> E.g. in FlatMapGroupsWithStateFunction:
>
>    1. First batch:
>
>       state.update(myObj)
>       state.setTimeoutDuration(timeout)
>
>    2. Second batch:
>
>       state.update(myObj)
>
>    3. Third batch (no data for a long time):
>       Does the state time out once the initial timeout has expired, or
>       does it not time out?
>
>
