spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: [structured streaming] How to remove outdated data when use Window Operations
Date Thu, 01 Dec 2016 20:57:17 GMT
Thanks TD. Will it be available in pyspark too?
On 1 Dec 2016 19:55, "Tathagata Das" <tathagata.das1565@gmail.com> wrote:

> In the meantime, if you are interested, you can read the design doc in the
> corresponding JIRA - https://issues.apache.org/jira/browse/SPARK-18124
>
> On Thu, Dec 1, 2016 at 12:53 AM, Tathagata Das <
> tathagata.das1565@gmail.com> wrote:
>
>> That feature is coming in 2.1.0. We have added watermarking, which will
>> track the event time of the data and accordingly close old windows, output
>> their corresponding aggregates, and then drop the corresponding state. In
>> that case you will have to use append mode, and the aggregated data of a
>> particular window will be evicted only when the window is closed. You will
>> be able to control the threshold for how long to wait for late, out-of-order
>> data before closing a window.
>>
>> We will be updating the docs soon to explain this.
>>
>> On Tue, Nov 29, 2016 at 8:30 PM, Xinyu Zhang <wszxyh@163.com> wrote:
>>
>>> Hi
>>>
>>> I want to use window operations. However, if I don't remove any data,
>>> the "complete" table will grow larger and larger as time goes on, so I
>>> want to remove outdated data from the complete table that I will never
>>> use again.
>>> Is there any way to do this?
>>>
>>> Thanks!
>>>
>>
>>
>
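For readers who want to experiment before the 2.1.0 docs land: the eviction behavior TD describes (track the maximum event time seen, treat that minus a lateness threshold as the watermark, close a window once the watermark passes its end, emit its aggregate in append mode, then drop its state) can be sketched as a small stand-alone Python simulation. This is not Spark's implementation; the function name, window arithmetic, and threshold values below are illustrative only.

```python
from collections import defaultdict

def process(events, window_size, max_lateness):
    """Simulate event-time windowed counting with a watermark.

    events: a list of event times (e.g. seconds). Each event falls in
    the tumbling window [start, start + window_size). A window is
    closed (its count emitted, its state dropped) once the watermark,
    i.e. max event time seen minus max_lateness, passes its end.
    """
    state = defaultdict(int)   # open windows: window_start -> count
    emitted = []               # (window_start, count), append-mode output
    max_event_time = None
    for t in events:
        max_event_time = t if max_event_time is None else max(max_event_time, t)
        watermark = max_event_time - max_lateness
        start = (t // window_size) * window_size
        if start + window_size > watermark:
            state[start] += 1  # on-time or tolerably late: update the window
        # events for windows already behind the watermark are dropped silently
        # close any window that now lies entirely behind the watermark
        for s in sorted(state):
            if s + window_size <= watermark:
                emitted.append((s, state.pop(s)))
    return emitted, dict(state)

# Windows [0,10) and [10,20) close once event time 25 pushes the
# watermark to 20; the late event at t=3 still lands in [0,10)
# because that window was open when it arrived.
print(process([1, 5, 12, 3, 25], window_size=10, max_lateness=5))
# -> ([(0, 3), (10, 1)], {20: 1})
```

In the real API this corresponds to declaring the watermark on the event-time column before the windowed aggregation and running the query in append mode, as described in SPARK-18124.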

Mime
View raw message