spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: [structured streaming] How to remove outdated data when use Window Operations
Date Thu, 01 Dec 2016 23:20:43 GMT
Yes <https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L340>!
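
The watermarking semantics described in the thread below can be sketched as a toy pure-Python model. This is not Spark's implementation, just an illustration of the behavior: the watermark trails the maximum event time seen by a fixed delay, a window is closed (its aggregate emitted once, its state dropped) when the watermark passes the window's end, and data arriving later than that is dropped. The window length, delay, and event tuples are all made up for the example.

```python
# Toy model of event-time watermarking (NOT Spark's implementation).
from collections import defaultdict

WINDOW = 5   # window length in event-time units
DELAY = 10   # how long to wait for late, out-of-order data

def run(events):
    """events: iterable of (event_time, value) pairs.
    Yields (window_start, count) in the order windows are closed,
    mimicking append-mode output."""
    state = defaultdict(int)            # open windows: window_start -> count
    max_event_time = float("-inf")
    for t, _ in events:
        max_event_time = max(max_event_time, t)
        watermark = max_event_time - DELAY
        start = (t // WINDOW) * WINDOW
        if start + WINDOW > watermark:  # window still open: accept the event
            state[start] += 1
        # else: event arrived after its window closed, so it is dropped
        for s in sorted(list(state)):
            if s + WINDOW <= watermark:       # window fully below watermark
                yield (s, state.pop(s))       # emit once, then drop state

closed = list(run([(1, "a"), (2, "b"), (7, "c"), (18, "d"), (3, "late")]))
print(closed)  # → [(0, 2)] — window [0,5) closed; the event at t=3 was dropped
```

Note that the window [5, 10) and [15, 20) aggregates are never emitted here because no later event pushes the watermark past their ends, which is exactly the append-mode trade-off mentioned below: output appears only when a window is closed.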

On Thu, Dec 1, 2016 at 12:57 PM, ayan guha <guha.ayan@gmail.com> wrote:

> Thanks TD. Will it be available in pyspark too?
> On 1 Dec 2016 19:55, "Tathagata Das" <tathagata.das1565@gmail.com> wrote:
>
>> In the meantime, if you are interested, you can read the design doc in
>> the corresponding JIRA - https://issues.apache.org/jira/browse/SPARK-18124
>>
>> On Thu, Dec 1, 2016 at 12:53 AM, Tathagata Das <
>> tathagata.das1565@gmail.com> wrote:
>>
>>> That feature is coming in 2.1.0. We have added watermarking, which will
>>> track the event time of the data and accordingly close old windows, output
>>> their corresponding aggregates, and then drop their corresponding state. But in
>>> that case, you will have to use append mode, and the aggregated data of a
>>> particular window will be evicted only when the window is closed. You will
>>> be able to control the threshold on how long to wait for late, out-of-order
>>> data before closing a window.
>>>
>>> We will be updating the docs soon to explain this.
>>>
>>> On Tue, Nov 29, 2016 at 8:30 PM, Xinyu Zhang <wszxyh@163.com> wrote:
>>>
>>>> Hi
>>>>
>>>> I want to use window operations. However, if I don't remove any data,
>>>> the "complete" table will become larger and larger as time goes on, so I
>>>> want to remove some outdated data from the complete table that I will never
>>>> use.
>>>> Is there any method to meet this requirement?
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
