spark-user mailing list archives

From Lian Jiang <jiangok2...@gmail.com>
Subject Re: retention policy for spark structured streaming dataset
Date Wed, 14 Mar 2018 19:59:28 GMT
It is already partitioned by timestamp. But is the right retention process
to stop the streaming job, trim the parquet files, and restart the
streaming job? Thanks.

On Wed, Mar 14, 2018 at 12:51 PM, Sunil Parmar <sunilosunil@gmail.com>
wrote:

> Can you use partitioning (by day)? That will make it easier to drop
> data older than x days outside the streaming job.
>
> Sunil Parmar
>
> On Wed, Mar 14, 2018 at 11:36 AM, Lian Jiang <jiangok2006@gmail.com>
> wrote:
>
>> I have a Spark structured streaming job which dumps data into a parquet
>> file. To keep the parquet data from growing without bound, I want to
>> discard data older than 3 months. Does Spark streaming support this? Or
>> do I need to stop the streaming job, trim the parquet file, and restart
>> the streaming job? Thanks for any hints.
>>
>
>
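The partition-pruning approach suggested above can be sketched as a small cleanup script run on a schedule, independent of the running streaming job. This is only an illustrative sketch: it assumes the dataset is written with `partitionBy("date")`, giving a local layout like `<base>/date=YYYY-MM-DD/`; the base path, helper name, and retention window are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: drop whole date partitions older than a cutoff, outside the
# streaming job. Assumes directories named date=YYYY-MM-DD under $base;
# all paths and names here are hypothetical.

prune_partitions() {
  local base=$1 cutoff=$2   # cutoff is an ISO date, e.g. 2018-01-01
  for dir in "$base"/date=*; do
    [[ -d "$dir" ]] || continue
    local day=${dir##*/date=}
    # Zero-padded ISO dates compare correctly as plain strings.
    if [[ "$day" < "$cutoff" ]]; then
      echo "dropping $dir"
      rm -r "$dir"
    fi
  done
}

# Example: keep roughly the last 90 days (GNU date syntax).
# prune_partitions /data/events "$(date -d '-90 days' +%F)"
```

On HDFS the same idea would use `hdfs dfs -rm -r` on the partition path instead of `rm -r`; and if the dataset is registered as a table, dropping the partition through SQL (`ALTER TABLE ... DROP PARTITION`) keeps the metastore consistent with the files.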
