spark-user mailing list archives

From Sunil Parmar <>
Subject Re: retention policy for spark structured streaming dataset
Date Wed, 14 Mar 2018 19:51:53 GMT
Can you use partitioning (by day)? That will make it easier to drop data
older than x days outside the streaming job.
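A rough sketch of that idea follows. The streaming query writes Parquet partitioned by a day column, for example with PySpark's `DataStreamWriter.partitionBy`. A separate job, such as a daily cron task, then deletes the day directories that fall outside the retention window. This is a minimal sketch under several assumptions. The column is hypothetically named `date` and holds `YYYY-MM-DD` strings, so directories look like `date=2018-03-14`. The output lives on a local filesystem; on HDFS or S3 you would use that storage's own delete API instead of `shutil`. The helper `drop_old_partitions` is a hypothetical name, not part of Spark.

```python
import os
import shutil
from datetime import date, datetime, timedelta

# Assumed (hypothetical) layout: the streaming query was started with something like
#   df.writeStream.format("parquet").partitionBy("date").option("path", table_path)...
# so each day's rows land in a subdirectory named "date=YYYY-MM-DD".
PARTITION_PREFIX = "date="


def drop_old_partitions(table_path, retention_days, today=None):
    """Delete day partitions older than `retention_days`; return the dropped names."""
    if today is None:
        today = date.today()
    cutoff = today - timedelta(days=retention_days)
    dropped = []
    for name in sorted(os.listdir(table_path)):
        full = os.path.join(table_path, name)
        # Skip _spark_metadata, checkpoints and anything else that is not a day partition.
        if not (name.startswith(PARTITION_PREFIX) and os.path.isdir(full)):
            continue
        try:
            day = datetime.strptime(name[len(PARTITION_PREFIX):], "%Y-%m-%d").date()
        except ValueError:
            continue  # unrecognised partition value: leave it alone
        if day < cutoff:
            shutil.rmtree(full)  # remove the whole day directory
            dropped.append(name)
    return dropped
```

For the 3-month case in the question, a call like `drop_old_partitions("/data/events", 90)` would keep roughly the last 90 days; the path is hypothetical. The sketch deliberately leaves the sink's `_spark_metadata` directory untouched. Readers that rely on that log may still expect files from the deleted days, so check how your Spark version handles missing files before relying on this.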

Sunil Parmar

On Wed, Mar 14, 2018 at 11:36 AM, Lian Jiang <> wrote:

> I have a Spark Structured Streaming job which dumps data into a Parquet
> file. To keep the Parquet file from growing indefinitely, I want to discard
> data older than 3 months. Does Spark streaming support this? Or do I need to
> stop the streaming job, trim the Parquet file and restart the streaming job?
> Thanks for any hints.
