spark-user mailing list archives

From Akhil Das <ak...@hacked.work>
Subject Re: spark streaming - how to purge old data files in data directory
Date Sun, 19 Jun 2016 01:34:12 GMT
Currently, there is no out-of-the-box solution for this. However, you can
use other HDFS utilities to remove older files from the directory (say,
files older than 24 hours). Another approach is discussed here:
<http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-tracking-deleting-processed-files-td21444.html>.
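As a rough illustration of the "other HDFS utilities" route, here is a minimal Python 3.7+ sketch that shells out to the standard `hdfs dfs -ls` and `hdfs dfs -rm` commands and deletes plain files last modified more than 24 hours ago. It assumes the `hdfs` client is on the PATH, that the watched directory is flat, and that `-ls` prints its usual eight-column listing (permissions, replication, owner, group, size, date, time, path). The function names `select_old_files` and `purge_old_files` are hypothetical, not part of Spark or Hadoop.

```python
import subprocess
from datetime import datetime, timedelta


def select_old_files(ls_output, cutoff):
    """Return paths of plain files in `hdfs dfs -ls` output modified before `cutoff`."""
    old = []
    for line in ls_output.splitlines():
        # maxsplit=7 keeps paths that contain spaces in one piece.
        parts = line.split(None, 7)
        # Skip the "Found N items" header, directories, and malformed lines.
        if len(parts) != 8 or parts[0].startswith("d"):
            continue
        # Columns 5 and 6 hold the modification date and time (yyyy-MM-dd HH:mm).
        modified = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
        if modified < cutoff:
            old.append(parts[7])
    return old


def purge_old_files(directory, max_age=timedelta(hours=24)):
    """Delete files in `directory` older than `max_age` (hypothetical helper)."""
    listing = subprocess.run(
        ["hdfs", "dfs", "-ls", directory],
        capture_output=True, text=True, check=True,
    ).stdout
    # `hdfs dfs -ls` prints times in the client's local time zone,
    # so compare against local "now".
    cutoff = datetime.now() - max_age
    old = select_old_files(listing, cutoff)
    if old:
        subprocess.run(["hdfs", "dfs", "-rm"] + old, check=True)
    return old
```

You could run something like `purge_old_files("/path/to/textFileStream/dir")` from cron. Keep the cutoff well above the batch interval so a file is never removed while a batch may still be reading it. If HDFS trash is enabled, `-rm` moves files to the trash; adding `-skipTrash` deletes them immediately. For very large directories, delete in chunks to stay under the command-line length limit.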

On Sun, Jun 19, 2016 at 7:28 AM, Vamsi Krishna <vamsi.attluri@gmail.com>
wrote:

> Hi,
>
> I'm on HDP 2.3.2 cluster (Spark 1.4.1).
> I have a Spark Streaming app which uses 'textFileStream' to stream simple
> CSV files and process them.
> I see the old data files that are processed are left in the data directory.
> What is the right way to purge the old data files in data directory on
> HDFS?
>
> Thanks,
> Vamsi Attluri
>



-- 
Cheers!
