spark-user mailing list archives

From ganterm <>
Subject Spark streaming - tracking/deleting processed files
Date Fri, 30 Jan 2015 18:07:16 GMT
We are running a Spark Streaming job that retrieves files from a directory
(using textFileStream).
One concern we have is the case where the job is down while files are
still being added to the directory.
Once the job starts up again, those files are not picked up (since they were
not created or modified while the job was running), but we would like them
to be processed.
Is there a solution for this? Is there a way to keep track of which files
have been processed, and can we "force" older files to be picked up? Is
there a way to delete the processed files?

