spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Evo Eftimov" <>
Subject RE: spark filestream problem
Date Sat, 02 May 2015 19:57:09 GMT
I have figured it out in the meantime - simply when moving file on HDFS it
preserves its time stamp and on the other hand the spark filestream adapter
seems to care as much about filenames as timestamps - hence NEW files with
OLD time stamps will NOT be processed - yuk 

The hack you can use is to a) copy the required file in a temp location and
then b) move it from there to the dir monitored by spark filestream - this
will ensure it is with recent timestamp

-----Original Message-----
From: Evo Eftimov [] 
Sent: Saturday, May 2, 2015 5:07 PM
Subject: spark filestream problem

it seems that on Spark Streaming 1.2 the filestream API may have a bug - it
doesn't detect new files when moving or renaming them on HDFS - only when
copying them but that leads to a well known problem with .tmp files which
get removed and make spark steraming filestream throw exception

View this message in context:
Sent from the Apache Spark User List mailing list archive at

To unsubscribe, e-mail: For additional
commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message