spark-user mailing list archives

From "Evo Eftimov" <evo.efti...@isecc.com>
Subject RE: spark filestream problem
Date Sat, 02 May 2015 19:57:09 GMT
I have figured it out in the meantime: when you move a file on HDFS, it
keeps its original timestamp, and the Spark filestream adapter seems to
care about timestamps as much as about filenames. Hence NEW files with
OLD timestamps will NOT be processed - yuk

The hack you can use is to a) copy the required file to a temp location and
then b) move it from there to the dir monitored by Spark filestream. The
copy gets a recent timestamp, and the move preserves it.
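
A minimal sketch of the copy-then-move hack, shown on a local filesystem
with Python's standard library rather than on HDFS itself. The function
name and the directory arguments are made up for illustration; on HDFS the
corresponding steps would be something like `hdfs dfs -cp` followed by
`hdfs dfs -mv`. It assumes the staging dir sits on the same filesystem as
the monitored dir, so that the final move is a plain rename.

```python
import os
import shutil


def stage_then_move(src, staging_dir, monitored_dir):
    """Copy src into staging_dir, then rename the copy into monitored_dir.

    Hypothetical helper: the names and layout are illustrative assumptions.
    """
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(monitored_dir, exist_ok=True)

    name = os.path.basename(src)
    staged = os.path.join(staging_dir, name)
    final = os.path.join(monitored_dir, name)

    # a) Copy: copyfile writes a new file, so the copy gets a current
    #    modification time instead of inheriting the old one.
    shutil.copyfile(src, staged)

    # b) Move: the rename keeps the copy's recent timestamp, and the file
    #    shows up in the monitored dir already complete.
    os.replace(staged, final)
    return final
```

Because the file only enters the monitored dir through the rename, the
stream should never see a half-written copy there.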

-----Original Message-----
From: Evo Eftimov [mailto:evo.eftimov@isecc.com] 
Sent: Saturday, May 2, 2015 5:07 PM
To: user@spark.apache.org
Subject: spark filestream problem

It seems that on Spark Streaming 1.2 the filestream API may have a bug: it
doesn't detect new files when they are moved or renamed on HDFS, only when
they are copied. But copying leads to a well-known problem with .tmp files,
which get removed and make Spark Streaming filestream throw an exception.



--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/spark-filestream-problem-tp22742.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


