spark-user mailing list archives

From Jem Tucker <jem.tuc...@gmail.com>
Subject FileInputDStream missing files
Date Wed, 14 Jan 2015 15:53:00 GMT
Hi all,

A small number of the files being moved into my landing directory are not
being "seen" by my fileStream receiver. After looking at the code, it seems
that with long batches (> 1 minute), if a file is created before a batch
finishes but only becomes visible after that batch has finished and the
next has begun, then it can never be collected by
FileInputDStream.findNewFiles.
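
To make the scenario concrete, here is a minimal Python sketch of the selection
window as I understand it. It is a simplified model, not Spark's code:
is_selected, collected_at, and all of the timings are hypothetical, and it
assumes a file is picked up only if its modification time falls inside the
remembered window that ends at the batch time.

```python
def is_selected(mod_time, batch_time, remember_window):
    """Assumed filter: accept a file only if its modification time lies in
    the half-open window (batch_time - remember_window, batch_time]."""
    threshold = batch_time - remember_window
    return threshold < mod_time <= batch_time


BATCH = 120           # seconds; a 2-minute batch, chosen for illustration
REMEMBER = BATCH * 1  # only one batch is remembered

mod_time = 110        # the file was written at t = 110 s ...
visible_from = 125    # ... but is only listed in the directory from t = 125 s


def collected_at(batch_time, remember_window=REMEMBER):
    """A batch can collect the file only if it is already visible
    and its modification time is inside that batch's window."""
    return visible_from <= batch_time and is_selected(
        mod_time, batch_time, remember_window
    )


# Check three consecutive batches: the first cannot see the file yet,
# the later ones consider its modification time too old.
picked = [t for t in (120, 240, 360) if collected_at(t)]
print(picked)  # []

# With a window of two batches, the batch at t = 240 s would accept it.
print(collected_at(240, remember_window=2 * BATCH))  # True
```

In this model the file falls between two windows: it is not yet visible at
t = 120 s, and by t = 240 s its modification time is already older than the
single remembered batch.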

I have logged a JIRA, but would be grateful if anyone could tell me whether
there is a specific reason for this behaviour. It is mainly due to
calculateNumberOfBatchesToRemember returning 1 when the batch duration is
greater than 1 minute, so only a single batch interval is remembered.
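
For reference, a rough Python sketch of how I read that calculation. This is
my paraphrase and not the Spark source: num_batches_to_remember and
MIN_REMEMBER_SECONDS are hypothetical names, and the ceiling of (1 minute /
batch duration) is an assumption that matches the value of 1 seen for batches
longer than a minute.

```python
import math

MIN_REMEMBER_SECONDS = 60  # assumed minimum remember duration of 1 minute


def num_batches_to_remember(batch_seconds):
    """Assumed rule: remember enough batches to cover at least one minute."""
    return math.ceil(MIN_REMEMBER_SECONDS / batch_seconds)


# Print batch duration, remembered batches, and the resulting window length.
for batch in (10, 30, 60, 120):
    n = num_batches_to_remember(batch)
    print(batch, n, n * batch)
# 10 6 60
# 30 2 60
# 60 1 60
# 120 1 120
```

Under this assumption, any batch of a minute or longer remembers exactly one
batch, so a file that misses its own batch has no later window to land in.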

Thanks,

Jem
