spark-user mailing list archives

From Akhil Das <>
Subject Re: Does Spark Streaming need to list all the files in a directory?
Date Sun, 02 Aug 2015 08:03:26 GMT
I guess it goes through those 500k files the first time and then uses a
filter from the next time onward.

Best Regards

On Fri, Jul 31, 2015 at 4:39 AM, Tathagata Das <> wrote:

> The first time, it needs to list them. After that, the list should be
> cached by the file stream implementation (as far as I remember).
> On Thu, Jul 30, 2015 at 3:55 PM, Brandon White <>
> wrote:
>> Is this a known bottleneck for Spark Streaming textFileStream? Does it
>> need to list all the current files in a directory before it gets the new
>> files? Say I have 500k files in a directory; does it list them all in order
>> to get the new files?
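The "list everything once, then filter" behavior described above can be illustrated with a small plain-Python sketch. This is not Spark's file stream code. The class `NewFileTracker` and its `poll()` method are hypothetical names. The sketch assumes that new files arrive with a modification time at or after the newest one seen on the previous poll. Note that every poll in this sketch still lists the whole directory. What the filter saves is re-processing files that were already recorded.

```python
import os
import tempfile


class NewFileTracker:
    """Hypothetical helper (not Spark's API) that reports files added to a directory.

    The first poll records every existing file as a baseline and reports nothing.
    Every later poll lists the directory again. It skips files whose modification
    time is older than the newest one seen before, and it skips names it has
    already recorded.
    """

    def __init__(self, directory):
        self.directory = directory
        self.seen = set()           # names from the baseline or reported since
        self.threshold = None       # newest mtime seen so far (None if no files yet)
        self.baseline_done = False  # True once the first poll has run

    def poll(self):
        new_files = []
        newest = self.threshold
        with os.scandir(self.directory) as entries:  # full listing on every poll
            for entry in entries:
                if not entry.is_file():
                    continue
                mtime = entry.stat().st_mtime
                if not self.baseline_done:
                    # Baseline poll: remember the file but do not report it.
                    self.seen.add(entry.name)
                elif (self.threshold is None or mtime >= self.threshold) \
                        and entry.name not in self.seen:
                    # Passed the modification-time filter and was not seen before.
                    self.seen.add(entry.name)
                    new_files.append(entry.name)
                if newest is None or mtime > newest:
                    newest = mtime
        self.threshold = newest
        self.baseline_done = True
        return sorted(new_files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "existing.txt"), "w").close()
        tracker = NewFileTracker(d)
        print(tracker.poll())  # baseline poll reports nothing
        open(os.path.join(d, "fresh.txt"), "w").close()
        print(tracker.poll())  # reports the newly added file
```

Because of the modification-time filter, a file moved into the directory with an older timestamp is silently skipped. The `seen` set also grows without bound, so a long-running version would need to prune names whose modification time has fallen below the threshold.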
