spark-user mailing list archives

From Terry Hole <hujie.ea...@gmail.com>
Subject Re: [Spark Streaming] The FileInputDStream newFilesOnly=false does not work in 1.2 since
Date Wed, 21 Jan 2015 10:06:14 GMT
Here is a sample program to reproduce it; please check it:
https://gist.github.com/jhu-chang/1ee5b0788c7479414eeb

We can see that the file is not included in 1.2.

The file is in the customer folder, with timestamp 2015/01/21 15:41:22.
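
A minimal sketch of why an older file could be skipped, modelled on the threshold expression quoted below from FileInputDStream.scala. The selection rule (a file is picked up only if its modification time is later than the threshold), the initial threshold of 0 for newFilesOnly=false, and all names and numbers here are assumptions for illustration, not Spark's actual API.

```scala
// Simplified model of the quoted threshold logic, using plain Long milliseconds.
// Every name and value is hypothetical; this is not Spark code.
object ThresholdSketch {
  /** Same max(...) shape as the quoted snippet. */
  def modTimeIgnoreThreshold(
      initialThreshold: Long,
      currentTime: Long,
      rememberMillis: Long): Long =
    math.max(initialThreshold, currentTime - rememberMillis)

  /** Assumed rule: a file is selected only if it is newer than the threshold. */
  def isSelected(fileModTime: Long, threshold: Long): Boolean =
    fileModTime > threshold

  def main(args: Array[String]): Unit = {
    val minute = 60 * 1000L
    val now = 1000 * minute          // hypothetical current batch time
    val remember = 1 * minute        // hypothetical remember window
    val oldFile = now - 25 * minute  // file written 25 minutes before the batch

    // newFilesOnly=false is assumed to give an initial threshold of 0,
    // so the trailing end of the remember window wins the max(...).
    val threshold = modTimeIgnoreThreshold(0L, now, remember)
    println(s"threshold=$threshold, old file selected=${isSelected(oldFile, threshold)}")
  }
}
```

Under these assumptions the initial threshold never takes effect once the current time exceeds the remember window, which would match the behavior seen in 1.2.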



On Wed, Jan 21, 2015 at 2:29 PM, Sean Owen <sowen@cloudera.com> wrote:

> See also SPARK-3276 and SPARK-3553. Can you say more about the
> problem? What are the file timestamps, what happens when you run, and
> which log messages, if any, are relevant? I do not expect there was any
> intended behavior change.
>
> On Wed, Jan 21, 2015 at 5:17 AM, Terry Hole <hujie.eagle@gmail.com> wrote:
> > Hi,
> >
> > I am trying to move from 1.1 to 1.2 and found that newFilesOnly=false
> > (intended to include old files) does not work anymore. It works great in
> > 1.1, so this was presumably introduced by the last change to this class.
> >
> >
> >
> > Did the behavior of this flag change, or is it a regression?
> >
> > The issue appears to be caused by this code,
> > from line 157 in FileInputDStream.scala:
> >     val modTimeIgnoreThreshold = math.max(
> >         initialModTimeIgnoreThreshold,   // initial threshold based on newFilesOnly setting
> >         currentTime - durationToRemember.milliseconds  // trailing end of the remember window
> >       )
> >
> >
> > Regards
> >
> > - Terry
> >
> >
>
