lucene-java-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: fadvise/madvise during segment-merges....
Date Wed, 21 May 2014 14:50:16 GMT
> But does that mean SEQUENTIAL will evict the
> page once we're done reading it?

Yes, it looks like it does evict the pages once the read completes...

> Well, that option is too late?  Like, say I read in the N 1 GB files
> to merge, then I call DONTNEED once the merge is done, but by then the
> pages for searching have already been evicted

Ahh... Thanks for the explanation...
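
To make the "too late" point concrete, here is a rough sketch of the workaround Mike mentions further down (calling DONTNEED every few KB instead of once at the end). Everything in it is hypothetical: CacheDropper stands in for whatever would really issue posix_fadvise(..., POSIX_FADV_DONTNEED), and ChunkedMergeReader is not a Lucene class. It only shows the windowing logic, not a real merge.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical hook: a real implementation would issue
// posix_fadvise(fd, offset, length, POSIX_FADV_DONTNEED) through JNI or similar.
interface CacheDropper {
    void dropRange(long offset, long length);
}

final class ChunkedMergeReader {
    /**
     * Reads the stream to the end and asks the dropper to release each
     * window of roughly windowBytes once it has been consumed.
     * Returns the total number of bytes read.
     */
    static long readAndDrop(InputStream in, int windowBytes, CacheDropper dropper) {
        byte[] buf = new byte[8192];
        long total = 0;
        long windowStart = 0;
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
                // A real merge would consume buf[0..n) here.
                if (total - windowStart >= windowBytes) {
                    dropper.dropRange(windowStart, total - windowStart);
                    windowStart = total;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Release whatever is left in the final, partial window.
        if (total > windowStart) {
            dropper.dropRange(windowStart, total - windowStart);
        }
        return total;
    }
}
```

As Mike says, this is only a poor emulation of what SEQUENTIAL is meant to express, so I would treat it as a fallback at best.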

Let me elaborate a bit more

I have numerous small unsorted segments and fewer, larger sorted
segments. The merge policy keeps these two groups separate.

The larger sorted segments always merge among themselves using SMP
(SortingMergePolicy) and SEQUENTIAL advice. It should be helpful in this
case, no?

The smaller unsorted segments also merge among themselves using SMP. But since
these segments are very small, the effect on the buffer cache should be
negligible. I feel there is no need to advise in this case...
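
Roughly, the per-merge decision I have in mind looks like the sketch below. The names (ReadAdvice, MergeAdviceChooser, smallMergeBytes) are made up for illustration and are not Lucene APIs, and the size cutoff is an assumption I would have to tune. Whether SEQUENTIAL really helps for the sorted merges is still the open question.

```java
// Hypothetical sketch: none of these names come from Lucene.
enum ReadAdvice { NORMAL, SEQUENTIAL }

final class MergeAdviceChooser {
    // Assumed cutoff: merges smaller than this get no special advice.
    private final long smallMergeBytes;

    MergeAdviceChooser(long smallMergeBytes) {
        this.smallMergeBytes = smallMergeBytes;
    }

    /**
     * Picks the read advice for one merge, given whether its inputs are
     * already-sorted segments and the total size of those inputs.
     */
    ReadAdvice choose(boolean sortedSegments, long totalBytes) {
        // Small or unsorted merges: leave the buffer cache alone.
        if (!sortedSegments || totalBytes < smallMergeBytes) {
            return ReadAdvice.NORMAL;
        }
        // Large merges of sorted segments: advise SEQUENTIAL.
        return ReadAdvice.SEQUENTIAL;
    }
}
```

For example, with a 64 MB cutoff, a 1 GB merge of sorted segments would get SEQUENTIAL, while a 1 MB merge of unsorted segments would get NORMAL.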

> There are also sneaky ways to
> invoke some of these OS-level APIs without using JNI


This is cool stuff... It would save a lot of effort for most of these
things...

--
Ravi


On Wed, May 21, 2014 at 7:13 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Wed, May 21, 2014 at 8:20 AM, Ravikumar Govindarajan
> <ravikumar.govindarajan@gmail.com> wrote:
> > Great blog and lucid explanation
> >
> > I think things have changed in recent kernel versions. I am no expert, but
> > could see some code related to this here
> > http://lxr.free-electrons.com/source/mm/fadvise.c?v=3.14
>
> That looks promising.  But does that mean SEQUENTIAL will evict the
> page once we're done reading it?
>
> > O_DIRECT will be terrible drag no?
>
> Actually O_DIRECT is awesome because it completely bypasses the buffer
> cache, so nothing will be evicted.
>
> The downside is you must do your own buffering/read-ahead into
> userspace RAM, so you need to be more careful about heap used...
>
> Also, Linus hates this option :)
>
> > Will a battery-backed disk cache help here?
>
> This will make IndexWriter.commit faster, since the IO device will be
> able to return from fsync before bytes are actually moved to stable
> storage.  But you really shouldn't need to call commit so frequently,
> in which case a faster commit is not so important.
>
> > We are using a SortingMergePolicy which most-often hits data randomly. Will
> > SEQUENTIAL help here?
>
> Oh hmm then you should NOT call SEQUENTIAL and should not use
> O_DIRECT!  In fact, you want the IO pages for merging to enter the
> buffer cache....
>
> > Any reasons why you think DONTNEED will be less-useful?
>
> Well, that option is too late?  Like, say I read in the N 1 GB files
> to merge, then I call DONTNEED once the merge is done, but by then the
> pages for searching have already been evicted.  I could instead call
> DONTNEED every few KB of reads/writes but that seems hackish, like
> it's a poor emulation of what SEQUENTIAL would express.
>
> But net/net there has been good progress lately, new IO APIs in Java,
> improvements to Linux kernel, etc.  There are also sneaky ways to
> invoke some of these OS-level APIs without using JNI (the JDK has some
> internal APIs).  I think we should explore this area more, to minimize
> the cost of merging on ongoing searches.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
