lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: In memory index (current status in Lucene)
Date Mon, 01 Jul 2013 10:30:22 GMT
Hey Emma! It's been a while....

Building on what Steven said, here's Uwe's blog on
MMapDirectory and Lucene:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
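
For context, MMapDirectory reads index files through java.nio memory mappings (FileChannel.map). The bytes therefore live in the operating system's page cache rather than on the Java heap. Below is a minimal JDK-only sketch of that mechanism. It is not Lucene code: the class name MmapReadSketch and its chunking scheme are illustrative assumptions. It maps a file in fixed-size chunks because a single MappedByteBuffer can address at most Integer.MAX_VALUE bytes. Mapping only reserves virtual address space, which is why a 64-bit JVM matters for large indexes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch (not Lucene's MMapDirectory): reads a file through
 * read-only memory mappings split into fixed-size chunks, since one
 * MappedByteBuffer can cover at most Integer.MAX_VALUE bytes.
 */
final class MmapReadSketch {
    private final List<MappedByteBuffer> chunks = new ArrayList<>();
    private final int chunkSize;
    private final long length;

    MmapReadSketch(Path file, int chunkSize) throws IOException {
        this.chunkSize = chunkSize;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            this.length = ch.size();
            for (long pos = 0; pos < length; pos += chunkSize) {
                long size = Math.min(chunkSize, length - pos);
                // A mapping stays valid after its channel is closed.
                chunks.add(ch.map(FileChannel.MapMode.READ_ONLY, pos, size));
            }
        }
    }

    /** Reads one byte at an absolute file offset; non-resident pages fault in from disk. */
    byte readByte(long offset) {
        if (offset < 0 || offset >= length) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        int chunk = (int) (offset / chunkSize);
        int within = (int) (offset % chunkSize);
        return chunks.get(chunk).get(within);
    }

    long length() {
        return length;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 251);
        Files.write(tmp, data);

        MmapReadSketch r = new MmapReadSketch(tmp, 4096);
        System.out.println(r.length());                    // 10000
        System.out.println(r.readByte(5000) == data[5000]); // true
    }
}
```

A read that touches a page not currently resident triggers a page fault that the OS serves from disk. This is the silent slowdown that appears once the working set outgrows the page cache.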

I've always considered RAMDirectory suitable only for rather
restricted use cases, i.e. when I know without doubt that the index
is both relatively static and bounded in size. The other use I've
seen is indexing single documents on the fly (say, for complex
processing of a single result) and then throwing the index away
afterwards.
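
The throwaway single-document pattern can be pictured with a plain-JDK stand-in. SingleDocIndex is a hypothetical helper, not RAMDirectory or any other Lucene API. Its tokenization (lowercase, split on anything that is not a letter or digit) is also an assumption. It builds a term-to-positions map for one document, answers a query against it, and is then simply dropped for the garbage collector.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical throwaway index over a single document: term -> positions. */
final class SingleDocIndex {
    private final Map<String, List<Integer>> postings = new HashMap<>();

    SingleDocIndex(String text) {
        // Assumed tokenization: lowercase, split on non-letter/non-digit runs.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        int pos = 0;
        for (String t : tokens) {
            if (t.isEmpty()) continue; // leading delimiter yields an empty token
            postings.computeIfAbsent(t, k -> new ArrayList<>()).add(pos++);
        }
    }

    /** Positions at which the term occurs, or an empty list if absent. */
    List<Integer> positions(String term) {
        return Collections.unmodifiableList(
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList()));
    }

    /** True if every given term occurs somewhere in the document. */
    boolean matchesAll(String... terms) {
        for (String t : terms) {
            if (positions(t).isEmpty()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        SingleDocIndex idx = new SingleDocIndex("The quick brown fox jumps over the lazy dog");
        System.out.println(idx.positions("the"));         // [0, 6]
        System.out.println(idx.matchesAll("fox", "dog")); // true
        // idx now goes out of scope; nothing on disk needs cleaning up.
    }
}
```

Because the structure is sized to one document, it stays small on the heap. The same approach would not scale to a whole collection.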

How are things going?

Erick



On Fri, Jun 28, 2013 at 5:36 PM, Steven Schlansker <steven@likeness.com> wrote:

>
> On Jun 28, 2013, at 2:29 PM, Emmanuel Espina <espinaemmanuel@gmail.com>
> wrote:
>
> > I'm building a distributed index (mostly as a research project for
> > school) and I'm evaluating indexing the entire collection in memory
> > (as Google, Facebook and others did years ago). The obvious
> > reason for this is performance; replication should give me
> > reasonably good durability of the data even though it lives in
> > volatile memory.
> >
> > What is the current status of Lucene for this kind of index?
> > RAMDirectory's documentation has a scary warning that it
> > "is not intended to work with huge indexes", which makes it sound
> > more like an implementation for testing than something for
> > production.
> >
> > Of course there is no real context for this question, because it is a
> > research topic. Testing its limits would be the closest thing to a
> > context I have :p
>
> You could consider MMapDirectory, which will end up putting the active
> portions of the index in memory (via the filesystem buffer cache).
>
> The benefit is that you don't completely destroy the Java heap
> (RAMDirectory causes immense GC pressure if you are not careful) and
> you don't have to commit all of your RAM to index usage all the time.
>
> The downside is that if your working set exceeds the amount of RAM
> available for buffer cache, you will get silent performance degradation as
> you fall back to disk reads for the missing blocks.
>
> Maybe this is OK for your use case, maybe not.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
