lucene-java-user mailing list archives

From Emmanuel Espina <>
Subject Re: In memory index (current status in Lucene)
Date Mon, 01 Jul 2013 14:07:07 GMT
Hi Erick! Nice to hear from you again! From time to time my interest
in these "Lucene things" returns and I do some experiments :p

Just to add to this conversation, I found an interesting link to
Mike's blog about memory resident indexes (using another virtual
and also (which is not exactly what I asked but seems related) there
is a Google Summer of Code project to build a memory resident term


2013/7/1 Erick Erickson <>:
> Hey Emma! It's been a while....
> Building on what Steven said, here's Uwe's blog on
> MMapDirectory and Lucene:
> I've always considered RAMDirectory for rather restricted
> use-cases. I.e. if I know without doubt that the index
> is both relatively static and bounded. The other use I've
> seen is to use it to index single documents on-the-fly for
> some reason (say complex processing of a single result)
> then throw it out afterwards.
> How are things going?
> Erick
> On Fri, Jun 28, 2013 at 5:36 PM, Steven Schlansker <> wrote:
>> On Jun 28, 2013, at 2:29 PM, Emmanuel Espina <> wrote:
>> > I'm building a distributed index (mostly as a research project for
>> > school) and I'm evaluating indexing the entire collection in memory
>> > (like Google, Facebook and others did years ago). The obvious
>> > reason for this is performance, considering that replication will
>> > give me reasonably good durability of the data (despite it being in
>> > volatile memory).
>> >
>> > What is the current status of Lucene for this kind of index?
>> > The RAMDirectory documentation has a scary warning that says it
>> > "is not intended to work with huge indexes", and that sounds more like
>> > an implementation for testing than something for
>> > production.
>> >
>> > Of course there is no real context for this question, because it is a
>> > research topic. Testing its limits would be the closest to a context
>> > I have :p
>> You could consider MMapDirectory, which will end up putting the active
>> portions of the index in memory (via the filesystem buffer cache).
>> The benefit is that you don't completely destroy the Java heap
>> (RAMDirectory causes immense GC pressure if you are not careful) and you
>> don't have to commit all of your RAM to index usage all the time.
>> The downside is that if your working set exceeds the amount of RAM
>> available for buffer cache, you will get silent performance degradation as
>> you fall back to disk reads for the missing blocks.
>> Maybe this is OK for your use case, maybe not.
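
The heap-versus-buffer-cache trade-off described above can be illustrated with plain JDK memory mapping, which is the mechanism MMapDirectory builds on. This is only a minimal sketch and not Lucene code: the class name MmapSketch, the helper writeTempFile and the tiny temp file standing in for an index file are all hypothetical. It maps a file read-only and reads it through a MappedByteBuffer, so the file contents are served from the OS page cache rather than copied onto the Java heap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {

    // Hypothetical helper: writes some bytes to a temp file that stands in
    // for an on-disk index file.
    static Path writeTempFile(byte[] data) {
        try {
            Path p = Files.createTempFile("mmap-sketch", ".bin");
            p.toFile().deleteOnExit();
            Files.write(p, data);
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the whole file read-only and sums its bytes. The mapped pages are
    // backed by the OS page cache; only the small buffer object itself lives
    // on the Java heap, so the garbage collector never sees the file data.
    static long sumMapped(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get(); // may trigger a disk read if the page is not cached
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTempFile(new byte[] {1, 2, 3, 4});
        System.out.println(sumMapped(file)); // prints 10
    }
}
```

Each buf.get() either hits a cached page or falls through to disk without any visible error, which is the "silent performance degradation" case when the working set outgrows the RAM available for the buffer cache.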
