lucene-java-user mailing list archives

From Lance Norskog <>
Subject Re: In memory index (current status in Lucene)
Date Mon, 01 Jul 2013 21:41:03 GMT
My current open source project is a Directory that is just like
RAMDirectory, but everything is memory-mapped. The idea is that it
creates a disk file, opens it, and immediately deletes it. The file's
contents still exist until the IndexReader/Writer/Searcher closes it,
but the file can no longer be found through the file system. The result
behaves like a RAMDirectory, but without the memory limitations.
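
A minimal sketch of the create/open/delete/map trick, using only the JDK and no Lucene classes. This is not the project's actual code: the class and method names are made up for illustration, and it assumes a POSIX file system, where deleting an open file removes only its name. On Windows the delete may fail or behave differently.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical name; illustrates the idea only.
public class UnlinkedMmapSketch {

    /** Maps `size` bytes of a temp file whose path is deleted right after it is opened. */
    static MappedByteBuffer mapUnlinked(long size) throws IOException {
        Path path = Files.createTempFile("unlinked-", ".bin");
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Assumption: POSIX semantics. The name disappears from the
            // directory, but the data stays reachable through the open
            // channel and any mapping created from it.
            Files.delete(path);
            // Mapping READ_WRITE past the current end grows the file to `size`.
            // The mapping stays valid after the channel itself is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    public static void main(String[] args) throws IOException {
        MappedByteBuffer buf = mapUnlinked(4096);
        buf.putLong(0, 42L);                 // write through the mapping
        System.out.println(buf.getLong(0));  // read it back
    }
}
```

A real Directory would have to manage many such files, grow them as segments are written, and release the mappings when readers close.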

It's proving to be harder than it looked.

The application is to store encrypted indexes in memory, with the 
decrypted contents in this non-findable format. I'm in medical document 
analysis now, and we can't store anything on disk in the clear.


On 07/01/2013 07:07 AM, Emmanuel Espina wrote:
> Hi Erick! Nice to hear from you again! From time to time my interest
> in these "Lucene things" returns and I do some experiments :p
> Just to add to this conversation, I found an interesting link to
> Mike's blog about memory resident indexes (using another virtual
> machine)
> and also (which is not exactly what I asked but seems related) there
> is a Google Summer of Code project to build a memory resident term
> resident:
> Thanks
> Emmanuel
> 2013/7/1 Erick Erickson <>:
>> Hey Emma! It's been a while....
>> Building on what Steven said, here's Uwe's blog on
>> MMapDirectory and Lucene:
>> I've always considered RAMDirectory for rather restricted
>> use-cases. I.e. if I know without doubt that the index
>> is both relatively static and bounded. The other use I've
>> seen is to use it to index single documents on-the-fly for
>> some reason (say complex processing of a single result)
>> then throw it out afterwards.
>> How are things going?
>> Erick
>> On Fri, Jun 28, 2013 at 5:36 PM, Steven Schlansker <> wrote:
>>> On Jun 28, 2013, at 2:29 PM, Emmanuel Espina <> wrote:
>>>> I'm building a distributed index (mostly as a research project for
>>>> school) and I'm evaluating indexing the entire collection in memory
>>>> (like Google, Facebook and others did years ago). The obvious
>>>> reason for this is performance, considering that replication will
>>>> give me reasonably good durability of the data (despite it being in
>>>> volatile memory).
>>>> What is the current status of Lucene for this kind of index?
>>>> RAMDirectory has a scary warning in its documentation that says it
>>>> "is not intended to work with huge indexes", and that sounds more like
>>>> an implementation for testing than something for
>>>> production.
>>>> Of course there is no real context for this question, because it is a
>>>> research topic. Testing its limits would be the closest to a context
>>>> I have :p
>>> You could consider MMapDirectory, which will end up putting the active
>>> portions of the index in memory (via the filesystem buffer cache).
>>> The benefit is that you don't completely destroy the Java heap
>>> (RAMDirectory causes immense GC pressure if you are not careful) and
>>> you don't have to commit all of your RAM to index usage all the time.
>>> The downside is that if your working set exceeds the amount of RAM
>>> available for buffer cache, you will get silent performance degradation
>>> as you fall back to disk reads for the missing blocks.
>>> Maybe this is OK for your use case, maybe not.
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:

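For reference, the MMapDirectory suggestion quoted above could look roughly like the sketch below. It assumes Lucene 5 or later, where `MMapDirectory` takes a `java.nio.file.Path` and `StandardAnalyzer` and `IndexWriterConfig` need no `Version` argument. The 4.x releases current when this thread was written used `java.io.File` and `Version` constants instead. The class name, method name, and field name are invented for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

// Hypothetical name; assumes Lucene 5+ on the classpath.
public class MMapDirectorySketch {

    /** Indexes one document under indexPath and returns how many hits "hello" gets. */
    static int indexAndCount(Path indexPath) throws IOException {
        // Index files live on disk; reads go through memory-mapped buffers,
        // so hot blocks sit in the OS buffer cache rather than on the Java heap.
        try (Directory dir = new MMapDirectory(indexPath)) {
            try (IndexWriter writer =
                     new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                doc.add(new TextField("body", "hello memory mapped index", Field.Store.NO));
                writer.addDocument(doc);
            } // closing the writer commits the pending document

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                TermQuery query = new TermQuery(new Term("body", "hello"));
                return searcher.search(query, 10).scoreDocs.length;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("mmap-index-");
        System.out.println("hits: " + indexAndCount(tmp));
    }
}
```

The only change from a heap-resident setup is the Directory implementation, so the two are easy to compare on the same data.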