lucene-java-user mailing list archives

From "Ramkumar R. Aiyengar" <andyetitmo...@gmail.com>
Subject Re: In memory index (current status in Lucene)
Date Thu, 04 Jul 2013 20:13:59 GMT
Have you tried using MMapDirectory over a RAM disk (assuming you are on
Linux)? You can avoid writing to disk (and thus the other ways to get to it
persistently as Steven mentions), but still MMap it.
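
To make the suggestion concrete, here is a minimal sketch using only java.nio, not Lucene itself. It assumes /dev/shm is a tmpfs mount (common on Linux, but an assumption here) and falls back to the default temp directory otherwise; the class and method names are made up for illustration. With Lucene you would instead point MMapDirectory at a path under the RAM disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class RamDiskMmapSketch {

    // Assumption: /dev/shm is a RAM-backed tmpfs mount. If it is missing or
    // not writable, fall back to the JVM's default temp directory.
    public static Path ramDiskDir() {
        Path shm = Paths.get("/dev/shm");
        return Files.isDirectory(shm) && Files.isWritable(shm)
                ? shm
                : Paths.get(System.getProperty("java.io.tmpdir"));
    }

    // Writes the text through a memory mapping of a file in ramDiskDir(),
    // reads it back through the same mapping, and removes the file.
    public static String roundTrip(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        try {
            Path file = Files.createTempFile(ramDiskDir(), "mmap-sketch", ".bin");
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping READ_WRITE beyond the current size grows the file.
                MappedByteBuffer buf =
                        ch.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
                buf.put(data);
                buf.flip();
                byte[] back = new byte[data.length];
                buf.get(back);
                return new String(back, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("mapped under: " + ramDiskDir());
        System.out.println(roundTrip("hello index"));
    }
}
```

On a tmpfs mount the file's pages live in memory (or swap), so nothing here is written to a persistent disk, while the access path is still an ordinary memory mapping.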
On 1 Jul 2013 22:41, "Lance Norskog" <goksron@gmail.com> wrote:

> My current open source project is a Directory that is just like
> RAMDirectory, but everything is memory-mapped. The idea is it creates a
> disk file, opens it, and immediately deletes the file. The file still
> exists until the IndexReader/Writer/Searcher closes it. But, it cannot be
> found from the file system. This is just like a RAMDirectory, but without
> memory limitations.
>
> It's proving to be harder than it looked.
>
> The application is to store encrypted indexes in memory, with the
> decrypted contents in this non-findable format. I'm in medical document
> analysis now, and we can't store anything on disk in the clear.
>
> Lance
>
> On 07/01/2013 07:07 AM, Emmanuel Espina wrote:
>
>> Hi Erick! Nice to hear from you again! From time to time my interest
>> in these "Lucene things" returns and I do some experiments :p
>>
>> Just to add to this conversation, I found an interesting link to
>> Mike's blog about memory-resident indexes (using another virtual
>> machine):
>> http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-with-azuls-zing-jvm.html
>> and also (which is not exactly what I asked but seems related) there
>> is a Google Summer of Code project to build a memory-resident term
>> dictionary:
>> http://www.google-melange.com/gsoc/project/google/gsoc2013/billybob/42001
>>
>> Thanks
>> Emmanuel
>>
>>
>> 2013/7/1 Erick Erickson <erickerickson@gmail.com>:
>>
>>> Hey Emma! It's been a while....
>>>
>>> Building on what Steven said, here's Uwe's blog on
>>> MMapDirectory and Lucene:
>>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>>>
>>> I've always considered RAMDirectory suitable only for rather restricted
>>> use cases, i.e. when I know without doubt that the index
>>> is both relatively static and bounded. The other use I've
>>> seen is to index single documents on the fly for
>>> some reason (say, complex processing of a single result)
>>> and then throw the index away afterwards.
>>>
>>> How are things going?
>>>
>>> Erick
>>>
>>>
>>>
>>> On Fri, Jun 28, 2013 at 5:36 PM, Steven Schlansker <steven@likeness.com> wrote:
>>>
>>>> On Jun 28, 2013, at 2:29 PM, Emmanuel Espina <espinaemmanuel@gmail.com> wrote:
>>>>
>>>>> I'm building a distributed index (mostly as a research project for
>>>>> school) and I'm evaluating indexing the entire collection in memory
>>>>> (as Google, Facebook and others did years ago). The obvious
>>>>> reason for this is performance, considering that replication will
>>>>> give me reasonably good durability of the data (despite it being in
>>>>> volatile memory).
>>>>>
>>>>> What is the current status of Lucene for this kind of index?
>>>>> RAMDirectory in its documentation has a scary warning that says it
>>>>> "is not intended to work with huge indexes", and that sounds more like
>>>>> an implementation for testing than something for
>>>>> production.
>>>>>
>>>>> Of course there is no real context for this question, because it is a
>>>>> research topic. Testing its limits would be the closest to a context
>>>>> I have :p
>>>>>
>>>> You could consider MMapDirectory, which will end up putting the active
>>>> portions of the index in memory (via the filesystem buffer cache).
>>>>
>>>> The benefit is that you don't completely destroy the Java heap
>>>> (RAMDirectory causes immense GC pressure if you are not careful) and
>>>> you don't have to commit all of your RAM to index usage all the time.
>>>>
>>>> The downside is that if your working set exceeds the amount of RAM
>>>> available for buffer cache, you will get silent performance degradation
>>>> as you fall back to disk reads for the missing blocks.
>>>>
>>>> Maybe this is OK for your use case, maybe not.
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>
>>
>
>
>
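
The create-open-delete idea Lance describes in the quoted message can be sketched with plain java.nio as below. This is only an illustration under assumptions, not his Directory implementation: the class and method names are hypothetical, and it relies on POSIX semantics, where an unlinked file stays readable through an existing mapping (on Windows the delete would normally fail).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class UnlinkedMmapSketch {

    // Writes the text to a temp file, maps it, deletes the file name, and
    // then reads the content back through the still-valid mapping.
    public static String readAfterDelete(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        try {
            Path file = Files.createTempFile("unlinked-sketch", ".bin");
            Files.write(file, data);
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                MappedByteBuffer buf =
                        ch.map(FileChannel.MapMode.READ_ONLY, 0, data.length);

                // Assumption: POSIX unlink semantics. The directory entry
                // goes away, but the data stays reachable via the mapping.
                Files.delete(file);
                if (Files.exists(file)) {
                    throw new IllegalStateException("file is still visible: " + file);
                }

                byte[] back = new byte[data.length];
                buf.get(back);
                return new String(back, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterDelete("no longer findable by name"));
    }
}
```

Note that this only removes the name from the file system. On a regular disk the bytes may still reach the underlying device, so the sketch by itself does not satisfy a "nothing on disk in the clear" requirement.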
