lucene-java-user mailing list archives

From baris.ka...@oracle.com
Subject Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)
Date Mon, 14 Dec 2020 18:10:07 GMT
Thanks Jigar, these notes, observations, and experiments are very valuable
to know about.

I also plan to write a blog post on this topic to help Lucene advance.

Best regards


On 12/14/20 12:44 PM, Jigar Shah wrote:
> I used a Linux feature (ramfs, which mounts RAM as a filesystem) to
> guarantee that the index is always in RAM (no accidental paging cost
> either ;).
>
> https://www.jamescoyle.net/how-to/943-create-a-ram-disk-in-linux
>
> WARNING: Only use this if the index is read-only, fits in RAM, and has a
> backup copy on persistent disk somewhere. You may use any directory
> implementation in Lucene, e.g.
> https://lucene.apache.org/core/7_3_1/core/org/apache/lucene/store/SimpleFSDirectory.html
>
> The search was amazingly quick because the full index was on a RAM-mounted
> directory.
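
A minimal sketch of the startup step this setup implies: copying the backup index from persistent disk into the RAM-backed mount before opening it. The class name, the method name, and the default paths in `main` are hypothetical; only JDK file APIs are used, and the sketch assumes the index is a flat directory of files. The resulting directory could then be opened with any Lucene directory implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

class RamDiskIndexCopy {

    // Copies every regular file of a flat index directory into the target
    // directory (for example a ramfs mount) and returns the bytes copied.
    static long copyIndex(Path source, Path target) throws IOException {
        Files.createDirectories(target);
        long total = 0;
        try (Stream<Path> files = Files.list(source)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip anything that is not a plain file
                }
                Path dest = target.resolve(file.getFileName().toString());
                Files.copy(file, dest, StandardCopyOption.REPLACE_EXISTING);
                total += Files.size(dest);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical locations: a backup on persistent disk and a RAM mount.
        Path backup = Paths.get(args.length > 0 ? args[0] : "/data/index-backup");
        Path ramDisk = Paths.get(args.length > 1 ? args[1] : "/mnt/ramdisk/index");
        System.out.println("Copied " + copyIndex(backup, ramDisk) + " bytes");
    }
}
```

Because the contents of a RAM mount do not survive a reboot, a copy like this would have to run again before the index is reopened.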
>
> On Mon, Dec 14, 2020 at 11:27 AM <baris.kazar@oracle.com> wrote:
>
>> Thanks Mike, appreciate the reply and the suggestions very much.
>>
>> And the article you linked on concurrent search is amazing.
>>
>> Together, an in-memory index and concurrent search (especially in
>> read-only mode) will speed up Lucene queries very much.
>>
>> Happy Holidays
>>
>> Best regards
>>
>>
>> On 12/14/20 10:12 AM, Michael McCandless wrote:
>>> Hello,
>>>
>>> Yes, that is exactly what MMapDirectory.setPreload is trying to do, but with
>>> no promises (it is best effort).  I think it asks the OS to touch all pages in
>>> the mapped region so they are cached in RAM, if you have enough RAM.
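
To illustrate the mechanism described above, here is a minimal JDK-only sketch. It is not Lucene's code: it maps a file read-only and calls `MappedByteBuffer.load()`, the standard Java call that asks the OS, on a best-effort basis, to bring the mapped pages into physical memory. The class and method names are hypothetical, and it is an assumption that `setPreload(true)` behaves similarly internally. A single Java mapping is limited to 2 GB, so a large file would need several mappings.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class PreloadSketch {

    // Maps the whole file read-only and asks the OS to load its pages.
    static MappedByteBuffer mapAndPreload(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("file too large for one mapping");
            }
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            buffer.load(); // best effort: the OS may still evict pages later
            return buffer; // the mapping stays valid after the channel is closed
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".bin");
        Files.write(file, new byte[1 << 20]);
        MappedByteBuffer buffer = mapAndPreload(file);
        // isLoaded() is only a hint, so no particular value is guaranteed.
        System.out.println("mapped " + buffer.capacity()
                + " bytes, resident hint: " + buffer.isLoaded());
    }
}
```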
>>>
>>> Make your JVM heap as low as possible to let the OS have more RAM to use to
>>> load your index.
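
A rough way to sanity-check this advice is to compare the on-disk index size with the RAM left over once the JVM heap is subtracted. The sketch below is only a crude estimate under stated assumptions: the class and method names are hypothetical, the physical RAM figure is supplied by the caller (32 GB is an arbitrary default), and memory used by other processes or by the JVM outside the heap is ignored.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class PageCacheBudget {

    // Sums the sizes of the regular files in a flat index directory.
    static long indexSizeBytes(Path indexDir) throws IOException {
        long total = 0;
        try (Stream<Path> files = Files.list(indexDir)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(file)) {
                    total += Files.size(file);
                }
            }
        }
        return total;
    }

    // True if the index fits in the RAM that remains after the heap is taken out.
    static boolean fitsInPageCache(long indexBytes, long physicalRamBytes, long maxHeapBytes) {
        return indexBytes <= physicalRamBytes - maxHeapBytes;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        // Assumed machine size; pass the real value in bytes as the second argument.
        long ramBytes = args.length > 1 ? Long.parseLong(args[1]) : 32L << 30;
        long heapBytes = Runtime.getRuntime().maxMemory(); // reflects -Xmx
        long indexBytes = indexSizeBytes(indexDir);
        System.out.println("index=" + indexBytes + " heap=" + heapBytes
                + " fits=" + fitsInPageCache(indexBytes, ramBytes, heapBytes));
    }
}
```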
>>>
>>> Mike McCandless
>>>
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Sun, Dec 13, 2020 at 4:18 PM <baris.kazar@oracle.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> It would be nice to create a Lucene index in files and then effectively
>>>> load it into memory once (since I use it in read-only mode). I am looking
>>>> into whether this is doable in Lucene.
>>>>
>>>> I wish there were an option to load the whole Lucene index into memory.
>>>>
>>>> Both of the URLs below link to the blog post from which I quote a very
>>>> nice section:
>>>>
>>>>
>>>> https://lucene.apache.org/core/8_5_0/core/org/apache/lucene/store/MMapDirectory.html
>>>> https://lucene.apache.org/core/8_5_2/core/org/apache/lucene/store/MMapDirectory.html
>>>>
>>>> The following blog post mentions such an option to run in memory
>>>> (see the quoted passage below):
>>>>
>>>> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html?m=1
>>>>
>>>> MMapDirectory will not load the whole index into physical memory. Why
>>>> should it do this? We just ask the operating system to map the file into
>>>> address space for easy access, by no means we are requesting more. Java and
>>>> the O/S optionally provide the option to try loading the whole file into
>>>> RAM (if enough is available), but Lucene does not use that option (we may
>>>> add this possibility in a later version).
>>>>
>>>> My question is: is there such an option?
>>>> Is the method setPreload for this purpose,
>>>> i.e., to load the whole Lucene index into memory?
>>>>
>>>> I would like to use MMapDirectory and set my
>>>> JVM heap to 16G or a bit less (since my index is
>>>> around this much).
>>>>
>>>> The Lucene 8.5.2 (8.5.0 as well) javadocs say:
>>>> public void setPreload(boolean preload)
>>>> Set to true to ask mapped pages to be loaded into physical memory on init.
>>>> The behavior is best-effort and operating system dependent.
>>>>
>>>> For example, Lucene 4.0.0 does not have the setPreload method:
>>>>
>>>> https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/store/MMapDirectory.html
>>>>
>>>> Happy Holidays
>>>> Best regards
>>>>
>>>>
>>>> P.S. I know there is also the ByteBuffersDirectory class for in-memory
>>>> Lucene indexes, but this requires creating the Lucene index on the fly.
>>>>
>>>> This is great only for Lucene indexes that can be created
>>>> quickly on the fly.
>>>>
>>>> Ekaterina has a nice article on this ByteBuffersDirectory class:
>>>>
>>>> https://medium.com/@ekaterinamihailova/in-memory-search-and-autocomplete-with-lucene-8-5-f2df1bc71c36
>>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
