lucene-java-user mailing list archives

From baris.ka...@oracle.com
Subject Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)
Date Mon, 14 Dec 2020 23:35:59 GMT
Thanks Robert.

I think these valuable comments should be added to the javadocs for future 
reference.

I think I now have enough information to make a decision:

I will use MMapDirectory without setPreload, and I hope my index will fit 
into RAM.
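
To illustrate what "without setPreload" means at the OS level, here is a minimal sketch in plain JDK code (no Lucene involved). It assumes that a memory-mapped file is paged in lazily on first access, which is the behaviour the sample-query advice quoted below relies on. The class LazyMapSketch and its methods mapLazily and readAt are hypothetical names made up for this illustration; readAt merely stands in for a query that reads a few places in a file.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class: plain JDK code, not Lucene's MMapDirectory.
final class LazyMapSketch {

    /**
     * Maps a file (smaller than 2 GB) read-only WITHOUT calling load().
     * No page is read from disk here; pages are faulted in on first access.
     */
    static MappedByteBuffer mapLazily(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /**
     * Stand-in for a "sample query": reads one byte at each given offset,
     * so only the pages containing those offsets are touched.
     */
    static long readAt(MappedByteBuffer buf, int... offsets) {
        long sum = 0;
        for (int offset : offsets) {
            sum += buf.get(offset); // absolute read, does not move the position
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lazy-map-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        byte[] data = new byte[1 << 20]; // 1 MiB test file
        data[0] = 1;
        data[500_000] = 2;
        Files.write(tmp, data);

        MappedByteBuffer buf = mapLazily(tmp);
        System.out.println("sum of sampled bytes: " + readAt(buf, 0, 500_000));
    }
}
```

Only the two pages that hold the sampled offsets need to be resident after this runs; the rest of the file stays on disk until something reads it.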

I plan to write a blog post about my findings.

Best regards


On 12/14/20 5:52 PM, Robert Muir wrote:
> On Mon, Dec 14, 2020 at 1:59 PM Uwe Schindler <uwe@thetaphi.de> wrote:
>> Hi,
>>
>> as the author of the original blog post, here are my comments:
>>
>> Yes, MMapDirectory.setPreload() is the feature mentioned in my blog post that
>> loads everything into memory - but that does not guarantee anything!
>> Still, I would not recommend using that function, because all it does is
>> touch every page of the file, so that the Linux kernel puts it into the OS cache
>> - nothing more; IMHO this is very inefficient, as it slows down opening the index
>> with a stupid for-each-page-touch loop. It does this with EVERY page, whether it is
>> later used or not! So it may take some time until it is done. Later on,
>> Lucene still needs to open the index files, initialize its own data
>> structures,...
>>
>> In general it is much better to open the index with MMapDirectory and execute
>> some "sample" queries. This does exactly the same as the preload
>> function, but it is more "selective": parts of the index which are not used
>> won't be touched, and on top of that, it will also load ALL the required index
>> structures onto the heap.
>>
> The main purpose of this thing is a fast warming option for random
> access files such as "i want to warm all my norms in RAM" or "i want
> to warm all my docvalues in RAM"... really it should only be used with
> the FileSwitchDirectory for a targeted purpose such as that: it is
> definitely a waste to set it for your entire index. It just exposes
> https://docs.oracle.com/javase/7/docs/api/java/nio/MappedByteBuffer.html#load(),
> which first calls madvise(MADV_WILLNEED) and then touches every page.
> If you want to "warm" an ENTIRE very specific file for a reason like
> this (e.g. per-doc scoring value, ensuring it will be hot for all
> docs), it is hard to be more efficient than that.
>
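
For reference, the MappedByteBuffer.load() call mentioned in the quoted reply can be tried on a single file with plain JDK code. This is only a sketch of the underlying mechanism, not Lucene's implementation; the class PreloadSketch and the method mapAndPreload are hypothetical names made up for this illustration. Note that load() is best effort and isLoaded() is only a hint, so neither guarantees that the pages stay resident.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class: plain JDK code, not part of Lucene.
final class PreloadSketch {

    /**
     * Maps one whole file (smaller than 2 GB) read-only and then calls load(),
     * which asks the OS to bring every page of the mapping into physical memory.
     */
    static MappedByteBuffer mapAndPreload(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.load(); // best effort: touches every page, used later or not
            return buf; // the mapping stays valid after the channel is closed
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("preload-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[1 << 20]); // 1 MiB test file

        MappedByteBuffer buf = mapAndPreload(tmp);
        // isLoaded() is a hint only; it may return false even right after load().
        System.out.println("mapped bytes: " + buf.capacity()
                + ", isLoaded hint: " + buf.isLoaded());
    }
}
```

The cost of load() grows with the size of the file, which is why the reply above suggests reserving it for a few specific files rather than the entire index.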

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org