lucene-java-user mailing list archives

From baris.ka...@oracle.com
Subject Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)
Date Mon, 14 Dec 2020 20:15:24 GMT
I see. I think I will use the first way, the MMapDirectory constructor, and
I will not use the setPreload API, to avoid slowdowns.

Yes, I was expecting a warning from Eclipse for the second usage, but
nothing came up.

Thanks for the clarifications.

Best regards


On 12/14/20 2:55 PM, Uwe Schindler wrote:
> Hi,
>
>   
>> Thanks Uwe, I am not insisting on loading everything into memory,
>> but loading into memory might speed things up, and I would like to see
>> how much speedup I get.
>>
>> But I have one more question that is still not clear to me:
>>
>> "it is much better to open index, with MMAP directory"
>>
>> Does this mean I should not use the constructor but instead use the open
>> API?
> No, that means: use MMapDirectory, it should fit your needs. If your
> operating system has enough memory outside of the heap for Lucene to keep
> all pages of the mmapped file in memory, then that is the best you can have.
>
> FSDirectory.open() is fine, as it will always use MMapDirectory on 64-bit
> platforms.
>
>> in other words: which way should be preferred?
> It does not matter. If you want to use setPreload() [beware of slowdowns
> when opening index files for the first time!!!], use the constructor of
> MMapDirectory, because the FSDirectory.open() factory method cannot
> guarantee which implementation you get.
>
> Calling a static method on a class that does not implement it is generally
> considered bad practice (Eclipse should warn you). The static
> FSDirectory.open() is a factory method and should be used (on FSDirectory,
> not on its subclass) if you don't know which implementation you want and
> need to be operating system independent. If you want MMapDirectory and its
> features specifically, use the constructor.
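
The distinction between the two routes can be sketched as follows. This is a minimal illustration rather than code from the thread: the class name DirectoryChoice and its method names are hypothetical, and it assumes the Lucene 8.5 API discussed here, where setPreload takes a boolean.

```java
import java.io.IOException;
import java.nio.file.Path;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.MMapDirectory;

// Hypothetical helper class, not part of Lucene.
public class DirectoryChoice {

    /** Factory route: Lucene picks an FSDirectory implementation for the platform. */
    static FSDirectory openPortable(Path indexPath) throws IOException {
        // Call the static factory on FSDirectory itself, not on a subclass.
        return FSDirectory.open(indexPath);
    }

    /** Constructor route: the concrete type is known, so MMapDirectory-only setters are available. */
    static MMapDirectory openMapped(Path indexPath, boolean preload) throws IOException {
        MMapDirectory dir = new MMapDirectory(indexPath);
        // Assumption: Lucene 8.5 signature setPreload(boolean); preload slows down opening files.
        dir.setPreload(preload);
        return dir;
    }
}
```

With openPortable the declared type is FSDirectory, so setPreload is not available without a cast; with openMapped it is.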
>
>> The example below is used both during indexing and searching:
>>
>>
>> /* First way: using the constructor (without setPreload): */
>>
>> // Uses FSLockFactory.getDefault() and DEFAULT_MAX_CHUNK_SIZE, which is 1GB
>> MMapDirectory dir = new MMapDirectory(Paths.get(indexDir));
>> // In-memory Lucene index enabled -> *commented out*
>> ////if (dir.getPreload() == false)
>> ////  dir.setPreload(Constants.PRELOAD_YES);
>> IndexReader reader = DirectoryReader.open(dir);
>>
>> ...
>>
>>
>> /* Second way: using open (without setPreload): */
>>
>> // open is inherited from FSDirectory
>> Directory dir = MMapDirectory.open(Paths.get(indexDir));
>> // In-memory Lucene index enabled -> *here setPreload cannot be used*
>> ////if (dir.getPreload() == false)
>> ////  dir.setPreload(Constants.PRELOAD_YES);
>> IndexReader reader = DirectoryReader.open(dir);
>> IndexSearcher is = new IndexSearcher(reader);
>>
>> ...
>>
>>
>> Best regards
>>
>>
>> On 12/14/20 1:51 PM, Uwe Schindler wrote:
>>> Hi,
>>>
>>> as the writer of the original blog post, here are my comments:
>>>
>>> Yes, MMapDirectory.setPreload() is the feature mentioned in my blog post
>>> for loading everything into memory - but it does not guarantee anything!
>>> Still, I would not recommend using that function, because all it does is
>>> touch every page of the file so that the Linux kernel puts it into the OS
>>> cache - nothing more. IMHO this is very ineffective, as it slows down
>>> opening the index for a stupid for-each-page-touch loop. It does this with
>>> EVERY page, whether it is used later or not! So it may take some time until
>>> it is done. Later on, Lucene still needs to open the index files,
>>> initialize its own data structures,...
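
To make the "touch every page" behavior concrete, here is a small sketch that uses only the JDK. It is not Lucene's code: the class PreloadSketch and its map method are hypothetical, and MappedByteBuffer.load() stands in for the preload step as a best-effort request to bring every page of the mapping into physical memory.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; uses only java.nio, no Lucene.
public class PreloadSketch {

    /** Maps a file read-only; with preload, asks the OS to make every page resident. */
    static MappedByteBuffer map(Path file, boolean preload) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            if (preload) {
                // Best effort only: every page is touched, used later or not,
                // so the cost grows with the file size.
                buf.load();
            }
            // The mapping stays valid after the channel is closed.
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("preload-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[1 << 20]); // 1 MiB of zeros as stand-in data

        long start = System.nanoTime();
        MappedByteBuffer buf = map(tmp, true);
        long micros = (System.nanoTime() - start) / 1_000;

        // isLoaded() is only a hint; the kernel may evict pages at any time.
        System.out.println("mapped " + buf.capacity() + " bytes in " + micros
                + " us, isLoaded=" + buf.isLoaded());
    }
}
```

Even when isLoaded() reports true right after the call, nothing pins the pages, which matches the point above that preloading guarantees nothing.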
>>>
>>> In general it is much better to open the index with MMapDirectory and
>>> execute some "sample" queries. This does exactly the same as the preload
>>> function, but it is more "selective": parts of the index that are not used
>>> won't be touched, and on top of that, it also loads ALL the required index
>>> structures onto the heap.
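
A warm-up of this kind might look like the sketch below. The class WarmUp, the field name "body" and the sample terms are hypothetical placeholders; real warm-up queries should resemble production traffic. It assumes lucene-core 8.x on the classpath.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.MMapDirectory;

// Hypothetical helper class, not part of Lucene.
public class WarmUp {

    /** Runs each sample query once; returns how many top hits were collected in total. */
    static long warm(IndexSearcher searcher, List<Query> samples) throws IOException {
        long collected = 0;
        for (Query q : samples) {
            // Searching touches only the index pages these queries actually need.
            collected += searcher.search(q, 10).scoreDocs.length;
        }
        return collected;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to an existing index directory
        try (MMapDirectory dir = new MMapDirectory(Paths.get(args[0]));
             DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Assumption: a field named "body" exists; replace with real fields and terms.
            List<Query> samples = Arrays.asList(
                    new TermQuery(new Term("body", "lucene")),
                    new TermQuery(new Term("body", "index")));
            System.out.println("warm-up collected " + warm(searcher, samples) + " hits");
        }
    }
}
```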
>>>
>>> As always, and as mentioned in my blog post: there is nothing that can
>>> ensure your index stays in memory. Please trust the kernel to do the right
>>> thing. Why do you care at all?
>>>
>>> If you are curious and want to have everything in memory all the time:
>>> - use tmpfs as your filesystem (of course you will lose data when the OS
>>> shuts down)
>>> - disable swap and/or disable swappiness
>>> - use only as much heap as needed, and leave all remaining free memory for
>>> your index outside the heap.
>>>
>>> False feelings of "everything in RAM" come from misconceptions like:
>>> - use RAMDirectory (deprecated): this may be a disaster, as described in
>>> the blog post
>>> - use ByteBuffersDirectory: a little bit better, but it gains nothing, as
>>> the operating system kernel may still page out your index pages. They still
>>> live on or off heap and are part of the usual paging. They are just no
>>> longer backed by a file.
>>>
>>> Lucene does most of the stuff outside heap, live with it!
>>>
>>> Uwe
>>>
>>> -----
>>> Uwe Schindler
>>> Achterdiek 19, D-28357 Bremen
>>>
>>> https://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>>
>>>> -----Original Message-----
>>>> From: baris.kazar@oracle.com <baris.kazar@oracle.com>
>>>> Sent: Sunday, December 13, 2020 10:18 PM
>>>> To: java-user@lucene.apache.org
>>>> Cc: BARIS KAZAR <baris.kazar@oracle.com>
>>>> Subject: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)
>>>> Hi,
>>>>
>>>> It would be nice to create a Lucene index in files and then effectively
>>>> load it into memory once (since I use it in read-only mode). I am looking
>>>> into whether this is doable in Lucene.
>>>>
>>>> I wish there were an option to load the whole Lucene index into memory.
>>>>
>>>> Both of the URLs below link to the blog post from which I quote a very
>>>> nice section:
>>>> https://lucene.apache.org/core/8_5_0/core/org/apache/lucene/store/MMapDirectory.html
>>>>
>>>> https://lucene.apache.org/core/8_5_2/core/org/apache/lucene/store/MMapDirectory.html
>>>>
>>>> The following blog post mentions such an option to run in memory (see the
>>>> underlined sentence below):
>>>>
>>>> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html?m=1
>>>>
>>>> MMapDirectory will not load the whole index into physical memory. Why
>>>> should it do this? We just ask the operating system to map the file into
>>>> address space for easy access, by no means we are requesting more. Java
>>>> and the O/S optionally provide the option to try loading the whole file
>>>> into RAM (if enough is available), but Lucene does not use that option (we
>>>> may add this possibility in a later version).
>>>>
>>>> My question is: is there such an option? Is the method setPreload meant
>>>> for this purpose, to load the whole Lucene index into memory?
>>>>
>>>> I would like to use MMapDirectory and set my
>>>> JVM heap to 16G or a bit less (since my index is
>>>> around this much).
>>>>
>>>> The Lucene 8.5.2 (and 8.5.0) javadocs say:
>>>> public void setPreload(boolean preload)
>>>> Set to true to ask mapped pages to be loaded into physical memory on init.
>>>> The behavior is best-effort and operating system dependent.
>>>>
>>>> By contrast, Lucene 4.0.0 does not have the setPreload method:
>>>>
>>>>
>>>> https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/store/MMapDirectory.html
>>>>
>>>> Happy Holidays
>>>> Best regards
>>>>
>>>>
>>>> P.S. I know there is also the ByteBuffersDirectory class for in-memory
>>>> Lucene, but this requires creating the Lucene index on the fly.
>>>>
>>>> This is only good for Lucene indexes that can be created quickly on the
>>>> fly.
>>>>
>>>> Ekaterina has a nice article on the ByteBuffersDirectory class:
>>>>
>>>> https://medium.com/@ekaterinamihailova/in-memory-search-and-autocomplete-with-lucene-8-5-f2df1bc71c36
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>


