lucene-java-user mailing list archives

From Jigar Shah <jigaronl...@gmail.com>
Subject Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)
Date Mon, 14 Dec 2020 19:17:00 GMT
Thanks, Uwe

Yes, as recommended, tmpfs/ramfs worked like a charm in our use case with a
read-only index, giving us very high throughput and consistent response
times on queries.
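
In case it is useful, here is a minimal sketch of the kind of staging step
this implies: copy the finished, read-only index files into a tmpfs-backed
directory and open the index from there. The class name TmpfsIndexCopy, the
method copyIndex and any target path such as /dev/shm/index are illustrative
assumptions, not taken from a real deployment. It uses only java.nio and
relies on a Lucene index directory being flat (no subdirectories).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class TmpfsIndexCopy {

    /**
     * Copies every regular file from a flat index directory into the target
     * directory (for example a tmpfs mount) and returns the bytes copied.
     */
    static long copyIndex(Path source, Path target) throws IOException {
        Files.createDirectories(target);
        long bytes = 0;
        try (Stream<Path> files = Files.list(source)) {
            for (Path f : (Iterable<Path>) files::iterator) {
                if (!Files.isRegularFile(f)) {
                    continue; // skip anything that is not a plain file
                }
                Path dest = target.resolve(f.getFileName().toString());
                Files.copy(f, dest, StandardCopyOption.REPLACE_EXISTING);
                bytes += Files.size(f);
            }
        }
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java TmpfsIndexCopy /data/index /dev/shm/index
        Path source = Paths.get(args[0]);
        Path target = Paths.get(args[1]);
        System.out.println("copied " + copyIndex(source, target) + " bytes to " + target);
    }
}
```

The on-disk copy stays the durable one; the tmpfs copy is lost on reboot and
has to be staged again before the service starts.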

We had to build some redundancy around that service to make it highly
available, so that we can do a rolling update of the read-only index and
reduce the risk of downtime.
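
On the setPreload() point in the quoted reply below, here is a minimal sketch
in plain java.nio of what "touching every page" amounts to.
MappedByteBuffer.load() is the standard JDK call for this and is best-effort.
As far as I understand it, preloading a mapping comes down to roughly this,
but treat that as an assumption rather than a description of Lucene
internals. PreloadSketch, map and touchEveryPage are made-up names, and 4096
is an assumed page size.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PreloadSketch {

    /** Maps a file read-only. This only reserves address space; no data is read yet. */
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            // A single mapping is limited to 2 GiB, so this sketch assumes a small file.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Reads one byte from each page, whether it is needed later or not. Returns the page count. */
    static long touchEveryPage(MappedByteBuffer buf, int pageSize) {
        long pages = 0;
        byte sink = 0;
        for (int pos = 0; pos < buf.limit(); pos += pageSize) {
            sink ^= buf.get(pos); // the read faults the page into the OS cache
            pages++;
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[1 << 20]); // 1 MiB stand-in for an index file

        MappedByteBuffer buf = map(file);
        buf.load(); // best-effort request to bring all pages into physical memory
        System.out.println("isLoaded (a hint, not a guarantee): " + buf.isLoaded());
        System.out.println("pages touched: " + touchEveryPage(buf, 4096));
    }
}
```

Either way the pages only end up in the OS cache, and the kernel remains free
to evict them later, which is the point made in the reply.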



On Mon, Dec 14, 2020 at 1:51 PM Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi,
>
> as the writer of the original blog post, here are my comments:
>
> Yes, MMapDirectory.setPreload() is the feature mentioned in my blog post
> to load everything into memory - but that does not guarantee anything!
> Still, I would not recommend using that function, because all it does is
> touch every page of the file, so the Linux kernel puts it into the OS
> cache - nothing more; IMHO it is very inefficient, as it slows down opening
> the index with a stupid for-each-page-touch loop. It will do this with
> EVERY page, whether it is later used or not! So it may take some time until
> it is done. Later on, Lucene still needs to open the index files,
> initialize its own data structures,...
>
> In general it is much better to open the index with MMapDirectory and
> execute some "sample" queries. This will do exactly the same as the preload
> function, but it is more "selective". Parts of the index which are not used
> won't be touched, and on top, it will also load ALL the required index
> structures to heap.
>
> As always and as mentioned in my blog post: there's nothing that can ensure
> your index will stay in memory. Please trust the kernel to do the right
> thing. Why do you care at all?
>
> If you are curious and want to have everything in memory all the time:
> - use tmpfs as your filesystem (of course you will lose data when the OS
>   shuts down)
> - disable swap and/or disable swappiness
> - use only as much heap as needed, keep all remaining free memory for your
>   index outside heap.
>
> Fake feelings of "everything in RAM" are misconceptions like:
> - use RAMDirectory (deprecated): this may be a disaster, as described in
>   the blog post
> - use ByteBuffersDirectory: a little bit better, but this brings nothing,
>   as the operating system kernel may still page out your index pages. They
>   still live in/off heap and are part of usual paging. They are just no
>   longer backed by a file.
>
> Lucene does most of the stuff outside heap, live with it!
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: baris.kazar@oracle.com <baris.kazar@oracle.com>
> > Sent: Sunday, December 13, 2020 10:18 PM
> > To: java-user@lucene.apache.org
> > Cc: BARIS KAZAR <baris.kazar@oracle.com>
> > Subject: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)
> >
> > Hi,
> >
> > it would be nice to create a Lucene index in files and then effectively
> > load it into memory once (since I use it in read-only mode). I am looking
> > into whether this is doable in Lucene.
> >
> > I wish there were an option to load the whole Lucene index into memory.
> >
> > Both of the URLs below link to the blog post from which I quote a very
> > nice section:
> >
> > https://lucene.apache.org/core/8_5_0/core/org/apache/lucene/store/MMapDirectory.html
> > https://lucene.apache.org/core/8_5_2/core/org/apache/lucene/store/MMapDirectory.html
> >
> > The following blog post mentions such an option to run in memory
> > (see the underlined sentence below):
> >
> > https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html?m=1
> >
> > MMapDirectory will not load the whole index into physical memory. Why
> > should it do this? We just ask the operating system to map the file into
> > address space for easy access, by no means we are requesting more. Java
> > and the O/S optionally provide the option to try loading the whole file
> > into RAM (if enough is available), but Lucene does not use that option
> > (we may add this possibility in a later version).
> >
> > My question is: is there such an option? Is the setPreload method meant
> > for this purpose, i.e. to load the whole Lucene index into memory?
> >
> > I would like to use MMapDirectory and set my JVM heap to 16G or a bit
> > less (since my index is around this size).
> >
> > The Lucene 8.5.2 (and 8.5.0) javadocs say:
> >
> > public void setPreload(boolean preload)
> > Set to true to ask mapped pages to be loaded into physical memory on init.
> > The behavior is best-effort and operating system dependent.
> >
> > Lucene 4.0.0, for example, does not have the setPreload method:
> >
> > https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/store/MMapDirectory.html
> >
> > Happy Holidays
> > Best regards
> >
> >
> > PS: I know there is also the ByteBuffersDirectory class for in-memory
> > Lucene indexes, but this requires creating the Lucene index on the fly.
> >
> > That is only practical for Lucene indexes that can be created quickly on
> > the fly.
> >
> > Ekaterina has a nice article on the ByteBuffersDirectory class:
> >
> > https://medium.com/@ekaterinamihailova/in-memory-search-and-autocomplete-with-lucene-8-5-f2df1bc71c36