lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: How to regulate native memory?
Date Thu, 31 Aug 2017 06:39:54 GMT
Hi,

As a suggestion from my side: as a first step, disable the bootstrap.memory_lock feature:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/important-settings.html#bootstrap.memory_lock

It looks like you are using too much heap space, and some plugin in your ES installation is
also using the maximum direct memory size, so I have the feeling something is using a lot of
direct memory and you want to limit that. MMap is NOT direct memory! MMap is also not taken
into account by the OOM killer, because it's not owned by the process.

To me it looks like the operating system kills your process because it sits locked on a huge
amount of memory. So disable the locking (it is IMHO not really useful and too risky). If
you also have no swap disk, you may try to add some swap space and set the system's
swappiness to "10" or even lower. In production environments it is better to have a little
bit of swap as a last resort, but you should tell the kernel with the vm.swappiness sysctl
that it should only use it as a last resort.
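
For illustration, a minimal sketch of the above (run as root; the swap-file size and
location are arbitrary choices, adjust for your environment):

    # in elasticsearch.yml (path depends on your install), set:
    #   bootstrap.memory_lock: false

    # add a small swap file as a last resort (4G is only an example)
    dd if=/dev/zero of=/swapfile bs=1M count=4096
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile

    # tell the kernel to use swap only as a last resort
    sysctl -w vm.swappiness=10
    echo 'vm.swappiness=10' >> /etc/sysctl.conf   # persist across reboots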

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Robert Muir [mailto:rcmuir@gmail.com]
> Sent: Thursday, August 31, 2017 6:07 AM
> To: java-user <java-user@lucene.apache.org>
> Subject: Re: How to regulate native memory?
> 
> From the lucene side, it only uses file mappings for reads and doesn't
> allocate any anonymous memory.
> The way lucene uses cache for reads won't impact your OOM
> (http://www.linuxatemyram.com/play.html)
> 
> At the end of the day you are running out of memory on the system
> either way, and your process might just look like a large target for
> the oom-killer, but that doesn't mean it's necessarily your problem
> at all.
> 
> I advise sticking with basic operating system tools like /proc and
> free -m: reproduce the OOM-kill situation, just like in the example
> link above, and try to track down the real problem.
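
As a rough illustration of that approach, a monitoring loop along these lines could be used
(the process-selection pattern and the one-second interval are assumptions, not from the thread):

    # find the Elasticsearch process (main class name assumed for ES 5.x)
    ES_PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch | head -n1)

    # sample system-wide and per-process memory once per second
    while sleep 1; do
        date
        free -m
        grep -E 'VmRSS|RssAnon|RssFile|RssShmem' /proc/$ES_PID/status
    done | tee memory.log

    # after the process is killed, check the kernel's oom-killer report
    dmesg -T | grep -i -A15 'out of memory'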
> 
> 
> On Wed, Aug 30, 2017 at 11:43 PM, Erik Stephens
> <mrerikstephens@gmail.com> wrote:
> > Yeah, apologies for that long issue - the netty comments aren't related.  My
> two comments near the end might be more interesting here:
> >
> >     https://github.com/elastic/elasticsearch/issues/26269#issuecomment-
> 326060213
> >
> > To try to summarize, I looked at `grep indices /proc/$pid/smaps` to
> quantify what I think is mostly lucene usage.  Is that an accurate way to
> quantify that?  It shows 51G with `-XX:MaxDirectMemorySize=15G`.  The
> heap is 30G and the resident memory is reported as 82.5G.  That makes a bit
> of sense: 30G + 51G + miscellaneous.
> >
> > `top` reports roughly 51G as shared, which is suspiciously close to what I'm
> seeing in /proc/$pid/smaps. Is it correct to think that if a process requests
> memory and there is not enough "free", then the kernel will purge from its
> cache in order to allocate that requested memory?  I'm struggling to see how
> the kernel thinks there isn't enough free memory when so much is in its
> cache, but that concern is secondary at this point.  My primary concern is
> trying to regulate the overall footprint (shared with file-system cache or not)
> so that the OOM killer is not even part of the conversation in the first place.
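
For what it's worth, grep alone only lists the mappings; to put a number on them one can sum
the Rss fields of the matching smaps entries. A sketch (assuming $pid holds the Elasticsearch
PID and the data path contains a directory named "indices"):

    # sum the resident size (kB) of mappings whose path contains "indices"
    awk '/indices/ {take=1}
         /^Rss:/   {if (take) {sum += $2; take = 0}}
         END       {print sum+0 " kB resident in mapped index files"}' /proc/$pid/smaps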
> >
> > # grep Vm /proc/$pid/status
> > VmPeak: 982739416 kB
> > VmSize: 975784980 kB
> > VmLck:         0 kB
> > VmPin:         0 kB
> > VmHWM:  86555044 kB
> > VmRSS:  86526616 kB
> > VmData: 42644832 kB
> > VmStk:       136 kB
> > VmExe:         4 kB
> > VmLib:     18028 kB
> > VmPTE:    275292 kB
> > VmPMD:      3720 kB
> > VmSwap:        0 kB
> >
> > # free -g
> >               total        used        free      shared  buff/cache   available
> > Mem:            125          54           1           1          69          69
> > Swap:             0           0           0
> >
> > Thanks for the reply!  Apologies if not apropos to this forum - just working
> my way down the rabbit hole :)
> >
> > --
> > Erik
> >
> >
> >> On Aug 30, 2017, at 8:04 PM, Robert Muir <rcmuir@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> From the thread linked there, it's not clear to me that the problem relates
> >> to lucene (vs. being e.g. a bug in netty, or too many threads, or
> >> potentially many other problems).
> >>
> >> Can you first try to break down your problematic "RSS" as reported
> >> by the operating system? Maybe this helps determine whether your issue is
> >> with an anonymous mapping (ByteBuffer.allocateDirect) or a file mapping
> >> (FileChannel.map).
> >>
> >> With recent kernels you can break down RSS with /proc/<pid>/status
> >> (RssAnon vs RssFile vs RssShmem):
> >>
> >>    http://man7.org/linux/man-pages/man5/proc.5.html
> >>
> >> If your kernel is old you may have to go through more trouble (summing
> >> up stuff from smaps or whatever).
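
For reference, a small sketch of both variants ($pid is a placeholder for the Elasticsearch
process ID; the per-type Rss fields appeared in kernel 4.5):

    # recent kernels (4.5+) report the split directly
    grep -E 'RssAnon|RssFile|RssShmem' /proc/$pid/status

    # older kernels: approximate the anonymous portion by summing smaps
    awk '/^Anonymous:/ {sum += $2} END {print sum+0 " kB anonymous"}' /proc/$pid/smaps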
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

