lucene-java-user mailing list archives

From Erik Stephens <mreriksteph...@gmail.com>
Subject Re: How to regulate native memory?
Date Thu, 31 Aug 2017 16:55:54 GMT
Thanks, Robert.  I found this bit from that link enlightening:

"Some parts of the cache can't be dropped, not even to accomodate new
applications. This includes mmap'd pages that have been mlocked by some
application, dirty pages that have not yet been written to storage, and
data stored in tmpfs (including /dev/shm, used for shared memory). The
mmap'd, mlocked pages are stuck in the page cache. Dirty pages will for the
most part swiftly be written out. Data in tmpfs will be swapped out if
possible."

That could have explained why processes are getting OOM killed when there is
so much available from the fs cache, but our elasticsearch is configured to
not lock memory.  Nothing in /proc/$pid/smaps shows up as locked either.
Will explore other avenues.  Thanks again!
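
For reference, one way to double-check for locked pages (a minimal sketch,
assuming a Linux /proc layout and that $pid holds the elasticsearch process
id):

    # total of the per-mapping Locked fields in smaps, in kB
    awk '/^Locked:/ {sum += $2} END {print sum, "kB locked"}' /proc/$pid/smaps

    # process-wide view: VmLck should read 0 kB if nothing is mlocked
    grep VmLck /proc/$pid/status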

--
Erik

On Wed, Aug 30, 2017 at 9:06 PM, Robert Muir <rcmuir@gmail.com> wrote:

> From the lucene side, it only uses file mappings for reads and doesn't
> allocate any anonymous memory.
> The way lucene uses cache for reads won't impact your OOM
> (http://www.linuxatemyram.com/play.html)
>
> At the end of the day you are running out of memory on the system
> either way, and your process might just look like a large target for
> the oom-killer based on its size, but that doesn't mean it's
> necessarily your problem at all.
>
> I advise sticking with basic operating system tools like /proc and
> free -m: reproduce the OOM kill situation, just like in the example
> link above, and try to track down the real problem.
>
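> One rough way to capture that while reproducing (a sketch only; the
> 5-second interval and the /tmp/mem.log path are arbitrary, and $pid is
> assumed to hold the elasticsearch process id):
>
>     # poll system-wide and per-process memory until the OOM kill happens,
>     # then inspect the tail of the log
>     while sleep 5; do
>         date
>         free -m
>         grep VmRSS /proc/$pid/status
>     done >> /tmp/mem.log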
>
> On Wed, Aug 30, 2017 at 11:43 PM, Erik Stephens
> <mrerikstephens@gmail.com> wrote:
> > Yeah, apologies for that long issue - the netty comments aren't
> > related.  My two comments near the end might be more interesting here:
> >
> >     https://github.com/elastic/elasticsearch/issues/26269#issuecomment-326060213
> >
> > To try to summarize, I looked at `grep indices /proc/$pid/smaps` to
> > quantify what I think is mostly lucene usage.  Is that an accurate way to
> > quantify it?  It shows 51G with `-XX:MaxDirectMemorySize=15G`.  The heap
> > is 30G and the resident memory is reported as 82.5G.  That makes a bit of
> > sense: 30G + 51G + miscellaneous.
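> >
> > A sketch of the kind of summing I mean, in case it helps; it assumes the
> > index files are the mappings whose paths contain "indices", and it relies
> > on smaps header lines starting with the address range:
> >
> >     # total Rss of mappings whose backing file path contains "indices", in kB
> >     awk '/^[0-9a-f]+-[0-9a-f]+ / {keep = ($0 ~ /indices/)}
> >          keep && /^Rss:/        {sum += $2}
> >          END                    {print sum, "kB"}' /proc/$pid/smaps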
> >
> > `top` reports roughly 51G as shared, which is suspiciously close to what
> > I'm seeing in /proc/$pid/smaps.  Is it correct to think that if a process
> > requests memory and there is not enough "free", then the kernel will purge
> > from its cache in order to satisfy that allocation?  I'm struggling to see
> > how the kernel thinks there isn't enough free memory when so much is in
> > its cache, but that concern is secondary at this point.  My primary
> > concern is trying to regulate the overall footprint (shared with the file
> > system cache or not) so that the OOM killer is not even part of the
> > conversation in the first place.
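> >
> > On the secondary question, /proc/meminfo's MemAvailable field (present on
> > newer kernels, roughly 3.14 and later) is the kernel's own estimate of how
> > much memory it could reclaim for new allocations.  A quick way to compare:
> >
> >     # what the kernel reports as free vs. what it estimates it could reclaim
> >     grep -E '^(MemFree|MemAvailable|Cached|Dirty|Mlocked):' /proc/meminfo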
> >
> > # grep Vm /proc/$pid/status
> > VmPeak: 982739416 kB
> > VmSize: 975784980 kB
> > VmLck:         0 kB
> > VmPin:         0 kB
> > VmHWM:  86555044 kB
> > VmRSS:  86526616 kB
> > VmData: 42644832 kB
> > VmStk:       136 kB
> > VmExe:         4 kB
> > VmLib:     18028 kB
> > VmPTE:    275292 kB
> > VmPMD:      3720 kB
> > VmSwap:        0 kB
> >
> > # free -g
> >               total        used        free      shared  buff/cache   available
> > Mem:            125          54           1           1          69          69
> > Swap:             0           0           0
> >
> > Thanks for the reply!  Apologies if not apropos to this forum - just
> > working my way down the rabbit hole :)
> >
> > --
> > Erik
> >
> >
> >> On Aug 30, 2017, at 8:04 PM, Robert Muir <rcmuir@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> From the thread linked there, it's not clear to me that the problem
> >> relates to lucene (vs being e.g. a bug in netty, or too many threads, or
> >> potentially many other problems).
> >>
> >> Can you first try to get a breakdown of your problematic "RSS"
> >> from the operating system? Maybe this helps determine whether your issue
> >> is with an anonymous mapping (ByteBuffer.allocateDirect) or a file
> >> mapping (FileChannel.map).
> >>
> >> With recent kernels you can break down RSS with /proc/<pid>/status
> >> (RssAnon vs RssFile vs RssShmem):
> >>
> >>    http://man7.org/linux/man-pages/man5/proc.5.html
> >>
> >> If your kernel is old you may have to go through more trouble (summing
> >> up stuff from smaps or whatever).
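> >>
> >> For example, a quick sketch (RssAnon/RssFile/RssShmem only show up on
> >> newer kernels, roughly 4.5 and later; $pid is the target process):
> >>
> >>     # anonymous vs file-backed vs shmem portions of resident memory
> >>     grep -E '^(VmRSS|RssAnon|RssFile|RssShmem):' /proc/$pid/status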
> >
