hadoop-yarn-dev mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: ProcFsBasedProcessTree and clean pages in smaps
Date Fri, 05 Feb 2016 17:10:26 GMT
Interesting, I didn't know about "Locked" in smaps.  Thanks for pointing
that out.

At this point, if Varun's suggestion to check out YARN-1856 doesn't solve
the problem, then I suggest opening a JIRA to track further design
discussion.

--Chris Nauroth




On 2/5/16, 6:10 AM, "Varun Vasudev" <vvasudev@apache.org> wrote:

>Hi Jan,
>
>YARN-1856 was recently committed; it allows admins to use cgroups
>instead of the ProcFsBasedProcessTree monitor. Would that solve your
>problem? Note that it requires using the LinuxContainerExecutor.
>
>-Varun
>
>
>
>On 2/5/16, 6:45 PM, "Jan Lukavský" <jan.lukavsky@firma.seznam.cz> wrote:
>
>>Hi Chris,
>>
>>thanks for your reply. As far as I can tell, newer Linux kernels report
>>locked memory in the "Locked" field.
>>
>>If I mmap a file and mlock it, I see the following in the 'smaps' file:
>>
>>7efd20aeb000-7efd2172b000 r--p 00000000 103:04 1870          /tmp/file.bin
>>Size:              12544 kB
>>Rss:               12544 kB
>>Pss:               12544 kB
>>Shared_Clean:          0 kB
>>Shared_Dirty:          0 kB
>>Private_Clean:     12544 kB
>>Private_Dirty:         0 kB
>>Referenced:        12544 kB
>>Anonymous:             0 kB
>>AnonHugePages:         0 kB
>>Swap:                  0 kB
>>KernelPageSize:        4 kB
>>MMUPageSize:           4 kB
>>Locked:            12544 kB
>>
>>...
>># uname -a
>>Linux XXXXXX 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u3 x86_64 GNU/Linux
>>
>>If I do this on an older kernel (2.6.x), the Locked field is missing.
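>>
>>For reference, summing those "Locked" values across all mappings could
>>look roughly like this (just a standalone sketch, not the actual
>>ProcfsBasedProcessTree code):
>>
>>import java.io.BufferedReader;
>>import java.io.FileReader;
>>import java.io.IOException;
>>
>>/** Standalone sketch: sum the "Locked:" values (in kB) of /proc/<pid>/smaps. */
>>public class SmapsLockedSum {
>>
>>  public static long lockedKb(String pid) throws IOException {
>>    long totalKb = 0;
>>    String path = "/proc/" + pid + "/smaps";
>>    try (BufferedReader r = new BufferedReader(new FileReader(path))) {
>>      String line;
>>      while ((line = r.readLine()) != null) {
>>        // Lines look like "Locked:            12544 kB";
>>        // older kernels (e.g. 2.6.x) simply do not emit them.
>>        if (line.startsWith("Locked:")) {
>>          String[] fields = line.split("\\s+");
>>          totalKb += Long.parseLong(fields[1]);
>>        }
>>      }
>>    }
>>    return totalKb;
>>  }
>>
>>  public static void main(String[] args) throws IOException {
>>    System.out.println(lockedKb(args[0]) + " kB locked");
>>  }
>>}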
>>
>>I can make a patch for ProcfsBasedProcessTree that counts the "Locked"
>>pages instead of the "Private_Clean" pages (controlled by a
>>configuration option). The question is: should there be further changes
>>in the way the memory footprint is calculated? For instance, I believe
>>the kernel can also write dirty pages to disk (if they are backed by a
>>file), making them clean and therefore freeable later. Should I open a
>>JIRA to have some discussion on this topic?
>>
>>Regards,
>>  Jan
>>
>>
>>On 02/04/2016 07:20 PM, Chris Nauroth wrote:
>>> Hello Jan,
>>>
>>> I am moving this thread from user@hadoop.apache.org to
>>> yarn-dev@hadoop.apache.org, since it's less a question of general usage
>>> and more a question of internal code implementation details and
>>> possible enhancements.
>>>
>>> I think the issue is that it's not guaranteed in the general case that
>>> Private_Clean pages are easily evictable from page cache by the kernel.
>>> For example, if the pages have been pinned into RAM by calling mlock
>>> [1], then the kernel cannot evict them.  Since YARN can execute any
>>> code submitted by an application, including possibly code that calls
>>> mlock, it takes a cautious approach and assumes that these pages must
>>> be counted towards the process footprint.  Although your Spark use
>>> case won't mlock the pages (I assume), YARN doesn't have a way to
>>> identify this.
>>>
>>> Perhaps there is room for improvement here.  If there is a reliable
>>> way to count only mlock'ed pages, then perhaps that behavior could be
>>> added as another option in ProcfsBasedProcessTree.  Off the top of my
>>> head, I can't think of a reliable way to do this, and I can't research
>>> it further immediately.  Do others on the thread have ideas?
>>>
>>> --Chris Nauroth
>>>
>>> [1] http://linux.die.net/man/2/mlock
>>>
>>>
>>>
>>>
>>> On 2/4/16, 5:11 AM, "Jan Lukavský" <jan.lukavsky@firma.seznam.cz> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a question about the way LinuxResourceCalculatorPlugin
>>>> calculates the memory consumed by a process tree (it is calculated
>>>> via the ProcfsBasedProcessTree class). When we enable disk caching in
>>>> Apache Spark jobs run on a YARN cluster, the node manager starts to
>>>> kill the containers while they read the cached data, because
>>>> "Container is running beyond memory limits ...". The reason is that
>>>> even if we enable parsing of the smaps file
>>>> (yarn.nodemanager.container-monitor.procfs-tree.smaps-based-rss.enabled),
>>>> ProcfsBasedProcessTree counts mmapped read-only pages as consumed by
>>>> the process tree, while Spark uses FileChannel.map(MapMode.READ_ONLY)
>>>> to read the cached data. The JVM then consumes *a lot* more memory
>>>> than the configured heap size (and this cannot really be controlled),
>>>> but this memory is IMO not really consumed by the process; the kernel
>>>> can reclaim these pages if needed. My question is: is there any
>>>> explicit reason why "Private_Clean" pages are counted as consumed by
>>>> the process tree? I patched ProcfsBasedProcessTree not to count them,
>>>> but I don't know whether this is the "correct" solution.
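>>>>
>>>> For illustration, the access pattern in question is roughly the
>>>> following (a minimal standalone sketch of what effectively happens,
>>>> not Spark's actual code; it assumes the mapped file is smaller than
>>>> 2 GB):
>>>>
>>>> import java.io.RandomAccessFile;
>>>> import java.nio.MappedByteBuffer;
>>>> import java.nio.channels.FileChannel;
>>>>
>>>> /** Sketch: a read-only mmap that inflates RSS (and Private_Clean in
>>>>  *  smaps) without using any Java heap. */
>>>> public class ReadOnlyMmap {
>>>>   public static void main(String[] args) throws Exception {
>>>>     try (RandomAccessFile raf = new RandomAccessFile(args[0], "r");
>>>>          FileChannel ch = raf.getChannel()) {
>>>>       MappedByteBuffer buf =
>>>>           ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
>>>>       long sum = 0;
>>>>       // Touch one byte per page; each access faults the page into the
>>>>       // page cache, so the process RSS grows by roughly the file size.
>>>>       for (int pos = 0; pos < buf.limit(); pos += 4096) {
>>>>         sum += buf.get(pos);
>>>>       }
>>>>       System.out.println("checksum=" + sum);
>>>>     }
>>>>   }
>>>> }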
>>>>
>>>> Thanks for opinions,
>>>>   cheers,
>>>>   Jan
>>>>
>>>>
>>>>
>>
>
>

