trafficserver-users mailing list archives

From Leif Hedstrom <zw...@apache.org>
Subject Re: Interesting Behavior at High Memory Usage
Date Thu, 19 Jan 2017 02:11:06 GMT
Which version are you on?

-- Leif 

> On Jan 18, 2017, at 3:21 PM, Kapil Sharma (kapsharm) <kapsharm@cisco.com> wrote:
> 
> We are seeing interesting behavior at high memory usage. Apologies for the
> long and detailed email.
> 
> Our ATS caches have been running for many months and have reached a point
> where ATS has allocated a huge amount of memory to its freelist pools (we can
> confirm this by dumping the mempools). I understand that this is a known ATS
> behavior/limitation/issue: freelist mempools, once allocated, are never
> reclaimed (and there may be a patch adding reclamation support).
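> 
> For anyone who wants to reproduce the mempool dump, here is a minimal sketch,
> assuming the proxy.config.dump_mem_info_frequency knob from recent ATS
> releases (the knob name and availability may differ in your version, so check
> the docs for yours):
> 
>     # Ask traffic_server to dump allocator/freelist stats to traffic.out
>     # every 60 seconds (assumed knob; verify against your ATS version's docs):
>     echo 'CONFIG proxy.config.dump_mem_info_frequency INT 60' >> records.config
>     # Reload ATS, then watch the periodic freelist dump:
>     tail -f /var/log/trafficserver/traffic.out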
> 
> But my question is about an interesting side issue observed once the ATS
> caches reach this stage. We see our ATS caches allocating a large amount of
> memory to the slab cache, primarily the “dentry” cache. This is probably
> okay: even though the kernel allocates greedily for its internal caches (page
> cache, dentry cache, inode cache, etc.), all of that memory is reclaimable
> under memory pressure. The more interesting behavior is that in this
> low-memory state, only one of the NUMA zones is exhausting its free pages,
> and this particular ATS cache has been in that state for several days.
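> 
> A quick way to confirm what the slab is made of (standard procps and kernel
> interfaces; drop_caches is only a reclaimability test and should be used with
> care on a production cache):
> 
>     # One-shot slab listing sorted by cache size; dentry dominates for us:
>     slabtop -o -s c | head -20
>     # Kernel dentry counters (total vs. unused-but-cached):
>     cat /proc/sys/fs/dentry-state
>     # Freeing dentries and inodes should shrink nr_slab_reclaimable:
>     echo 2 > /proc/sys/vm/drop_caches
> 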
> Here is snipped output from /proc/zoneinfo:
> 
> <snip>
> Node 0, zone   Normal
>   pages free     6320539   <<< Roughly 25 GB free (note: the system has 512 GB total)
>         min      8129
>         low      10161
>         high     12193
>         scanned  0
>         spanned  66584576
>         present  65674240
>     nr_free_pages 6320539
>     nr_inactive_anon 71
>     nr_active_anon 79274
>     nr_inactive_file 1720428
>     nr_active_file 4580107
>     nr_unevictable 39168773
>     nr_mlock     39168773
>     nr_anon_pages 39239109
>     nr_mapped    13298
>     nr_file_pages 6309563
>     nr_dirty     91
>     nr_writeback 0
>     nr_slab_reclaimable 4581560  <<< ~18G
>     nr_slab_unreclaimable 16047
>  <snip>
> Node 1, zone   Normal
>   pages free     10224        <<< Note: this is below the low watermark!
>         min      8193
>         low      10241
>         high     12289
>         scanned  0
>         spanned  67108864
>         present  66191360
>     nr_free_pages 10224
>     nr_inactive_anon 64
>     nr_active_anon 20886
>     nr_inactive_file 42840
>     nr_active_file 330486
>     nr_unevictable 45630255
>     nr_mlock     45630255
>     nr_anon_pages 45649954
>     nr_mapped    2151
>     nr_file_pages 374576
>     nr_dirty     9
>     nr_writeback 0
>     nr_slab_reclaimable 11939312   <<< ~48G 
>     nr_slab_unreclaimable 17135   
> <snip>
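> 
> For the annotations above, the conversion is 4 KiB per page (assuming the
> standard x86-64 page size):
> 
>     # pages * 4096 bytes, shown as decimal GB:
>     echo $((6320539  * 4096 / 1000000000))   # node 0 free:        ~25 GB
>     echo $((4581560  * 4096 / 1000000000))   # node 0 slab recl.:  ~18 GB
>     echo $((11939312 * 4096 / 1000000000))   # node 1 slab recl.:  ~48 GB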
> 
> It would appear that page allocations for slab (from slabtop it is pretty
> much all dentry) are disproportionately hitting NUMA node 1. Under these
> conditions, my guess is that node 1 memory is under constant low-memory
> pressure, causing page scan/reclaim to run continuously. Without knowing much
> about the Linux kernel MM, I am guessing this may be suboptimal?
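> 
> In case it helps anyone check the same thing, this is how we have been
> watching the reclaim activity (standard procfs/sysfs paths; the per-node
> vmstat file may not exist on older kernels):
> 
>     # Per-node view of the counters quoted above:
>     grep -E 'nr_free_pages|nr_slab_reclaimable' /sys/devices/system/node/node1/vmstat
>     # Global scan/steal counters; pgscan_* climbing steadily suggests kswapd
>     # is scanning continuously:
>     grep -E '^(pgscan|pgsteal)' /proc/vmstat
>     # Whether the kernel prefers in-node reclaim over off-node allocation:
>     sysctl vm.zone_reclaim_mode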
> 
> Please correct my (wild) assumptions about why we may be observing this:
> - My guess is that a dentry is created for each new “accepted” connection
> socket.
> - There is only one ACCEPT thread handling port 80 requests in our cache
> configuration; the ACCEPT thread is responsible for opening FDs for accepted
> socket connections.
> - The ACCEPT thread is confined to run on a cpuset belonging to one NUMA zone
> only (see the commands sketched below).
> (I am connecting a lot of dots here.)
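> 
> The sketch below is how I have been checking that last point; the thread name
> format and the ATS knobs mentioned (proxy.config.accept_threads,
> proxy.config.exec_thread.affinity) are from our install and may differ by
> version:
> 
>     # Last CPU each traffic_server thread ran on (PSR column); look for the
>     # [ACCEPT ...] thread (thread naming varies by ATS version):
>     ps -L -o tid,psr,comm -p "$(pidof traffic_server)" | grep -i accept
>     # Allowed CPU list for that thread (<accept-tid> is the TID from above):
>     taskset -pc <accept-tid>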
> 
> Any insight will be appreciated.
> 
> thanks
> Kapil
> 
