hadoop-yarn-dev mailing list archives

From 徐鹏 (Jira) <j...@apache.org>
Subject [jira] [Created] (YARN-9769) if "ContainerLocalizer Downloader" thread block ,it will never stop
Date Wed, 21 Aug 2019 08:31:00 GMT
徐鹏 created YARN-9769:
------------------------

             Summary: if "ContainerLocalizer Downloader" thread block ,it will never stop
                 Key: YARN-9769
                 URL: https://issues.apache.org/jira/browse/YARN-9769
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: nodemanager
    Affects Versions: 2.5.0
         Environment: hadoop:2.5.0-cdh5.2.0
            Reporter: 徐鹏
         Attachments: nm_jstack

If a "ContainerLocalizer Downloader" thread blocks, it never stops, and the NodeManager JVM
eventually runs out of memory. The NodeManager should fail a blocked "ContainerLocalizer Downloader" thread after a timeout.
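
One way to picture the requested timeout is a bounded wait on the download's Future. The sketch below is only an illustration and is not the NodeManager code: the class `DownloadTimeoutSketch`, the helper `getOrFail`, and the 200 ms value are hypothetical. Note that `cancel(true)` only interrupts the worker; a thread parked in `awaitUninterruptibly()`, as in the stack trace in this report, would not react to that interrupt, so a real fix would probably also need the blocking call itself to be bounded.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DownloadTimeoutSketch {

    /**
     * Hypothetical helper: waits at most timeoutMs for the task to finish.
     * On timeout it cancels the task and rethrows the TimeoutException.
     */
    public static <T> T getOrFail(Future<T> future, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) interrupts the worker thread. This helps only if the
            // worker is blocked in an interruptible call.
            future.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for a download that never makes progress.
            Callable<String> stuckDownload = () -> {
                Thread.sleep(60_000);
                return "done";
            };
            Future<String> pending = pool.submit(stuckDownload);
            try {
                getOrFail(pending, 200);
            } catch (TimeoutException e) {
                System.out.println("download timed out, cancelled=" + pending.isCancelled());
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The stand-in task uses `Thread.sleep`, which is interruptible, so the worker exits promptly here; that is an assumption of the sketch and does not hold for the blocked call reported in this issue.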
 
In my case:
    *NM JVM main options*:
-XX:InitialHeapSize=2147483648 -XX:MaxGCPauseMillis=200 -XX:MaxHeapSize=2147483648 -XX:MaxNewSize=1287651328
-XX:MinHeapDeltaBytes=1048576 -XX:+UseG1GC
    *GC*: runs frequently but reclaims little memory (old gen >= 99%)
 
  !image-2019-08-20-23-39-23-968.png!
   *jstack & jmap*: 3602 "ContainerLocalizer Downloader" threads are blocked, 561 MB in total
 
{code:java}
"ContainerLocalizer Downloader" #59288379 prio=5 os_prio=0 tid=0x00007f9c62d9d800 nid=0xb7550 waiting on condition [0x00007f9b1c2c0000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000008057ddb0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1976)
        at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.allocSlot(DfsClientShmManager.java:254)
        at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager.allocSlot(DfsClientShmManager.java:432)
        at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.allocShmSlot(ShortCircuitCache.java:1016)
        at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:449)
        at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:783)
        at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:717)
        at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:394)
        at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:305)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:590)
        - locked <0x00000000fa4ce540> (a org.apache.hadoop.hdfs.DFSInputStream)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:797)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:844)
        - locked <0x00000000fa4ce540> (a org.apache.hadoop.hdfs.DFSInputStream)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:264)
        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:60)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:356)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:354)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1701)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:354)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
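
For anyone who wants to check the blocked-thread count against a jstack dump such as the attached nm_jstack, a small counter along the following lines might help. It is a rough sketch: the class `JstackThreadCount` and the method `countByName` are hypothetical names, and it assumes every thread header line in the dump starts with the thread name in double quotes, as in the trace above.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JstackThreadCount {

    // Assumption: a thread header line begins with the quoted thread name.
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]+)\"\\s");

    /** Counts thread header lines per thread name. */
    public static Map<String, Integer> countByName(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to a jstack dump file.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countByName(lines).forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

Running it with the dump path as the only argument prints one line per distinct thread name with its count.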
!image-2019-08-21-15-18-09-514.png!

!image-2019-08-21-15-18-27-553.png!

 

*ContainerLocalizer.class*

  !image-2019-08-21-16-21-01-610.png!

 

*Proposed change: add loop termination*

   !image-2019-08-21-16-21-23-037.png![^nm_jstack]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-dev-help@hadoop.apache.org

