hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15359) IPC client hang in kerberized cluster due to JDK deadlock
Date Tue, 03 Apr 2018 23:12:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424727#comment-16424727
] 

Xiao Chen commented on HADOOP-15359:
------------------------------------

bq. JDK-7092821
Yes, I wasn't accurate, updated the description. That jira is the closest I can find. But
if that's taken care of, (i.e. either the synchronized method isn't called any more, or the
lock removed), then there is no deadlock in this case anyways.

One of its linked jiras also mentioned " In order to alleviate this problem applications should
cache the result of the Cipher.getInstance() call per thread and reinitialise (Cipher.init(...))
the cached copy instead of calling Cipher.getInstance() again. " but the caller here would
be krb5...

> IPC client hang in kerberized cluster due to JDK deadlock
> ---------------------------------------------------------
>
>                 Key: HADOOP-15359
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15359
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 2.6.0, 2.8.0, 3.0.0
>            Reporter: Xiao Chen
>            Priority: Major
>         Attachments: 1.jstack, 2.jstack
>
>
> In a recent internal testing, we have found a DFS client hang. Further inspecting jstack
shows the following:
> {noformat}
> "IPC Client (552936351) connection toHOSTNAME:8020 from PRINCIPAL" #7468 daemon prio=5
os_prio=0 tid=0x00007f6bb306c000 nid=0x1c76e waiting for monitor entry [0x00007f6bc2bd6000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.security.Provider.getService(Provider.java:1035)
>         - waiting to lock <0x0000000080277040> (a sun.security.provider.Sun)
>         at sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:444)
>         at sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
>         at sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
>         at javax.crypto.Cipher.getInstance(Cipher.java:513)
>         at sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:202)
>         at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484)
>         at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447)
>         at sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413)
>         at sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59)
>         at sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231)
>         at sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466)
>         at sun.security.jgss.krb5.MessageToken.verifySignAndSeqNumber(MessageToken.java:374)
>         at sun.security.jgss.krb5.WrapToken.getDataFromBuffer(WrapToken.java:284)
>         at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:209)
>         at sun.security.jgss.krb5.WrapToken.getData(WrapToken.java:182)
>         at sun.security.jgss.krb5.Krb5Context.unwrap(Krb5Context.java:1053)
>         at sun.security.jgss.GSSContextImpl.unwrap(GSSContextImpl.java:403)
>         at com.sun.security.sasl.gsskerb.GssKrb5Base.unwrap(GssKrb5Base.java:77)
>         at org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.readNextRpcPacket(SaslRpcClient.java:617)
>         at org.apache.hadoop.security.SaslRpcClient$WrappedInputStream.read(SaslRpcClient.java:583)
>         - locked <0x0000000083444878> (a java.nio.HeapByteBuffer)
>         at java.io.FilterInputStream.read(FilterInputStream.java:133)
>         at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         - locked <0x00000000834448c0> (a java.io.BufferedInputStream)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
> {noformat}
> and at the end of jstack:
> {noformat}
> Found one Java-level deadlock:
> =============================
> "IPC Parameter Sending Thread #29":
>   waiting to lock monitor 0x0000000017ff49f8 (object 0x0000000080277040, a sun.security.provider.Sun),
>   which is held by UNKNOWN_owner_addr=0x0000000050607000
> Java stack information for the threads listed above:
> ===================================================
> "IPC Parameter Sending Thread #29":
>         at java.security.Provider.getService(Provider.java:1035)
>         - waiting to lock <0x0000000080277040> (a sun.security.provider.Sun)
>         at sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:437)
>         at sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
>         at sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
>         at javax.crypto.SecretKeyFactory.nextSpi(SecretKeyFactory.java:293)
>         - locked <0x00000000834386b8> (a java.lang.Object)
>         at javax.crypto.SecretKeyFactory.<init>(SecretKeyFactory.java:121)
>         at javax.crypto.SecretKeyFactory.getInstance(SecretKeyFactory.java:160)
>         at sun.security.krb5.internal.crypto.dk.Des3DkCrypto.getCipher(Des3DkCrypto.java:187)
>         at sun.security.krb5.internal.crypto.dk.DkCrypto.dr(DkCrypto.java:484)
>         at sun.security.krb5.internal.crypto.dk.DkCrypto.dk(DkCrypto.java:447)
>         at sun.security.krb5.internal.crypto.dk.DkCrypto.calculateChecksum(DkCrypto.java:413)
>         at sun.security.krb5.internal.crypto.Des3.calculateChecksum(Des3.java:59)
>         at sun.security.jgss.krb5.CipherHelper.calculateChecksum(CipherHelper.java:231)
>         at sun.security.jgss.krb5.MessageToken.getChecksum(MessageToken.java:466)
>         at sun.security.jgss.krb5.MessageToken.genSignAndSeqNumber(MessageToken.java:315)
>         at sun.security.jgss.krb5.WrapToken.<init>(WrapToken.java:422)
>         at sun.security.jgss.krb5.Krb5Context.wrap(Krb5Context.java:922)
>         at sun.security.jgss.GSSContextImpl.wrap(GSSContextImpl.java:385)
>         at com.sun.security.sasl.gsskerb.GssKrb5Base.wrap(GssKrb5Base.java:103)
>         at org.apache.hadoop.security.SaslRpcClient$WrappedOutputStream.write(SaslRpcClient.java:636)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         - locked <0x00000000834389a8> (a java.io.BufferedOutputStream)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>         at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
>         - locked <0x00000000834389c0> (a java.io.DataOutputStream)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After some research, the closest I found is https://bugs.openjdk.java.net/browse/JDK-7092821
. 
> Filing this jira for discussions.
> JDK version used was 1.8.0_144.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message