flink-issues mailing list archives

From "Matt Dailey (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-12385) RestClusterClient can hang indefinitely during job submission
Date Wed, 01 May 2019 15:35:00 GMT
Matt Dailey created FLINK-12385:
-----------------------------------

             Summary: RestClusterClient can hang indefinitely during job submission
                 Key: FLINK-12385
                 URL: https://issues.apache.org/jira/browse/FLINK-12385
             Project: Flink
          Issue Type: Bug
          Components: Runtime / REST
    Affects Versions: 1.8.0
            Reporter: Matt Dailey


We have seen clients hang indefinitely during job submission, even when the submission
itself succeeded. We have not yet characterized what happened on the server to cause this,
but we think the client should apply a timeout to these requests.

This was observed in Flink 1.5.5, but the code appears to have the same problem in 1.8.0.
One option is to pass a timeout to the calls to {{CompletableFuture.get()}}:
 * [RestClusterClient in 1.5.5|https://github.com/apache/flink/blob/release-1.5.5/flink-clients/src/main/java/org/apache/flink/client/program/rest/RestClusterClient.java#L246]
 * [RestClusterClient in 1.8.0|https://github.com/apache/flink/blob/release-1.8.0/flink-clients/src/main/java/org/apache/flink/client/program/rest/RestClusterClient.java#L247]
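
A minimal sketch of that option, using only the JDK and not Flink's actual code. The class name BoundedSubmitWait, the helper awaitWithTimeout, and the 200 ms value are hypothetical and chosen for illustration; the only real API involved is {{CompletableFuture.get(long, TimeUnit)}}, which is available in Java 8.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedSubmitWait {

    /**
     * Hypothetical helper: waits for the future for at most timeoutMillis
     * instead of parking the calling thread indefinitely as get() does.
     */
    public static <T> T awaitWithTimeout(CompletableFuture<T> future, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a submission response that never arrives.
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();
        try {
            awaitWithTimeout(neverCompleted, 200);
        } catch (TimeoutException e) {
            // The caller regains control here instead of hanging.
            System.out.println("gave up waiting for the submission response");
        }

        // A future that is already complete returns its value immediately.
        CompletableFuture<String> completed = CompletableFuture.completedFuture("submitted");
        System.out.println(awaitWithTimeout(completed, 200));
    }
}
```

A timeout of this kind only bounds how long the client waits. Since the submission may already have succeeded on the server, the caller would still need to decide how to treat a TimeoutException, for example by querying the job status rather than resubmitting.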

Thread dump from a client running Flink 1.5.5 on Java 8:
{noformat}
"http-nio-0.0.0.0-8443-exec-6" #34 daemon prio=5 os_prio=0 tid=0x000055b421fd2000 nid=0x29 waiting on condition [0x00007f932e176000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000b331d7c0> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
	at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:246)
	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
	at org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:77)
	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:410)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
