flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3595) Kafka09 consumer thread does not interrupt when stuck in record emission
Date Fri, 11 Mar 2016 18:13:38 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191316#comment-15191316
] 

ASF GitHub Bot commented on FLINK-3595:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1780#issuecomment-195483345
  
    Test failures
      - CheckpointNotifierITCase (2x) - known instability
      - Kafka08 test
    
    This seems to be the relevant failure
    ```
    03/11/2016 15:44:58	Job execution switched to status FAILED.
    org.apache.flink.client.program.ProgramInvocationException: The program execution failed:
Communication with JobManager failed: null
    	at org.apache.flink.client.program.Client.runBlocking(Client.java:381)
    	at org.apache.flink.client.program.Client.runBlocking(Client.java:355)
    	at org.apache.flink.client.program.Client.runBlocking(Client.java:348)
    	at org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.executeRemotely(RemoteStreamEnvironment.java:206)
    	at org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.execute(RemoteStreamEnvironment.java:172)
    	at org.apache.flink.test.util.TestUtils.tryExecute(TestUtils.java:31)
    	at org.apache.flink.streaming.connectors.kafka.KafkaConsumerTestBase.readSequence(KafkaConsumerTestBase.java:1207)
    	at org.apache.flink.streaming.connectors.kafka.Kafka08ITCase.testOffsetInZookeeper(Kafka08ITCase.java:216)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:483)
    	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
    Caused by: org.apache.flink.runtime.client.JobExecutionException: Communication with JobManager
failed: null
    	at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:141)
    	at org.apache.flink.client.program.Client.runBlocking(Client.java:379)
    	... 16 more
    Caused by: java.lang.InterruptedException
    	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
    	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    	at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
    	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
    	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    	at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    	at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    	at scala.concurrent.Await$.result(package.scala:107)
    	at scala.concurrent.Await.result(package.scala)
    	at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:133)
    	... 17 more
    ```


> Kafka09 consumer thread does not interrupt when stuck in record emission
> ------------------------------------------------------------------------
>
>                 Key: FLINK-3595
>                 URL: https://issues.apache.org/jira/browse/FLINK-3595
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>    Affects Versions: 1.0.0
>            Reporter: Stephan Ewen
>            Assignee: Ufuk Celebi
>            Priority: Critical
>             Fix For: 1.1.0, 1.0.1
>
>
> When canceling a job, the Kafka 0.9 Consumer Thread may be stuck in a blocking method
(output emitting) and never wakes up.
> The thread as a whole cannot be simply interrupted, because of a bug in Kafka that makes
the consumer freeze/hang up on interrupt.
> There are two possible solutions:
>   - allow and call interrupt when the consumer thread is emitting elements
>   - destroy the output network buffer pools eagerly on canceling. The Kafka thread will
then throw an exception if it is stuck in emitting elements and it will terminate, which is
accepted in case the status is canceled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message