flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1489) Failing JobManager due to blocking calls in Execution.scheduleOrUpdateConsumers
Date Wed, 11 Feb 2015 16:26:13 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316466#comment-14316466 ]

ASF GitHub Bot commented on FLINK-1489:
---------------------------------------

Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/378#issuecomment-73911265
  
    The job that was previously failing is fixed with this change.
    
    We should merge this change ASAP, because it's kinda impossible right now to seriously use Flink 0.9-SNAPSHOT without it.


> Failing JobManager due to blocking calls in Execution.scheduleOrUpdateConsumers
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-1489
>                 URL: https://issues.apache.org/jira/browse/FLINK-1489
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> [~Zentol] reported that the JobManager failed to execute his Python job. The reason is that the JobManager executes blocking calls in the actor thread in the method {{Execution.sendUpdateTaskRpcCall}} in response to receiving a {{ScheduleOrUpdateConsumers}} message.
> Every TaskManager may send a {{ScheduleOrUpdateConsumers}} message to the JobManager to notify the consumers about available data. The JobManager then sends the respective update call {{Execution.sendUpdateTaskRpcCall}} to each TaskManager. By blocking the actor thread, we effectively execute the update calls sequentially. Because the delay keeps accumulating, some of the initial calls on the TaskManager side in {{IntermediateResultPartition.scheduleOrUpdateConsumers}} time out. As a result, the execution of the respective Tasks fails.
> A solution would be to make the call non-blocking.
> A general caveat for actor programming is: We should never block the actor thread; otherwise we seriously jeopardize the scalability of the system or, even worse, the system simply fails.
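
To illustrate the point about blocking the actor thread, here is a minimal sketch in plain Java. It is not Flink or Akka code: the single-threaded executor stands in for the actor thread, and sendUpdate is a hypothetical placeholder for a slow update RPC to one TaskManager. The sketch counts how many RPCs are in flight at the same time. When the handler blocks on each reply, the calls run strictly one after another; when it returns immediately, they can overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ActorThreadSketch {
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    // Hypothetical stand-in for a slow update RPC to one TaskManager.
    static CompletableFuture<Void> sendUpdate(ExecutorService rpcPool, long millis) {
        return CompletableFuture.runAsync(() -> {
            int now = inFlight.incrementAndGet();
            maxInFlight.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(millis); // simulated network round trip
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                inFlight.decrementAndGet();
            }
        }, rpcPool);
    }

    // Handles `messages` messages on one "actor" thread and returns the
    // maximum number of RPCs that were in flight at the same time.
    static int run(boolean blocking, int messages, long rpcMillis) throws Exception {
        inFlight.set(0);
        maxInFlight.set(0);
        ExecutorService actor = Executors.newSingleThreadExecutor();
        ExecutorService rpcPool = Executors.newCachedThreadPool();
        try {
            List<Future<CompletableFuture<Void>>> handled = new ArrayList<>();
            for (int i = 0; i < messages; i++) {
                handled.add(actor.submit(() -> {
                    CompletableFuture<Void> rpc = sendUpdate(rpcPool, rpcMillis);
                    if (blocking) {
                        // The actor thread waits here, so the next message
                        // cannot be handled until this reply has arrived.
                        rpc.join();
                    }
                    return rpc;
                }));
            }
            // Wait until every message was handled and every RPC finished.
            for (Future<CompletableFuture<Void>> f : handled) {
                f.get().join();
            }
            return maxInFlight.get();
        } finally {
            actor.shutdown();
            rpcPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocking, max RPCs in flight:     " + run(true, 4, 50));
        System.out.println("non-blocking, max RPCs in flight: " + run(false, 4, 100));
    }
}
```

In the blocking variant the maximum is always 1, which matches the sequential behaviour described in the issue: with N TaskManagers the last update waits for roughly N round trips. In the non-blocking variant the handler only starts the call and returns, so the delay no longer accumulates on the actor thread.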



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
