flink-issues mailing list archives

From "Andrey Zagrebin (Jira)" <j...@apache.org>
Subject [jira] [Assigned] (FLINK-14163) Execution#producedPartitions is possibly not assigned when used
Date Wed, 08 Jan 2020 10:22:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Zagrebin reassigned FLINK-14163:

    Assignee: Yuan Mei

> Execution#producedPartitions is possibly not assigned when used
> ---------------------------------------------------------------
>                 Key: FLINK-14163
>                 URL: https://issues.apache.org/jira/browse/FLINK-14163
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.9.0, 1.10.0
>            Reporter: Zhu Zhu
>            Assignee: Yuan Mei
>            Priority: Major
>             Fix For: 1.10.0
> Currently {{Execution#producedPartitions}} is assigned only after the partitions have completed registration with the shuffle master in {{Execution#registerProducedPartitions(...)}}.
> Partition registration is an async interface ({{ShuffleMaster#registerPartitionWithProducer(...)}}), so {{Execution#producedPartitions}} is possibly[1] not set when it is used.
> Usages include:
> 1. deploying this task, so that the task may be deployed without its result partitions assigned, and the job would hang. (DefaultScheduler issue only, since the legacy scheduler handled this case)
> 2. generating input descriptors for downstream tasks:
> 3. retrieving {{ResultPartitionID}} for partition releasing:
> [1] If a user uses Flink's default shuffle master, {{NettyShuffleMaster}}, this is not a problem at the moment, since it returns a completed future on registration and the process is therefore effectively synchronous. However, if users implement their own shuffle service in which {{ShuffleMaster#registerPartitionWithProducer}} returns a pending future, it can be a problem. This is possible because the customizable shuffle service has been open to users since 1.9 (via config "shuffle-service-factory.class").
> To avoid such issues, we may either
> 1. fix all the usages of {{Execution#producedPartitions}} to account for the async assignment, or
> 2. change {{ShuffleMaster#registerPartitionWithProducer(...)}} to a sync interface.
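
The hazard described in the quoted issue can be reproduced in isolation with plain `CompletableFuture`. The following is only a minimal sketch under stated assumptions: `ProducedPartitionsSketch` and `SketchExecution` are hypothetical stand-ins, not Flink classes, and a `String` stands in for the partition descriptors. It shows that a field assigned in a future callback is set immediately when the future is already completed, stays unset while the future is pending, and can be read safely if the usage is chained on the assignment (roughly the first of the two proposed options).

```java
import java.util.concurrent.CompletableFuture;

public class ProducedPartitionsSketch {

    /** Hypothetical stand-in for an execution; not the Flink class. */
    public static class SketchExecution {
        // Remains null until the registration future completes.
        private volatile String producedPartitions;

        /** Assigns the field only once the (possibly pending) registration completes. */
        public CompletableFuture<Void> registerProducedPartitions(
                CompletableFuture<String> registration) {
            return registration.thenAccept(descriptor -> producedPartitions = descriptor);
        }

        public String getProducedPartitions() {
            return producedPartitions;
        }
    }

    public static void main(String[] args) {
        // Case 1: the registration future is already completed, so the
        // callback runs in the calling thread and the field is set at once.
        SketchExecution eager = new SketchExecution();
        eager.registerProducedPartitions(CompletableFuture.completedFuture("partition-1"));
        System.out.println("completed future: " + eager.getProducedPartitions());

        // Case 2: the registration future is still pending, so a caller
        // reading the field right away observes null.
        SketchExecution lazy = new SketchExecution();
        CompletableFuture<String> pending = new CompletableFuture<>();
        CompletableFuture<Void> assigned = lazy.registerProducedPartitions(pending);
        System.out.println("pending future: " + lazy.getProducedPartitions());

        // Chaining the usage on the assignment future defers the read
        // until the field has actually been set.
        CompletableFuture<String> deployed =
                assigned.thenApply(ignored -> "deploy with " + lazy.getProducedPartitions());
        pending.complete("partition-2");
        System.out.println(deployed.join());
    }
}
```

In this sketch the second print shows `null`, which corresponds to a task being deployed without its result partitions, while the chained read sees the assigned value. Making the registration call synchronous, the second proposed option, would remove the pending case altogether at the cost of blocking the caller.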

This message was sent by Atlassian Jira
