flink-issues mailing list archives

From "Stephan Ewen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-10038) Parallel the creation of InputSplit if necessary
Date Mon, 06 Aug 2018 09:33:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-10038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569941#comment-16569941 ]

Stephan Ewen commented on FLINK-10038:

There is another way to look at this: We have been thinking for a while about changing the
source interface in the following way:

  - Sources all have a single starting task that generates the work packages (here the input splits)
  - Most of the work is done

  - We move all that logic into the implementation of that specific single task that generates
the work packages. The JobManager becomes simpler.
  - Splits get automatically checkpointed, because they are now part of the data flow.

  - We need to extend the network stack a bit such that we can "pull" splits. Otherwise we
do not get good load balancing. One way to do this would be to add credit only when the thread
calls "getNext()" on the input gate.
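
To make the pull idea concrete, here is a rough sketch in plain Java. It is not Flink
code and does not touch the real network stack: the class PullBasedSplits, the method
run, and the END marker are all made up for illustration. A SynchronousQueue stands in
for a channel that has credit only while a reader is actually asking for the next
element, so the single generating task can hand over a split only when some reader
pulls one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.SynchronousQueue;

// Hypothetical illustration only; none of these names are Flink APIs.
class PullBasedSplits {

    // Sentinel telling a reader that no more splits will arrive.
    private static final String END = "<end-of-splits>";

    // Runs one split-generating task and numReaders reader tasks.
    // Returns, per reader index, the splits that reader processed.
    static Map<Integer, List<String>> run(int numSplits, int numReaders) {
        // put() blocks until some reader calls take(), so a split is handed
        // over only when a reader asks for it (a stand-in for "credit").
        SynchronousQueue<String> channel = new SynchronousQueue<>();
        Map<Integer, List<String>> processed = new ConcurrentHashMap<>();

        // The single starting task that generates the work packages.
        Thread generator = new Thread(() -> {
            try {
                for (int i = 0; i < numSplits; i++) {
                    channel.put("split-" + i);
                }
                // One end marker per reader so every reader terminates.
                for (int r = 0; r < numReaders; r++) {
                    channel.put(END);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // The parallel tasks that process the work packages.
        List<Thread> readers = new ArrayList<>();
        for (int r = 0; r < numReaders; r++) {
            List<String> mine = new ArrayList<>();
            processed.put(r, mine);
            readers.add(new Thread(() -> {
                try {
                    while (true) {
                        String split = channel.take(); // the "pull"
                        if (split.equals(END)) {
                            break;
                        }
                        mine.add(split); // stand-in for reading the split
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        generator.start();
        for (Thread t : readers) {
            t.start();
        }
        try {
            generator.join();
            for (Thread t : readers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return processed;
    }
}
```

Calling PullBasedSplits.run(10, 3) hands out each of the ten splits exactly once. How
many splits each reader ends up with depends on how quickly it comes back for more,
which is the load-balancing behavior the pull model is meant to provide.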

> Parallel the creation of InputSplit if necessary
> ------------------------------------------------
>                 Key: FLINK-10038
>                 URL: https://issues.apache.org/jira/browse/FLINK-10038
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 1.5.0
>            Reporter: 陈梓立
>            Priority: Major
>              Labels: improvement, inputformat, parallel, perfomance
> As a continuation of the discussion in the PR about parallelizing the creation of ExecutionJobVertex,
> [~StephanEwen] suggested that we could parallelize the creation of InputSplit, from which
> we gain performance improvements.

This message was sent by Atlassian JIRA
