flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-10205) Batch Job: InputSplit Fault tolerant for DataSourceTask
Date Mon, 22 Oct 2018 07:13:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658686#comment-16658686 ]

ASF GitHub Bot commented on FLINK-10205:

tillrohrmann commented on issue #6684:     [FLINK-10205] Batch Job: InputSplit Fault tolerant
for DataSource…
URL: https://github.com/apache/flink/pull/6684#issuecomment-431757163
   @isunjin I agree that the current implementation does not work with region failover. The
thing I'm questioning is whether the `InputSplits` of the failed task need to be processed
by the same (restarted) task or can be given to any running task. So far I'm not convinced
that something would break if we simply return the `InputSplits` to the `InputSplitAssigner`.
I think the `WordCount` example should work with this.
   Before hotfixing something in a way that might hurt us in the future, I would really like
to grasp the full picture of why you want to solve the problem that way.
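
To make the alternative discussed in this comment concrete, here is a minimal sketch of an assigner that takes back the splits of a failed task so that any running task can pick them up. It is not Flink's `InputSplitAssigner` interface: the class `ReturnableSplitAssigner`, its methods, and the use of plain strings as splits are all hypothetical, and the sketch assumes the failed task's partial output is discarded on restart.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration only, not a Flink class.
 * Splits are modelled as strings and tasks as integer ids.
 */
public class ReturnableSplitAssigner {
    // Splits that no task currently owns.
    private final Deque<String> unassigned = new ArrayDeque<>();
    // Splits handed out so far, grouped by the task that received them.
    private final Map<Integer, List<String>> assignedByTask = new HashMap<>();

    public ReturnableSplitAssigner(List<String> splits) {
        unassigned.addAll(splits);
    }

    /** Hands the next unassigned split to the task, or null when none remain. */
    public synchronized String getNextSplit(int taskId) {
        String split = unassigned.poll();
        if (split != null) {
            assignedByTask.computeIfAbsent(taskId, k -> new ArrayList<>()).add(split);
        }
        return split;
    }

    /**
     * Puts every split that was handed to the failed task back into the pool,
     * so that any task (restarted or not) can request it again.
     * Returns the number of splits that were returned.
     */
    public synchronized int returnSplitsOf(int failedTaskId) {
        List<String> splits = assignedByTask.remove(failedTaskId);
        if (splits == null) {
            return 0;
        }
        unassigned.addAll(splits);
        return splits.size();
    }
}
```

With this design a returned split may be processed by a different task than the one that first received it, which is exactly the point under discussion: it is only safe if nothing downstream depends on which task consumed which split.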

> Batch Job: InputSplit Fault tolerant for DataSourceTask
> -------------------------------------------------------
>                 Key: FLINK-10205
>                 URL: https://issues.apache.org/jira/browse/FLINK-10205
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager
>    Affects Versions: 1.6.1, 1.6.2, 1.7.0
>            Reporter: JIN SUN
>            Assignee: JIN SUN
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> Today, a DataSourceTask pulls InputSplits from the JobManager to achieve better performance.
However, when a DataSourceTask fails and is rerun, it does not get the same splits as its previous
execution. This can lead to inconsistent results or even data corruption.
> Furthermore, if two executions of the same task run at the same time (in the batch scenario),
both executions should process the same splits.
> We need to fix this issue to make the inputs of a DataSourceTask deterministic. The proposal
is to save all splits in the ExecutionVertex and have the DataSourceTask pull its splits from there.
>  document:
> [https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing]
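
The proposal in the issue description (record the splits per vertex and let every execution attempt read from that record) could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the pull request: `VertexSplitLog`, the integer attempt numbers, and the string-valued splits are hypothetical names, and the shared split source is modelled as a plain `Iterator`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration only, not a Flink class.
 * Keeps an ordered log of the splits given to one vertex so that every
 * execution attempt of that vertex sees the same sequence of splits.
 */
public class VertexSplitLog {
    // Source of fresh splits; if it were shared between vertices it would
    // need its own synchronization (assumption: single vertex here).
    private final Iterator<String> splitSource;
    // Splits this vertex has received so far, in assignment order.
    private final List<String> assigned = new ArrayList<>();
    // Read position of each execution attempt within the log.
    private final Map<Integer, Integer> cursorByAttempt = new HashMap<>();

    public VertexSplitLog(Iterator<String> splitSource) {
        this.splitSource = splitSource;
    }

    /** Returns the next split for the given attempt, or null when none remain. */
    public synchronized String getNextSplit(int attemptNumber) {
        int cursor = cursorByAttempt.getOrDefault(attemptNumber, 0);
        if (cursor == assigned.size()) {
            // This attempt has replayed the whole log: fetch a fresh split.
            if (!splitSource.hasNext()) {
                return null;
            }
            assigned.add(splitSource.next());
        }
        cursorByAttempt.put(attemptNumber, cursor + 1);
        return assigned.get(cursor);
    }
}
```

A restarted attempt (or a second attempt running concurrently) starts at position 0 and therefore replays the splits already logged for the vertex before it pulls any new ones, which is what makes the input of the task deterministic in this sketch.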

This message was sent by Atlassian JIRA