flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-10205) Batch Job: InputSplit Fault tolerant for DataSourceTask
Date Mon, 22 Oct 2018 03:52:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658536#comment-16658536 ]

ASF GitHub Bot commented on FLINK-10205:

isunjin commented on issue #6684:     [FLINK-10205] Batch Job: InputSplit Fault tolerant for
URL: https://github.com/apache/flink/pull/6684#issuecomment-431735693
   Great discussion, thanks everybody. 
   @wenlong88, the scenario you mention is what I am trying to fix. [Here](https://github.com/isunjin/flink/commit/b61b58d963ea11d34e2eb7ec6f4fe4bfed4dca4a)
is a concrete example: a simple word count job produces inconsistent data during failover.
The job should fail, but instead it succeeds with zero output.
   @tillrohrmann, the **_InputSplitAssigner_** generates a list of _**InputSplit**_ instances. The
order might not matter, but every input split should be processed exactly once: if a task fails
to process an _**InputSplit**_, that _**InputSplit**_ should be processed again. In the batch
scenario, however, this might not hold. The _**DataSourceTask**_ calls the _InputSplitAssigner_
to get its next _**InputSplit**_, and depending on the implementation of the _InputSplitAssigner_,
the failed _**InputSplit**_ might be discarded. [This](https://github.com/isunjin/flink/commit/b61b58d963ea11d34e2eb7ec6f4fe4bfed4dca4a)
repro shows that _**LocatableInputSplitAssigner**_ discards the failed _**InputSplit**_ and
therefore has a data inconsistency issue.
   It's not a problem in the streaming scenario, because each _**InputSplit**_ is treated as a
record. For example, _**ContinuousFileMonitoringFunction**_ collects the _**InputSplit**_ instances,
and Flink guarantees that every _**InputSplit**_ is processed exactly once. @wenlong88, will this
work in your scenario?
   @tillrohrmann, the problem here is that this is a bug, so should we hotfix it instead of
waiting for the new feature to become available?
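
   To make the failure mode above concrete, here is a minimal toy model. It is only a sketch: `ToySplitAssigner` and `SplitLossDemo` are hypothetical names, splits are plain integers, and none of this is Flink's actual API. It assumes an assigner that forgets a split as soon as it has been handed out, which is the behaviour the repro attributes to _**LocatableInputSplitAssigner**_.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: these classes are hypothetical and are not Flink's API.
class ToySplitAssigner {
    private final Deque<Integer> unassigned = new ArrayDeque<>();

    ToySplitAssigner(int numSplits) {
        for (int i = 0; i < numSplits; i++) {
            unassigned.add(i);
        }
    }

    // Hands out the next split, or null when none are left.
    // A split that has been handed out is forgotten by the assigner.
    Integer getNextSplit() {
        return unassigned.poll();
    }
}

class SplitLossDemo {
    // Simulates a source task reading from one assigner. If failOnFirstSplit is
    // true, the first attempt dies while processing its first split and a second
    // attempt starts against the same assigner.
    // Returns the splits that were fully processed.
    static List<Integer> run(int numSplits, boolean failOnFirstSplit) {
        ToySplitAssigner assigner = new ToySplitAssigner(numSplits);
        List<Integer> processed = new ArrayList<>();

        if (failOnFirstSplit) {
            assigner.getNextSplit(); // attempt 1 takes split 0, then fails
        }

        // Final attempt: drain whatever the assigner still has.
        Integer split;
        while ((split = assigner.getNextSplit()) != null) {
            processed.add(split);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println("no failure:   " + run(3, false)); // [0, 1, 2]
        System.out.println("with failure: " + run(3, true));  // [1, 2]
    }
}
```

   In the failing run, split 0 is never processed by any successful attempt, yet the job still finishes, which matches the "succeeds with missing output" symptom described above.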


> Batch Job: InputSplit Fault tolerant for DataSourceTask
> -------------------------------------------------------
>                 Key: FLINK-10205
>                 URL: https://issues.apache.org/jira/browse/FLINK-10205
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager
>    Affects Versions: 1.6.1, 1.6.2, 1.7.0
>            Reporter: JIN SUN
>            Assignee: JIN SUN
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> Today a DataSourceTask pulls InputSplits from the JobManager to achieve better performance.
However, when a DataSourceTask fails and is rerun, it does not get the same splits as its previous
run. This can produce inconsistent results or even data corruption.
> Furthermore, if two executions run at the same time (in the batch scenario),
these two executions should process the same splits.
> We need to fix this issue to make the inputs of a DataSourceTask deterministic. The proposal
is to save all splits in the ExecutionVertex so that the DataSourceTask pulls its splits from there.
> Document:
> [https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing]
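
The proposal in the description can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: `VertexSplitStore` and `DeterministicSplitDemo` are hypothetical names, splits are modelled as integers, and a plain `Deque` stands in for the job-wide assigner. The idea shown is that each vertex records the splits it has been given, and every execution attempt reads that record by position, so a rerun (or a second concurrent execution) sees the same sequence.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not Flink code: remembers every split handed to one
// vertex so that each execution attempt of that vertex sees the same sequence.
class VertexSplitStore {
    private final Deque<Integer> sharedAssigner; // stands in for the job-wide assigner
    private final List<Integer> assignedToVertex = new ArrayList<>();

    VertexSplitStore(Deque<Integer> sharedAssigner) {
        this.sharedAssigner = sharedAssigner;
    }

    // Returns the split at position `index` for this vertex. A new split is pulled
    // from the shared assigner only if no attempt has reached that position yet.
    // Returns null when the vertex has no more input.
    synchronized Integer getSplit(int index) {
        while (assignedToVertex.size() <= index) {
            Integer next = sharedAssigner.poll();
            if (next == null) {
                return null;
            }
            assignedToVertex.add(next);
        }
        return assignedToVertex.get(index);
    }
}

class DeterministicSplitDemo {
    // Builds a store backed by splits 0..numSplits-1.
    static VertexSplitStore newStore(int numSplits) {
        Deque<Integer> assigner = new ArrayDeque<>();
        for (int i = 0; i < numSplits; i++) {
            assigner.add(i);
        }
        return new VertexSplitStore(assigner);
    }

    // Reads up to maxSplits splits, as one execution attempt would.
    static List<Integer> attempt(VertexSplitStore store, int maxSplits) {
        List<Integer> seen = new ArrayList<>();
        for (int i = 0; i < maxSplits; i++) {
            Integer split = store.getSplit(i);
            if (split == null) {
                break;
            }
            seen.add(split);
        }
        return seen;
    }

    public static void main(String[] args) {
        VertexSplitStore store = newStore(3);
        System.out.println("attempt 1 (fails after one split): " + attempt(store, 1));
        System.out.println("attempt 2 (rerun from the start):  " + attempt(store, 10));
    }
}
```

In this sketch the second attempt starts again at position 0 and therefore re-reads the split that the failed attempt had taken, instead of skipping it.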

