flink-issues mailing list archives

From "Matthias (Jira)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-21439) Adaptive Scheduler: Add support for exception history
Date Tue, 18 May 2021 08:00:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-21439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346655#comment-17346655 ]

Matthias edited comment on FLINK-21439 at 5/18/21, 7:59 AM:
------------------------------------------------------------

Hi [~bytesandwich],
the {{AdaptiveScheduler}} does not support task failures for now, i.e. there's no dedicated
task information provided that could be used to derive a task name.

For now, only failures causing a full restart of the {{ExecutionGraph}} can occur. You might
want to compare the error handling of the {{DefaultScheduler}} with the error handling of
the {{AdaptiveScheduler}}. The {{FailureHandlingResult}} created in case of failure in the
{{DefaultScheduler}} does not have an {{ExecutionVertexID}} referring to the {{Execution}}
causing the error. The {{FailureHandlingResult}} is passed into the factory method in [DefaultScheduler:255|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java#L255]
and that specific case is then handled in [FailureHandlingResultSnapshot:66|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/exceptionhistory/FailureHandlingResultSnapshot.java#L66].


was (Author: mapohl):
Hi [~bytesandwich],
the {{AdaptiveScheduler}} does not support task failures for now, i.e. there's no dedicated
task information provided that could be used to derive a task name.

For now, only global failures can occur. You might want to compare the error handling of the
{{DefaultScheduler}} with the error handling of the {{AdaptiveScheduler}}. The {{FailureHandlingResult}}
created in case of failure in the {{DefaultScheduler}} does not have an {{ExecutionVertexID}}
referring to the {{Execution}} causing the error. The {{FailureHandlingResult}} is passed
into the factory method in [DefaultScheduler:255|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java#L255]
and that specific case is then handled in [FailureHandlingResultSnapshot:66|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/exceptionhistory/FailureHandlingResultSnapshot.java#L66].

> Adaptive Scheduler: Add support for exception history
> -----------------------------------------------------
>
>                 Key: FLINK-21439
>                 URL: https://issues.apache.org/jira/browse/FLINK-21439
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.0
>            Reporter: Matthias
>            Assignee: John Phelan
>            Priority: Major
>              Labels: pull-request-available, reactive
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> {{SchedulerNG.requestJob}} returns an {{ExecutionGraphInfo}} that was introduced in FLINK-21188.
This {{ExecutionGraphInfo}} holds the information about the {{ArchivedExecutionGraph}} and
exception history information. Currently, it's a list of {{ErrorInfos}}. This might change
due to ongoing work in FLINK-21190 where we might introduce a wrapper class with more information
on the failure.
> The goal of this ticket is to implement the exception history for the {{AdaptiveScheduler}},
i.e. collecting the exceptions that caused restarts. This collection of failures should be
forwarded through {{SchedulerNG.requestJob}}.
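
The goal described above (collecting the exceptions that caused restarts and handing them out as a list) can be illustrated with a minimal, self-contained sketch. It is not Flink code: {{ExceptionHistorySketch}}, {{Entry}}, {{History}} and {{recordRestart}} are hypothetical names, and the bounded size is an assumption made for the example rather than something stated in the ticket. Each entry carries an optional failing-task name that stays empty here, mirroring the comment's point that only failures restarting the whole {{ExecutionGraph}} are recorded for now.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch; none of these types are Flink classes. */
public class ExceptionHistorySketch {

    /** One recorded failure. failingTask is empty when the whole graph was restarted. */
    static final class Entry {
        final Throwable cause;
        final long timestamp;
        final Optional<String> failingTask;

        Entry(Throwable cause, long timestamp, Optional<String> failingTask) {
            this.cause = cause;
            this.timestamp = timestamp;
            this.failingTask = failingTask;
        }

        boolean isGlobal() {
            return !failingTask.isPresent();
        }
    }

    /** Keeps the most recent failures, oldest first (the size limit is an assumption). */
    static final class History {
        private final int maxSize;
        private final Deque<Entry> entries = new ArrayDeque<>();

        History(int maxSize) {
            if (maxSize <= 0) {
                throw new IllegalArgumentException("maxSize must be positive");
            }
            this.maxSize = maxSize;
        }

        /** Records a failure that caused a full restart; no task information is attached. */
        void recordRestart(Throwable cause, long timestamp) {
            if (entries.size() == maxSize) {
                entries.removeFirst(); // drop the oldest entry to stay within the limit
            }
            entries.addLast(new Entry(cause, timestamp, Optional.empty()));
        }

        /** Returns a copy so callers cannot modify the internal state. */
        List<Entry> snapshot() {
            return new ArrayList<>(entries);
        }
    }

    public static void main(String[] args) {
        History history = new History(2);
        history.recordRestart(new RuntimeException("first"), 1L);
        history.recordRestart(new RuntimeException("second"), 2L);
        history.recordRestart(new RuntimeException("third"), 3L);

        for (Entry e : history.snapshot()) {
            System.out.println(e.timestamp + " " + e.cause.getMessage() + " global=" + e.isGlobal());
        }
    }
}
```

In this sketch the snapshot plays the role of the list that would be forwarded to callers; how the real scheduler stores and exposes its history is not specified by this message.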



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
