hive-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10280) LLAP: Handle errors while sending source state updates to the daemons
Date Wed, 30 Mar 2016 18:21:26 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218511#comment-15218511 ]

Siddharth Seth commented on HIVE-10280:
---------------------------------------

bq. The code looks reasonable... the logic though, would it mean one temporary failure will
make the AM discard all tasks on the node?
Yes, once retries are exhausted. The intent is to retry the message according to the RPC
retry configured in LlapProtocolClientProxy.
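
To make the retry-then-give-up behaviour concrete, here is a minimal sketch. It is not the code from the patch: StateUpdateSender, send, isBad and maxAttempts are hypothetical names, and maxAttempts merely stands in for whatever RPC retry is configured in LlapProtocolClientProxy.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;

// Hypothetical sketch: none of these names come from Hive or from the patch.
class StateUpdateSender {
    // Stands in for the configured RPC retry count (assumption).
    private final int maxAttempts;
    // Nodes for which every attempt failed.
    private final Set<String> badNodes = new HashSet<>();

    StateUpdateSender(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /**
     * Tries the state update up to maxAttempts times.
     * Returns true on success; otherwise records the node as bad and returns false.
     */
    boolean send(String node, Callable<Void> update) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                update.call();
                return true;
            } catch (Exception e) {
                // Treated as a possibly temporary failure: loop and try again.
            }
        }
        // All attempts failed: only now is the node considered bad.
        badNodes.add(node);
        return false;
    }

    boolean isBad(String node) {
        return badNodes.contains(node);
    }
}
```

With maxAttempts set to 3, a node that fails twice and then accepts the update is not marked bad; a node that fails all three attempts is, and the caller would then discard the tasks on it.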

bq. I also assume it's safe to mark running tasks as killed from the AM's perspective (wrt
potential future events from them, etc.); however, should we try to send a kill to them (and
ignore the failures) so they don't hog resources? Actually, it may be a good idea to send a
kill if we receive a status update from a task that we declared dead.
Yes, it's safe to mark a running task as KILLED. We could try sending a kill message, but
that will likely not go through either since the state update did not go through.
If these tasks do successfully send in a heartbeat, they will automatically be told to die
- since the task has been marked as KILLED.
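
The heartbeat behaviour described above could look roughly like the following. This is an illustrative sketch only: HeartbeatTracker, TaskState and the method names are hypothetical, and treating unknown task ids the same as killed ones is an assumption, not something stated in this thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: these names are illustrative and not taken from Hive or Tez.
class HeartbeatTracker {
    enum TaskState { RUNNING, KILLED }

    private final Map<String, TaskState> states = new ConcurrentHashMap<>();

    void register(String taskId) {
        states.put(taskId, TaskState.RUNNING);
    }

    /** Marks a known task as KILLED, e.g. after its node was declared bad. */
    void markKilled(String taskId) {
        states.replace(taskId, TaskState.KILLED);
    }

    /**
     * Handles a heartbeat and returns true if the task should be told to die.
     * Assumption: a task the tracker does not know about is also told to die.
     */
    boolean heartbeat(String taskId) {
        return states.get(taskId) != TaskState.RUNNING;
    }
}
```

In this model no separate kill message is needed for a task that can still reach the AM: its next heartbeat gets the "die" response.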

Do you think we should still try sending a kill message?

> LLAP: Handle errors while sending source state updates to the daemons
> ---------------------------------------------------------------------
>
>                 Key: HIVE-10280
>                 URL: https://issues.apache.org/jira/browse/HIVE-10280
>             Project: Hive
>          Issue Type: Sub-task
>          Components: llap
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: HIVE-10280.1.patch
>
>
> Will likely be handled by marking the node as bad. May need a retry policy in place, though,
> before marking a node bad, to handle temporary network glitches.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
