hive-issues mailing list archives

From "Himanshu Mishra (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-22687) Query hangs indefinitely if LLAP daemon registers after the query is submitted
Date Thu, 02 Jan 2020 05:43:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-22687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Mishra updated HIVE-22687:
-----------------------------------
    Status: Open  (was: Patch Available)

> Query hangs indefinitely if LLAP daemon registers after the query is submitted
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-22687
>                 URL: https://issues.apache.org/jira/browse/HIVE-22687
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>    Affects Versions: 3.1.0
>            Reporter: Himanshu Mishra
>            Assignee: Himanshu Mishra
>            Priority: Major
>         Attachments: HIVE-22687.01.patch
>
>
> If a query is submitted while no LLAP daemon is running, it waits for 1 minute and then times out with error {{SERVICE_UNAVAILABLE}}.
> If a new LLAP daemon starts during that wait, the timeout is cancelled but the tasks are still not scheduled. As a result, the query hangs indefinitely.
> This is caused by a race condition: the LLAP daemon first registers the LLAP instance at {{.../workers/worker-0000}} and only afterwards registers {{.../workers/slot-0000}}. In the gap between the two, the Tez AM is notified of the worker ZK node and, while processing it, checks whether the slot ZK node is present; if it is not, it rejects the LLAP daemon. The error in the Tez AM is:
> {code:java}
> [INFO] [LlapScheduler] |impl.LlapZookeeperRegistryImpl|: Unknown slot for 8ebfdc45-0382-4757-9416-52898885af90{code}
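
The ordering problem described above can be illustrated with a small toy model. This is only a sketch: it uses neither Hive nor ZooKeeper code, and the names SlotRaceSketch, Scheduler, onWorkerNode, onSlotNode and deferOnMissingSlot are all hypothetical. The "defer" behaviour is one possible way to tolerate the ordering and is an assumption; the message does not say how the attached patch addresses the race.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the worker/slot registration race; not Hive or ZooKeeper code. */
class SlotRaceSketch {

    /** Hypothetical stand-in for the scheduler-side view of the registry. */
    static class Scheduler {
        private final Set<String> slots = new HashSet<>();          // daemons whose slot node was seen
        private final Set<String> pendingWorkers = new HashSet<>(); // worker node seen, slot node not yet
        private final List<String> accepted = new ArrayList<>();    // daemons usable for scheduling
        private final boolean deferOnMissingSlot;

        Scheduler(boolean deferOnMissingSlot) {
            this.deferOnMissingSlot = deferOnMissingSlot;
        }

        /** Called when the worker node of a daemon becomes visible. */
        void onWorkerNode(String daemonId) {
            if (slots.contains(daemonId)) {
                accepted.add(daemonId);
            } else if (deferOnMissingSlot) {
                // Assumed mitigation: remember the worker and wait for its slot node.
                pendingWorkers.add(daemonId);
            }
            // Otherwise the daemon is dropped ("Unknown slot") and never revisited.
        }

        /** Called when the slot node of a daemon becomes visible. */
        void onSlotNode(String daemonId) {
            slots.add(daemonId);
            if (pendingWorkers.remove(daemonId)) {
                accepted.add(daemonId);
            }
        }

        List<String> accepted() {
            return accepted;
        }
    }

    public static void main(String[] args) {
        for (boolean defer : new boolean[] {false, true}) {
            Scheduler scheduler = new Scheduler(defer);
            // Same order as in the report: worker node first, slot node afterwards.
            scheduler.onWorkerNode("daemon-1");
            scheduler.onSlotNode("daemon-1");
            System.out.println("defer=" + defer + " accepted=" + scheduler.accepted());
        }
    }
}
```

With deferOnMissingSlot set to false the daemon is never accepted, which mirrors the hang in the report; with it set to true the daemon is accepted once its slot node appears.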



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
