hive-issues mailing list archives

From "Siddharth Seth (JIRA)" <>
Subject [jira] [Updated] (HIVE-16094) queued containers may timeout if they don't get to run for a long time
Date Thu, 02 Mar 2017 23:41:45 GMT


Siddharth Seth updated HIVE-16094:
    Attachment: HIVE-16094.01.patch

The problem was that if an AM was picked up by the queueDrainer while it had 0 fragments, it
was not put back on the queue. registerFragment only adds a new entry to the queue if the AM
is not already known, so a later fragment for the same AM never got it re-queued.
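
To illustrate the failure mode, here is a minimal sketch of the old behaviour as described above. It is not the actual Hive code: the class OldModelSketch, its fields, and drainOnce are hypothetical names, and a plain queue stands in for whatever queue the real reporter uses.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical, simplified model of the pre-patch behaviour (not the real Hive class).
class OldModelSketch {
    // AM id -> number of pending fragments; an AM stays "known" even at 0.
    final Map<String, Integer> knownAms = new HashMap<>();
    // AMs waiting for their next heartbeat.
    final Queue<String> heartbeatQueue = new ArrayDeque<>();

    void registerFragment(String amId) {
        boolean alreadyKnown = knownAms.containsKey(amId);
        knownAms.merge(amId, 1, Integer::sum);
        if (!alreadyKnown) {
            // Only an unknown AM gets a queue entry.
            heartbeatQueue.add(amId);
        }
    }

    void unregisterFragment(String amId) {
        knownAms.computeIfPresent(amId, (k, v) -> Math.max(0, v - 1));
    }

    // One pass of the drainer: returns the AM that was heartbeated, or null.
    String drainOnce() {
        String amId = heartbeatQueue.poll();
        if (amId != null && knownAms.getOrDefault(amId, 0) > 0) {
            heartbeatQueue.add(amId); // put back only while fragments are pending
            return amId;
        }
        return null; // an AM with 0 fragments is dropped from the queue here
    }
}
```

Once the AM is drained with 0 fragments it leaves the queue but remains known, so the next registerFragment call does not re-queue it and no further heartbeats go out.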

AMNodeInfo instances were originally meant to be used across multiple queries belonging to
an AM. We could still achieve that by going back to the old model of reference counting.

However, I think it's cleaner to maintain an AMNodeInfo instance per query instance. So the
patch changes the key to be the queryIdentifier. An AMNodeInfo instance is always kept in the
queue for the lifetime of the query, and a heartbeat is only sent if there are pending
fragments. The instance is removed from the queue after query completion, or if an error is hit.
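
A rough sketch of the per-query model described above, for reviewers who want the shape of the change without reading the patch. This is an illustration under assumptions, not the patch itself: PerQuerySketch, drainOnce, and queryComplete are hypothetical names, query identifiers are plain strings, and error handling is folded into queryComplete.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical, simplified model of the per-query approach (not the real Hive class).
class PerQuerySketch {
    static final class AmNodeInfo {
        final String queryId;
        int pendingFragments;
        AmNodeInfo(String queryId) { this.queryId = queryId; }
    }

    // Keyed by query identifier rather than by AM.
    final Map<String, AmNodeInfo> byQuery = new HashMap<>();
    final Queue<AmNodeInfo> heartbeatQueue = new ArrayDeque<>();

    void registerFragment(String queryId) {
        AmNodeInfo info = byQuery.get(queryId);
        if (info == null) {
            info = new AmNodeInfo(queryId);
            byQuery.put(queryId, info);
            heartbeatQueue.add(info); // queued once, for the lifetime of the query
        }
        info.pendingFragments++;
    }

    void unregisterFragment(String queryId) {
        AmNodeInfo info = byQuery.get(queryId);
        if (info != null && info.pendingFragments > 0) {
            info.pendingFragments--;
        }
    }

    // Called on query completion; an error path would do the same in this sketch.
    void queryComplete(String queryId) {
        byQuery.remove(queryId);
    }

    // One pass of the drainer: returns the query that was heartbeated, or null.
    String drainOnce() {
        AmNodeInfo info = heartbeatQueue.poll();
        if (info == null) {
            return null;
        }
        if (byQuery.get(info.queryId) != info) {
            return null; // query finished or failed: drop the entry
        }
        heartbeatQueue.add(info); // always put back while the query is live
        return info.pendingFragments > 0 ? info.queryId : null;
    }
}
```

The entry stays in the queue even while it has 0 pending fragments, so a fragment registered later is picked up on the next drain without any re-queue logic in registerFragment.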

cc [~prasanth_j] for review.

> queued containers may timeout if they don't get to run for a long time
> ----------------------------------------------------------------------
>                 Key: HIVE-16094
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>            Priority: Critical
>         Attachments: HIVE-16094.01.patch
> I believe this happened after HIVE-15958, since we end up keeping amNodeInfo in knownAppMasters,
> and that can result in the callable not being scheduled on new task registration.

This message was sent by Atlassian JIRA
