hive-issues mailing list archives

From "mahesh kumar behera (JIRA)" <>
Subject [jira] [Updated] (HIVE-19924) Tag distcp jobs run by Repl Load
Date Wed, 18 Jul 2018 10:27:00 GMT


mahesh kumar behera updated HIVE-19924:
    Status: In Progress  (was: Patch Available)

> Tag distcp jobs run by Repl Load
> --------------------------------
>                 Key: HIVE-19924
>                 URL:
>             Project: Hive
>          Issue Type: Task
>          Components: repl
>    Affects Versions: 3.1.0, 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: DR, replication
>             Fix For: 4.0.0, 3.2.0
>         Attachments: HIVE-19924.01.patch, HIVE-19924.02.patch, HIVE-19924.03.patch
> Add tags to the jobconf of distcp jobs started by replication. This allows Hive to kill
> these jobs when Beacon retries, or when hs2 dies and Beacon issues a kill command.
>  * One of the tags should definitely be the query_id that starts the job. With this flow,
> before retrying the bootstrap load, Beacon issues a kill command to hs2 with the query id
> of the previously issued command. hs2 then kills any running jobs on yarn tagged with
> that query_id.
>  * To get around the additional failure point mentioned above, the jobs can also be tagged
> with an additional unique tag_id that Beacon provides in the WITH clause of the repl load
> command. Enhance the kill api to take the tag as input and kill the jobs associated with
> that tag. The open problem is how to validate the association of the tag with a hive query
> id, so that this api cannot be used to kill jobs run by other components. However, if this
> capability is provided only to admins, that should be acceptable.
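
The two tagging ideas in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patches: the class and method names (ReplDistcpTags, mergeTags, tagDistcpConf, selectJobsToKill) are hypothetical, a plain Map stands in for the job configuration and for the list of running yarn jobs, and the admin check is reduced to a boolean. The only real identifier is the MapReduce property mapreduce.job.tags, which holds a comma-separated list of job tags.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReplDistcpTags {
    // MapReduce property holding a comma-separated list of job tags.
    static final String JOB_TAGS = "mapreduce.job.tags";

    // Hypothetical helper: merge existing tags with the query id and the
    // optional caller-supplied tag, dropping blanks and duplicates.
    static String mergeTags(String existing, String queryId, String replTag) {
        if (queryId == null || queryId.isEmpty()) {
            throw new IllegalArgumentException("queryId is required");
        }
        Set<String> tags = new LinkedHashSet<>();
        if (existing != null) {
            for (String t : existing.split(",")) {
                if (!t.trim().isEmpty()) {
                    tags.add(t.trim());
                }
            }
        }
        tags.add(queryId);
        if (replTag != null && !replTag.isEmpty()) {
            tags.add(replTag);
        }
        return String.join(",", tags);
    }

    // Hypothetical helper: a Map stands in for the distcp job configuration.
    static void tagDistcpConf(Map<String, String> conf, String queryId, String replTag) {
        conf.put(JOB_TAGS, mergeTags(conf.get(JOB_TAGS), queryId, replTag));
    }

    // Hypothetical kill-by-tag selection: runningJobs maps a job id to its
    // comma-separated tags. Only admins may select jobs by tag, because the
    // tag cannot be validated against a hive query id.
    static List<String> selectJobsToKill(Map<String, String> runningJobs,
                                         String tag, boolean callerIsAdmin) {
        if (!callerIsAdmin) {
            throw new SecurityException("kill by tag is restricted to admins");
        }
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> job : runningJobs.entrySet()) {
            for (String t : job.getValue().split(",")) {
                if (t.trim().equals(tag)) {
                    matches.add(job.getKey());
                    break;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        tagDistcpConf(conf, "hive_query_1", "beacon_tag_1");
        System.out.println(JOB_TAGS + "=" + conf.get(JOB_TAGS));

        Map<String, String> running = new LinkedHashMap<>();
        running.put("job_a", conf.get(JOB_TAGS));
        running.put("job_b", "some_other_component");
        System.out.println(selectJobsToKill(running, "beacon_tag_1", true));
    }
}
```

Restricting the by-tag lookup to admins mirrors the suggestion in the description; a real implementation would rely on the authorization hs2 already has rather than a boolean flag.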

This message was sent by Atlassian JIRA
