hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-18153) refactor reopen and file management in TezTask
Date Tue, 05 Dec 2017 03:33:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-18153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277959#comment-16277959 ]

Sergey Shelukhin edited comment on HIVE-18153 at 12/5/17 3:32 AM:
------------------------------------------------------------------

This basically moves Hive resources out of the Tez scratch dir. It also removes the resource
logic that was split between TezTask (which had most but not all of the conf resources' logic) and
TezSessionState (which had most but not all of the non-conf resources' logic). The session now
localizes all the resources and does all the checks, AM calls, and whatnot.
Additionally, resources are tracked in a separate POJO that can be reused on reopen. It keeps
track of both the directory and what has been localized into it. That can potentially reduce
the number of FS calls for conf-based resources, which are currently just refreshed blindly on every
use of the same session.
https://reviews.apache.org/r/64324/

cc [~sseth] [~prasanth_j]
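A minimal Java sketch of the kind of POJO described above, one that owns a resource directory and remembers what has already been localized into it, so reusing it skips redundant uploads. All names here (`ResourceTracker`, `Resource`, `Uploader`, the paths) are hypothetical, not Hive's actual classes. Using a version marker such as a modification time as the staleness check is also an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalizedResourcesSketch {

    /** A resource to localize: source path plus a version marker such as a modification time. */
    static final class Resource {
        final String srcPath;
        final long version;

        Resource(String srcPath, long version) {
            this.srcPath = srcPath;
            this.version = version;
        }
    }

    /** Stand-in for the actual file-system copy; hypothetical, not a Hive or Hadoop API. */
    interface Uploader {
        void upload(String srcPath, String destDir);
    }

    /** Hypothetical POJO owning a resource directory and remembering what was localized into it. */
    static final class ResourceTracker {
        private final String dir;
        private final Map<String, Long> localized = new HashMap<>(); // srcPath -> version already uploaded

        ResourceTracker(String dir) {
            this.dir = dir;
        }

        String dir() {
            return dir;
        }

        /** Uploads only resources that are new or changed since they were last recorded; returns the upload count. */
        int localize(List<Resource> resources, Uploader uploader) {
            int uploads = 0;
            for (Resource r : resources) {
                Long known = localized.get(r.srcPath);
                if (known != null && known.longValue() == r.version) {
                    continue; // same version already in dir: skip the FS call
                }
                uploader.upload(r.srcPath, dir);
                localized.put(r.srcPath, r.version);
                uploads++;
            }
            return uploads;
        }
    }

    public static void main(String[] args) {
        ResourceTracker tracker = new ResourceTracker("/tmp/hive-resources/session-1");
        List<Resource> resources = List.of(
                new Resource("hdfs:///udfs/a.jar", 1L),
                new Resource("hdfs:///udfs/b.jar", 7L));
        Uploader logOnly = (src, dest) -> System.out.println("upload " + src + " -> " + dest);
        System.out.println(tracker.localize(resources, logOnly)); // first use: 2 uploads
        System.out.println(tracker.localize(resources, logOnly)); // reuse: 0 uploads
    }
}
```

Keeping the "what is localized" map next to the directory is what would let a reused session avoid blindly refreshing conf-based resources. Only a changed version triggers another copy.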



> refactor reopen and file management in TezTask
> ----------------------------------------------
>
>                 Key: HIVE-18153
>                 URL: https://issues.apache.org/jira/browse/HIVE-18153
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-18153.patch
>
>
> TezTask reopen relies on getting the same session object in terms of setup; WM reopen
> returns a new session from the pool.
> The former has the advantage of not having to re-upload files and stuff... but the object
> reuse results in a lot of ugly code, and reopen might also be slower on average with the session
> pool than just getting a session from the pool. Either WM needs to do the object-preserving
> reopen, or TezTask needs to be refactored. It looks like the DAG would have to be rebuilt to do
> the latter, because some paths are tied to a directory of the old session. Let me see if I can
> get around that; if not, we can do the former. Then, if the former results in too much ugly
> code in WM to account for object reuse for a different Tez client, I'd do the latter anyway, since
> it's a failure path :)
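A rough Java sketch of the refactored alternative discussed above. Instead of preserving the session object, reopen takes a fresh session from a pool and hands it the resource holder from the failed one. The assumption is that the resource directory belongs to that holder rather than to the old session, so paths under it stay valid after the swap. `SessionPool`, `Session`, `ResourceTracker`, and `reopen` are hypothetical names, not Hive or Tez APIs.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PoolReopenSketch {

    /** Hypothetical resource holder; its directory outlives any single session. */
    static final class ResourceTracker {
        final String dir;

        ResourceTracker(String dir) {
            this.dir = dir;
        }
    }

    /** Hypothetical session object handed out by the pool. */
    static final class Session {
        final int id;
        ResourceTracker resources; // attached by the caller, not owned by the session

        Session(int id) {
            this.id = id;
        }
    }

    /** Hypothetical pool of pre-created sessions; creates a new one when empty. */
    static final class SessionPool {
        private final Deque<Session> idle = new ArrayDeque<>();
        private int nextId = 0;

        SessionPool(int size) {
            for (int i = 0; i < size; i++) {
                idle.add(new Session(nextId++));
            }
        }

        Session take() {
            Session s = idle.poll(); // null when no idle session is left
            return s != null ? s : new Session(nextId++);
        }

        /** A failed session is detached from its resources and dropped, not returned to the pool. */
        void discard(Session s) {
            s.resources = null;
        }
    }

    /** Reopen by swapping in a new session object while carrying over the resource tracker. */
    static Session reopen(SessionPool pool, Session failed) {
        ResourceTracker carried = failed.resources;
        pool.discard(failed);
        Session fresh = pool.take();
        fresh.resources = carried; // paths under carried.dir are unaffected by the swap
        return fresh;
    }

    public static void main(String[] args) {
        SessionPool pool = new SessionPool(2);
        Session first = pool.take();
        first.resources = new ResourceTracker("/tmp/hive-resources/query-1");
        Session second = reopen(pool, first);
        System.out.println(second.id + " " + second.resources.dir); // prints "1 /tmp/hive-resources/query-1"
    }
}
```

In this sketch, no code has to cope with a session object being reused for a different Tez client. The cost is that nothing here verifies the carried resources are still present, which a real implementation would need to handle.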



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
