hive-issues mailing list archives

From "wan kun (JIRA)" <>
Subject [jira] [Commented] (HIVE-17574) Avoid multiple copies of HDFS-based jars when localizing job-jars
Date Sat, 04 Nov 2017 03:55:00 GMT


wan kun commented on HIVE-17574:

Hi [~mithun], [~cdrome],
I have some questions and look forward to your advice:
1. In MapReduce jobs, tmpJars have a similar problem. I think we could also use the tmpJars
files that are already on HDFS.
2. For the destFS.copyFromLocalFile method in the Tez DagUtils class: if the source file system
and the target file system are both HDFS, would the file not be uploaded again?
When MR jobs are submitted, the jars are not uploaded.
3. Could we set the resources' permission to PUBLIC, so that they would be downloaded only
once by the NodeManager?

Thank you

> Avoid multiple copies of HDFS-based jars when localizing job-jars
> -----------------------------------------------------------------
>                 Key: HIVE-17574
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.2.0, 3.0.0, 2.4.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Chris Drome
>            Priority: Major
>         Attachments: HIVE-17574.1-branch-2.2.patch, HIVE-17574.1-branch-2.patch, HIVE-17574.1.patch,
> Raising this on behalf of [~selinazh]. (For my own reference: YHIVE-1035.)
> This has to do with the classpaths of Hive actions run from Oozie, and affects scripts
that add jars/resources from HDFS locations.
> As part of Oozie's "sharelib" deploys, foundation jars (such as Hive jars) tend to be
stored in HDFS paths, as are any custom user-libraries used in workflows. An {{ADD JAR|FILE|ARCHIVE}}
statement in a Hive script causes the following steps to occur:
> # Files are downloaded from HDFS to local temp dir.
> # UDFs are resolved/validated.
> # All jars/files, including those just downloaded from HDFS, are shipped right back to
HDFS-based scratch-directories, for job submission.
> For HDFS-based files, this is wasteful and time-consuming. #3 above should skip shipping
HDFS-based resources, and add those directly to the Tez session.
> We have a patch that's being used internally at Yahoo.
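
The change proposed for step 3 can be illustrated with a minimal sketch. This is not the attached patch and does not use the Hive, Tez, or Hadoop APIs. It only models the decision with plain java.net.URI values, assuming that "already on the target file system" means the resource and the scratch directory share the same scheme and authority. The class name ResourcePlanner, its methods, and the example hosts and paths are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Hive or Tez. */
final class ResourcePlanner {

    /** True when both URIs name the same file system (same scheme and authority). */
    static boolean sameFileSystem(URI resource, URI scratchDir) {
        String s1 = resource.getScheme();
        String s2 = scratchDir.getScheme();
        if (s1 == null || s2 == null || !s1.equalsIgnoreCase(s2)) {
            return false; // unqualified or differing schemes: treat as a different file system
        }
        String a1 = resource.getAuthority();
        String a2 = scratchDir.getAuthority();
        return a1 == null ? a2 == null : a1.equalsIgnoreCase(a2);
    }

    /** Returns only the resources that still have to be copied to the scratch directory. */
    static List<URI> resourcesToUpload(List<URI> resources, URI scratchDir) {
        List<URI> toUpload = new ArrayList<>();
        for (URI r : resources) {
            if (!sameFileSystem(r, scratchDir)) {
                toUpload.add(r); // e.g. a local jar that is not yet on the cluster
            }
            // Resources already on the scratch file system would be referenced in place.
        }
        return toUpload;
    }
}
```

With a scratch directory of hdfs://nn1:8020/tmp/hive/scratch (a made-up location), a jar at hdfs://nn1:8020/libs/udfs.jar would be skipped, while file:///tmp/local.jar would still be uploaded.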

This message was sent by Atlassian JIRA
