hive-issues mailing list archives

From "Mithun Radhakrishnan (JIRA)" <>
Subject [jira] [Commented] (HIVE-17574) Avoid multiple copies of HDFS-based jars when localizing job-jars
Date Mon, 02 Oct 2017 20:31:01 GMT


Mithun Radhakrishnan commented on HIVE-17574:

bq. If this is only for the resources that are HDFS-based and not the ones that are based on local files and are also present on HDFS (e.g. hive-exec) that makes sense to me.

[~sershe], [~thejas], thank you very much for your attention. Yes, this is indeed the case. We ran into this when copies of user libraries that were already on HDFS (in their workflow/lib directories) began to use up space in scratch directories.

bq. is it possible to add an off switch?
Yes, of course. {{hive.resource.use.hdfs.location}} is the switch. It is set to {{true}} (i.e. "ON") by default; setting it to {{false}} turns the new behavior off.
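As a rough illustration of what the switch decides, here is a minimal Python sketch. It is not Hive's code. The helper `plan_resources` and its return shape are hypothetical. It assumes that a resource counts as "HDFS-based" only when its URI uses the `hdfs` scheme. Everything else, including a local copy of a jar such as hive-exec that may also exist on HDFS, is still shipped to the scratch directory. It also assumes that every URI carries an explicit scheme, so resolution of bare paths against the default filesystem is not modeled.

```python
from urllib.parse import urlparse


def plan_resources(uris, use_hdfs_location=True):
    """Split added resources into (to_copy, in_place) lists.

    Hypothetical helper: use_hdfs_location stands in for
    hive.resource.use.hdfs.location, which defaults to true.
    """
    to_copy, in_place = [], []
    for uri in uris:
        is_hdfs = urlparse(uri).scheme == "hdfs"  # assumption: explicit scheme
        if use_hdfs_location and is_hdfs:
            # Already on HDFS: hand the original path to the session as-is.
            in_place.append(uri)
        else:
            # Local files (or everything, when the switch is off) get shipped.
            to_copy.append(uri)
    return to_copy, in_place


resources = [
    "hdfs://nn:8020/user/alice/workflow/lib/my-udfs.jar",
    "file:///opt/hive/lib/hive-exec.jar",
]
print(plan_resources(resources))
# (['file:///opt/hive/lib/hive-exec.jar'], ['hdfs://nn:8020/user/alice/workflow/lib/my-udfs.jar'])
print(plan_resources(resources, use_hdfs_location=False))
```

With the switch off, both resources land in the copy list, which corresponds to the behavior before the change.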

> Avoid multiple copies of HDFS-based jars when localizing job-jars
> -----------------------------------------------------------------
>                 Key: HIVE-17574
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.2.0, 3.0.0, 2.4.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Chris Drome
>         Attachments: HIVE-17574.1-branch-2.2.patch, HIVE-17574.1-branch-2.patch, HIVE-17574.1.patch
> Raising this on behalf of [~selinazh]. (For my own reference: YHIVE-1035.)
> This has to do with the classpaths of Hive actions run from Oozie, and affects scripts that add jars/resources from HDFS locations.
> As part of Oozie's "sharelib" deploys, foundation jars (such as Hive jars) tend to be stored in HDFS paths, as are any custom user-libraries used in workflows. An {{ADD JAR|FILE|ARCHIVE}} statement in a Hive script causes the following steps to occur:
> # Files are downloaded from HDFS to a local temp dir.
> # UDFs are resolved/validated.
> # All jars/files, including those just downloaded from HDFS, are shipped right back to HDFS-based scratch directories for job submission.
> For HDFS-based files, this is wasteful and time-consuming. Step 3 above should skip shipping HDFS-based resources and add them directly to the Tez session.
> We have a patch that's being used internally at Yahoo.
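The three steps above, and the proposed change to step 3, can be modeled as a count of file copies. This is a hypothetical Python sketch, not the internal patch. The names `localize`, `LOCAL_TEMP` and `SCRATCH_DIR` are made up for illustration. It assumes that step 1 still downloads HDFS resources locally so that step 2 can resolve UDFs, and that "HDFS-based" means the URI uses the `hdfs` scheme.

```python
from urllib.parse import urlparse

# Hypothetical locations, chosen only for illustration.
LOCAL_TEMP = "/tmp/hive-resources/"
SCRATCH_DIR = "hdfs:///tmp/hive-scratch/"


def is_hdfs(uri):
    """Treat a resource as HDFS-based when its URI uses the hdfs scheme."""
    return urlparse(uri).scheme == "hdfs"


def localize(uris, skip_hdfs_in_step3):
    """Return the (source, destination) copies made for the added resources."""
    copies = []
    local_paths = []
    for uri in uris:
        if is_hdfs(uri):
            # Step 1: download the HDFS resource to a local temp dir.
            local = LOCAL_TEMP + uri.rsplit("/", 1)[-1]
            copies.append((uri, local))
        else:
            local = uri  # already local; nothing to download
        local_paths.append((uri, local))

    # Step 2 (UDF resolution/validation) reads the local copies; no copies here.

    for original, local in local_paths:
        if skip_hdfs_in_step3 and is_hdfs(original):
            # Proposed change: reference the original HDFS path in the Tez session.
            continue
        # Step 3: ship the resource to the HDFS-based scratch dir for job submission.
        copies.append((local, SCRATCH_DIR + local.rsplit("/", 1)[-1]))
    return copies


resources = [
    "hdfs://nn:8020/user/alice/workflow/lib/my-udfs.jar",
    "file:///opt/hive/lib/hive-exec.jar",
]
print(len(localize(resources, skip_hdfs_in_step3=False)))  # 3 copies
print(len(localize(resources, skip_hdfs_in_step3=True)))   # 2 copies
```

In this model the saving is exactly the HDFS-to-scratch round trip for each HDFS-based jar. The local download in step 1 remains, because UDF validation still needs the file.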

This message was sent by Atlassian JIRA
