hive-issues mailing list archives

From "Marcelo Vanzin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
Date Tue, 05 Jun 2018 16:22:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502035#comment-16502035 ]

Marcelo Vanzin commented on HIVE-16391:
---------------------------------------

> I'm not sure if there's a way to publish two pom files mapping to two different shaded jars

I'm pretty sure that's not possible, unless they are two separate modules.

I think the proper fix would be to change "hive-exec" to be the "normal" jar, with the pom
published with all dependencies. Then you could have a different, shaded jar published with
a classifier (or a separate module for that, if a separate pom is desired).
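
As a rough sketch of the classifier variant, assuming the shading is done with maven-shade-plugin: the fragment below would go under <build><plugins> in the hive-exec pom. The classifier name "shaded" is only a placeholder, not a name Hive uses.

```xml
<!-- Sketch only: the classifier name "shaded" is an assumption. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- Keep the normal jar as the main artifact and attach the
             shaded jar next to it under a classifier. -->
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>shaded</shadedClassifierName>
      </configuration>
    </execution>
  </executions>
</plugin>
```

A consumer that wants the uber jar would then add <classifier>shaded</classifier> to its hive-exec dependency, while the plain coordinates would resolve to the normal jar and its declared dependencies.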

The problem with that is that it changes the meaning of Hive's artifacts, so anybody currently
importing hive-exec would see a breakage, and that's probably not desired.

Another option is to change the artifact name of the current "hive-exec" pom. Then you'd publish
the normal jar under the new artifact name, then have a separate module that imports that
jar, shades dependencies, and publishes the result as "hive-exec". That would maintain compatibility
with existing artifacts.

But all that assumes that what Spark wants is the non-shaded hive-exec jar. Historically Hive
and Spark have had different dependencies for a few libraries, and that approach might actually
not work. For example, Kryo used to be different (not sure now). In that case, what Spark
would really need is an even more shaded version of Hive, where all conflicting dependencies
have been relocated in the hive-exec jar.
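
To illustrate what relocating a conflicting dependency could look like, here is a minimal maven-shade-plugin configuration sketch using Kryo as the example. The target package prefix is hypothetical, and whether Kryo still conflicts would need to be checked first.

```xml
<!-- Sketch only: goes inside the maven-shade-plugin <execution>.
     The shadedPattern prefix below is a made-up example. -->
<configuration>
  <relocations>
    <relocation>
      <!-- Original package of the conflicting library. -->
      <pattern>com.esotericsoftware.kryo</pattern>
      <!-- Package the classes (and references to them) are rewritten to. -->
      <shadedPattern>org.apache.hive.shaded.com.esotericsoftware.kryo</shadedPattern>
    </relocation>
  </relocations>
</configuration>
```

With a relocation like this, the Kryo classes bundled in the shaded jar would live under the rewritten package, so they should no longer clash with whatever Kryo version Spark puts on the classpath.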


> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-16391
>                 URL: https://issues.apache.org/jira/browse/HIVE-16391
>             Project: Hive
>          Issue Type: Task
>          Components: Build Infrastructure
>    Affects Versions: 1.2.2
>            Reporter: Reynold Xin
>            Assignee: Saisai Shao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.2.3
>
>         Attachments: HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the only change
> in the fork is to work around the issue that Hive publishes only two sets of jars: one set
> with no dependency declared, and another with all the dependencies included in the published
> uber jar. That is to say, Hive doesn't publish a set of jars with the proper dependencies
> declared.
> There is general consensus on both sides that we should remove the forked Hive.
> The change in the forked version is recorded here: https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
