spark-dev mailing list archives

From Dongjoon Hyun <dongjoon.h...@gmail.com>
Subject Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)
Date Wed, 20 Nov 2019 17:46:42 GMT
Yes, right. That's the situation we are hitting and the result I expected.
We need to change our default to Hive 2 in the POM.

Dongjoon.

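For context, a rough sbt sketch of what that default means for a downstream build: a plain dependency on spark-hive, with nothing overridden, resolves whatever Hive the published Spark POM declares. The coordinates and the 3.0.0-preview version below are illustrative assumptions, not a prescription.

    // build.sbt (illustrative): with no overrides, this transitively resolves
    // the Hive declared by the published Spark POM. At the time of this thread
    // that was the forked org.spark-project.hive:hive-exec:1.2.1.spark2.
    libraryDependencies +=
      "org.apache.spark" %% "spark-hive" % "3.0.0-preview"
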

On Wed, Nov 20, 2019 at 5:20 AM Sean Owen <srowen@gmail.com> wrote:

> Yes, good point. A user would get whatever the POM says without
> profiles enabled, so it matters.
>
> Playing it out, an app _should_ compile with the Spark dependency
> marked 'provided'. In that case the app that is spark-submit-ted is
> agnostic to the Hive dependency, as the only one that matters is what's
> on the cluster. Right? We don't leak the Hive API through the Spark
> API. And yes, it's then up to the cluster to provide whatever version
> it wants. Vendors will have made a specific version choice when
> building their distro one way or the other.
>
> If you run a Spark cluster yourself, you're using the binary distro,
> and we're already talking about also publishing a binary distro with
> this variation, so that's not the issue.
>
> The corner cases where it might matter are:
>
> - I unintentionally package Spark in the app and by default pull in
> Hive 2 when I will deploy against Hive 1. But that's user error and
> causes other problems.
> - I run tests locally in my project, which will pull in a default
> version of Hive defined by the POM
>
> Double-checking, is that right? If so, it kind of implies it doesn't
> matter, which is an argument either way about what the default should
> be. I too would then prefer defaulting to Hive 2 in the POM. Am I
> missing something about the implication?
>
> (That fork will stay published forever anyway; that's not an issue per se.)
>
> On Wed, Nov 20, 2019 at 1:40 AM Dongjoon Hyun <dongjoon.hyun@gmail.com>
> wrote:
> > Sean, our published POM points to and advertises the illegitimate
> > Hive 1.2 fork as a compile dependency.
> > Yes, it can be overridden. So why does Apache Spark need to publish
> > like that?
> > If someone wants to use that illegitimate Hive 1.2 fork, let them
> > override it. We are unable to delete that illegitimate Hive 1.2 fork.
> > Those artifacts will be orphans.
> >
>
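A minimal sbt sketch of the 'provided' setup Sean describes above; the artifact list and the 3.0.0-preview version are illustrative assumptions. With Spark marked provided, the app bundles no Spark or Hive jars of its own, so the only Hive that matters is the one in the cluster's Spark distribution.

    // build.sbt (illustrative): Spark is "provided"; the cluster supplies the
    // real jars at runtime, so the spark-submit-ted app is agnostic to the
    // Hive version the cluster's Spark was built with.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "3.0.0-preview" % Provided,
      "org.apache.spark" %% "spark-sql"  % "3.0.0-preview" % Provided,
      "org.apache.spark" %% "spark-hive" % "3.0.0-preview" % Provided
    )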

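And a sketch of the override Dongjoon mentions, for a build that does bundle Spark: exclude the forked Hive artifacts and declare the desired Hive explicitly. The group IDs and the Hive 2.3.6 version are illustrative assumptions.

    // build.sbt (illustrative): drop the forked Hive 1.2 artifacts pulled in
    // by the published POM, then pin the Hive version the app actually wants.
    libraryDependencies += ("org.apache.spark" %% "spark-hive" % "3.0.0-preview")
      .excludeAll(ExclusionRule("org.spark-project.hive"))
    libraryDependencies += "org.apache.hive" % "hive-exec" % "2.3.6"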