Thank you all.

I'll try to make JIRA and PR for that.

Bests,
Dongjoon.

On Wed, Nov 20, 2019 at 4:08 PM Cheng Lian <lian.cs.zju@gmail.com> wrote:
Sean, thanks for the corner cases you listed. They make a lot of sense. Now I am inclined to have Hive 2.3 as the default version.

Dongjoon, apologies if I didn't make it clear before. What concerned me initially was only the following part:

> can we remove the usage of forked `hive` in Apache Spark 3.0 completely officially?

So having Hive 2.3 as the default Hive version and adding a `hive-1.2` profile to keep the Hive 1.2.1 fork looks like a feasible approach to me. Thanks for starting the discussion!

On Wed, Nov 20, 2019 at 9:46 AM Dongjoon Hyun <dongjoon.hyun@gmail.com> wrote:
Yes. Right. That's the situation we are hitting and the result I expected.
We need to change our default to Hive 2 in the POM.

Dongjoon.


On Wed, Nov 20, 2019 at 5:20 AM Sean Owen <srowen@gmail.com> wrote:
Yes, good point. A user would get whatever the POM says without
profiles enabled so it matters.

Playing it out, an app _should_ compile with the Spark dependency
marked 'provided'. In that case the app that is spark-submit-ted is
agnostic to the Hive dependency as the only one that matters is what's
on the cluster. Right? We don't leak the Hive API through the Spark
API. And yes it's then up to the cluster to provide whatever version
it wants. Vendors will have made a specific version choice when
building their distro one way or the other.

If you run a Spark cluster yourself, you're using the binary distro,
and we're already talking about also publishing a binary distro with
this variation, so that's not the issue.

The corner cases where it might matter are:

- I unintentionally package Spark in the app and by default pull in
Hive 2 when I will deploy against Hive 1. But that's user error, and
causes other problems.
- I run tests locally in my project, which will pull in a default
version of Hive defined by the POM.

Double-checking: is that right? If so, it kind of implies it doesn't
matter, which is an argument either way about what the default should be. I
too would then prefer defaulting to Hive 2 in the POM. Am I missing
something about the implication?

(That fork will stay published forever anyway, that's not an issue per se.)

On Wed, Nov 20, 2019 at 1:40 AM Dongjoon Hyun <dongjoon.hyun@gmail.com> wrote:
> Sean, our published POM is pointing to and advertising the illegitimate Hive 1.2 fork as a compile dependency.
> Yes, it can be overridden. So why does Apache Spark need to publish it like that?
> If someone wants to use that illegitimate Hive 1.2 fork, let them override it. We are unable to delete those illegitimate Hive 1.2 fork artifacts.
> Those artifacts will be orphans.
>