ranger-dev mailing list archives

From "Andrew Charneski (Jira)" <j...@apache.org>
Subject [jira] [Updated] (RANGER-2689) Support multiple versions of Hive
Date Wed, 08 Jan 2020 19:55:00 GMT

     [ https://issues.apache.org/jira/browse/RANGER-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Charneski updated RANGER-2689:
-------------------------------------
    Description: 
Currently Ranger supports the latest version of Hive, 3.1.2. Unfortunately, there are large
segments of the big data community that still rely on older versions of Hive. Two major examples:

# Spark SQL uses a forked version of Hive 1.2.1 (https://spark.apache.org/docs/latest/sql-migration-guide-hive-compatibility.html)
# EMR provides Hive only up to 2.3.5 (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/Hive-release-history.html)

In order to support these internally, my organization has prepared two modifications of Ranger
to link against these versions. These are illustrated in the PRs https://github.com/acharneski/ranger/pull/4
and https://github.com/acharneski/ranger/pull/5

We would like to eliminate the need for entirely separate builds of Ranger to support this,
and integrate these variants into the main Ranger codebase. We are willing to do the bulk
of the implementation but would first like to discuss the architecture of this change so as
to build it in a way the Ranger committers would be amenable to adopting. 

My initial thought is to split the `hive-agent` module into something like `hive-agent-base`,
`hive-agent-1`, `hive-agent-2`, and `hive-agent-3`. This would allow us to explicitly link
to each major version of Hive while minimizing the duplication of code. Thoughts? 
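
To make the proposed split concrete, here is a minimal sketch of how a shared base module could select a version-specific module at runtime. Everything in it is an assumption for illustration: the `HiveVersionShim` interface, the `shimFor` lookup, and the class names are hypothetical and do not exist in Ranger today. Only the module names come from the proposal above.

```java
import java.util.HashMap;
import java.util.Map;

public class HiveAgentSketch {

    /** Hypothetical contract that would live in `hive-agent-base`. */
    public interface HiveVersionShim {
        int majorVersion();
        String moduleName();
    }

    /** Simple value-style implementation; a real shim would wrap Hive APIs. */
    static final class SimpleShim implements HiveVersionShim {
        private final int major;

        SimpleShim(int major) {
            this.major = major;
        }

        @Override
        public int majorVersion() {
            return major;
        }

        @Override
        public String moduleName() {
            // Mirrors the proposed naming: hive-agent-1, hive-agent-2, hive-agent-3.
            return "hive-agent-" + major;
        }
    }

    // One shim per supported Hive major version (assumed set: 1, 2, 3).
    private static final Map<Integer, HiveVersionShim> SHIMS = new HashMap<>();

    static {
        for (int major = 1; major <= 3; major++) {
            SHIMS.put(major, new SimpleShim(major));
        }
    }

    /** Picks the shim for a full Hive version string such as "2.3.5". */
    public static HiveVersionShim shimFor(String hiveVersion) {
        int major = Integer.parseInt(hiveVersion.split("\\.")[0]);
        HiveVersionShim shim = SHIMS.get(major);
        if (shim == null) {
            throw new IllegalArgumentException("Unsupported Hive version: " + hiveVersion);
        }
        return shim;
    }

    public static void main(String[] args) {
        // The three versions mentioned in this issue.
        for (String version : new String[] {"1.2.1", "2.3.5", "3.1.2"}) {
            System.out.println(version + " -> " + shimFor(version).moduleName());
        }
    }
}
```

In an actual build, each `hive-agent-N` module would presumably compile against its own Hive dependency and supply its own implementation of the shared interface, so the lookup table here stands in for whatever discovery mechanism the committers prefer.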

Thank you!

  was:
Currently Ranger supports the latest version of Hive, 3.1.2. Unfortunately, there are large
segments of the big data community that relies on older versions of Hive. Two major examples:

# Spark SQL uses a forked version of Hive 1.2.1 (https://spark.apache.org/docs/latest/sql-migration-guide-hive-compatibility.html)
# EMR provides Hive only up to 2.3.5 (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/Hive-release-history.html)

In order to support these internally, my organization has prepared two modifications of Ranger
to link against these versions. These are illustrated in the PRs https://github.com/acharneski/ranger/pull/4
and https://github.com/acharneski/ranger/pull/5

We would like to eliminate the need for entirely separate builds of Ranger to support this,
and integrate these variants into the main Ranger codebase. We are willing to do the bulk
of the implementation but would first like to discuss the architecture of this change so as
to build it in a way the Ranger committers would be amenable to adopting. 

My initial thought is to split the `hive-agent` module into something like `hive-agent-base`,
`hive-agent-1`, `hive-agent-2`, and `hive-agent-3`. This would allow us to explicitly link
to each major version of Hive while minimizing the duplication of code. Thoughts? 

Thank you!


> Support multiple versions of Hive
> ---------------------------------
>
>                 Key: RANGER-2689
>                 URL: https://issues.apache.org/jira/browse/RANGER-2689
>             Project: Ranger
>          Issue Type: Improvement
>          Components: plugins
>            Reporter: Andrew Charneski
>            Priority: Major
>
> Currently Ranger supports the latest version of Hive, 3.1.2. Unfortunately, there are
> large segments of the big data community that still rely on older versions of Hive. Two major
> examples:
> # Spark SQL uses a forked version of Hive 1.2.1 (https://spark.apache.org/docs/latest/sql-migration-guide-hive-compatibility.html)
> # EMR provides Hive only up to 2.3.5 (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/Hive-release-history.html)
> In order to support these internally, my organization has prepared two modifications
> of Ranger to link against these versions. These are illustrated in the PRs https://github.com/acharneski/ranger/pull/4
> and https://github.com/acharneski/ranger/pull/5
> We would like to eliminate the need for entirely separate builds of Ranger to support
> this, and integrate these variants into the main Ranger codebase. We are willing to do the
> bulk of the implementation but would first like to discuss the architecture of this change
> so as to build it in a way the Ranger committers would be amenable to adopting.
> My initial thought is to split the `hive-agent` module into something like `hive-agent-base`,
> `hive-agent-1`, `hive-agent-2`, and `hive-agent-3`. This would allow us to explicitly link
> to each major version of Hive while minimizing the duplication of code. Thoughts?
> Thank you!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
