sqoop-dev mailing list archives

From Boglarka Egyed <b...@apache.org>
Subject Re: Release to support Hadoop 3
Date Fri, 13 Apr 2018 12:37:45 GMT
Hi Daniel et al,

Thanks for bringing up this topic and the detailed status update.

I am sharing my thoughts point by point below.

> 1) How to get a new Kite release? Maybe we should remove the Kite
> dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?


I think making a new Kite release would be a huge effort: it would require
upgrading the versions, making the necessary code modifications, testing it
thoroughly, and then making the release itself. Meanwhile, Kite is a very
passively maintained tool with minimal activity, so getting this done would
definitely take a lot of effort. It would also depend on the Solr community,
as the Morphlines module of Kite is heavily used and somewhat actively
developed by them. In addition, there is indeed a shorter- or longer-term
goal to get rid of the Kite dependency in Sqoop entirely, i.e. all release
efforts would become throw-away work very soon.

Focusing on the Kite removal seems more reasonable to me. However, it
would be great to see an estimate for this effort. @Szabolcs, could
you maybe share your thoughts on this?

> 2) Should we drop support for Hadoop 2?
>

I think we can drop support for Hadoop 2, especially if we use
straightforward versioning with the new release.


> 3) What version number should we use? To avoid confusion with Sqoop2 I'd go
> with 3.0.
>

I like this idea, +1 for making a 3.0 release containing these changes.


> 4) Does (should?) this affect the 1.5 release?


I think the answer is yes. Currently, the following breaking changes are on
the horizon and could be part of the next Sqoop release:
* com.cloudera package removal (done)
* Gradle introduction (in progress)
* Hadoop/Hive/HBase version upgrade (in progress)
* Kite deprecation/removal (planned)
* Bump Java version to 8 (planned)

Looking at this list, I would say that a Sqoop 1.5 release containing only
the com.cloudera package removal, the Gradle introduction and the Java
version bump would have a rather small scope that is of little relevance
from a user perspective, so having two releases (1.5 and 3.0) would
probably be overkill. I would instead suggest going with a single Sqoop 3.0
release containing all the changes listed above. What do you think?

To summarize, I currently see the following dependencies for the next Sqoop
release:
* Finishing up the Gradle patch
* Hive 3 release
* Kite removal - this could be the next common effort in the community

In any case, I would be happy to take the Release Manager role for the next
release. Please let me know if everyone would be OK with that.

I am looking forward to seeing others' thoughts on this too.

Many thanks,
Bogi

On Thu, Apr 12, 2018 at 5:17 PM, Dániel Vörös <daniel.voros@gmail.com>
wrote:

> Dear All,
>
> After some development towards supporting Hadoop 3 (and latest version of
> downstream components) I'd like to summarize the current state of the
> upgrade and start the conversation about releasing a new version of Sqoop
> with Hadoop 3 support.
>
> Here's what happened so far:
>  - Upgraded Hadoop dependency to 3.0.0
>  - Hive had to be upgraded, since old Hive didn't work with Hadoop 3.
>  - HBase had to be upgraded since Hive 3 depends on HBase 2(alpha)
>  - Dealt with a bunch of minor issues like changed Hadoop configuration
> names and different packaging of Maven artifacts.
>
> For details please refer to this ticket and the attached review request:
> https://issues.apache.org/jira/browse/SQOOP-3305
>
> Remaining work:
>  - Parquet importing doesn't work. It was broken by a standalone-metastore
> change in Hive and fixing would require a new Kite version to be built
> against Hive 3.
>  - Hive 3 is going to enable ACID tables by default. We should support
> importing into these. Details:
> https://issues.apache.org/jira/browse/SQOOP-3311
>
> Other blocking issues:
>  - There's no Hive 3 release (no alpha/beta) yet.
>
> I'd like to kindly ask you all to share any other tasks/issues you know of
> that we should address to support the latest versions. Also, there are a
> couple open questions:
>  1) How to get a new Kite release? Maybe we should remove the Kite
> dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
>  2) Should we drop support for Hadoop 2?
>  3) What version number should we use? To avoid confusion with Sqoop2 I'd
> go with 3.0.
>  4) Does (should?) this affect the 1.5 release?
>
> Regards,
> Daniel
>
