sqoop-dev mailing list archives

From Szabó Attila <mau...@inf.elte.hu>
Subject Re: Release to support Hadoop 3
Date Fri, 13 Apr 2018 15:23:58 GMT

Hello everyone,

I'd like to add my thoughts as well:

New Sqoop version: The last time I had the chance to talk about this with some of the PMC members
(e.g. Jarcec, Kate), the idea was to create Sqoop-NG (NG == Next Generation), much
like what the Flume community did (and AFAIK from Mike Percy it was quite a successful
move from their POV). Don't get me wrong, I'm totally NOT against 3.0, though IMHO Sqoop-NG
1.0 would be a better choice.

Kite: I would split this effort into two subtasks. First I would get in contact with
the Parquet team and create a Kite-independent execution path in Sqoop for Parquet
backed tables (Hive/Impala/etc.). As part of this effort I would also add direct support
for the ORC format (in the past few years I've found it very useful in several different situations,
and it's usually quite inconvenient that Sqoop does not support it "out of the box").

As the second subtask I would start to remove every Kite-based dependency (but according
to my gut feeling it could break the codebase in too many places, and might not be that easy
to succeed on that front).

Hadoop 2:

Could anyone please outline the pros and cons on this front? AFAIK several
vendors (including Cloudera, Hortonworks, MapR, EMR, etc.) are still supporting Hadoop 2,
and to the best of my knowledge most of the user base relies on their releases, so
I'd like to give those users the chance to use the newest features of Sqoop. Thus I
would vote for keeping compatibility for a bit longer (a few more versions).


I'd like to cast my very direct and LOUD vote against any alpha dependencies (including HBase
or anything else!). IMHO Sqoop is already a stable project of the Apache Software Foundation that
users can depend on, so I'd like to avoid any kind of "immature" dependency related
issues. Of course this is just my personal opinion, but as a community I think we must not
undermine our stability.

On the other fronts I totally agree with the planned efforts and +1 them.

Best regards,

From: Szabolcs Vasas <vasas@apache.org>
Sent: Friday, April 13, 2018 3:43 PM
To: dev@sqoop.apache.org
Subject: Re: Release to support Hadoop 3

Hi all,

I also think that completely eliminating the Kite dependency from Sqoop
would be the easiest way forward. I will try to analyze this topic
a bit more next week and come up with subtasks so that we could potentially
work on it in parallel.

I am happy with the Sqoop 3.0 scope proposal too, and with Bogi being its
release manager.


On Fri, Apr 13, 2018 at 2:37 PM, Boglarka Egyed <bogi@apache.org> wrote:

> Hi Daniel et al,
> Thanks for bringing up this topic and the detailed status update.
> I am sharing my thoughts point by point, please find them below.
> > 1) How to get a new Kite release? Maybe we should remove the Kite
> > dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
> I think making a new Kite release would be a huge effort: it would
> require upgrading the versions, making the necessary code modifications,
> testing it thoroughly, etc., and then making the release itself. Meanwhile,
> Kite is a very passively maintained tool with minimal activity, so it
> would definitely take a lot of effort to get it done. It would also
> depend on the Solr community, as the Morphlines module of Kite is
> heavily used and somewhat actively developed by them. Also, there is indeed
> a shorter/longer term goal to get rid of the Kite dependency in Sqoop entirely,
> i.e. all release efforts would become throw-away work very soon.
> Focusing on the Kite removal seems more reasonable to me. However, it
> would be great to see an estimate of this effort. @Szabolcs, could
> you maybe share your thoughts on this?
> > 2) Should we drop support for Hadoop 2?
> >
> I think we can drop support for Hadoop 2 especially if we use
> straightforward versioning with the new release.
> > 3) What version number should we use? To avoid confusion with Sqoop2 I'd
> > go with 3.0.
> >
> I like this idea, +1 for making a 3.0 release containing these changes.
> > 4) Does (should?) this affect the 1.5 release?
> I think the answer is yes. Currently the following breaking changes are on
> the horizon which could be part of a next Sqoop release:
> * com.cloudera package removal (done)
> * Gradle introduction (in progress)
> * Hadoop/Hive/HBase version upgrade (in progress)
> * Kite deprecation/removal (planned)
> * Bump Java version to 8 (planned)
> Looking at this list, I would say that a Sqoop 1.5 release containing
> only the com.cloudera package removal, the Gradle introduction and the Java
> version bump would have a rather small scope of little relevance from a user
> perspective, so having two releases (1.5 and 3.0) might be a bit of
> overkill. I would instead suggest going with a Sqoop 3.0 release
> containing all the changes listed above. What do you think?
> To summarize, I currently see the following dependencies for the next Sqoop
> release:
> * Finishing up the Gradle patch
> * Hive 3 release
> * Kite removal - this could be the next common effort in the community
> Anyhow I would be happy to take the Release Manager role for the next
> release, please let me know if everyone would be OK with that.
> I am looking forward to seeing others' thoughts on this too.
> Many thanks,
> Bogi
> On Thu, Apr 12, 2018 at 5:17 PM, Dániel Vörös <daniel.voros@gmail.com>
> wrote:
> > Dear All,
> >
> > After some development towards supporting Hadoop 3 (and latest version of
> > downstream components) I'd like to summarize the current state of the
> > upgrade and start the conversation about releasing a new version of Sqoop
> > with Hadoop 3 support.
> >
> > Here's what happened so far:
> >  - Upgraded Hadoop dependency to 3.0.0
> >  - Hive had to be upgraded, since old Hive didn't work with Hadoop 3.
> >  - HBase had to be upgraded since Hive 3 depends on HBase 2 (alpha)
> >  - Dealt with a bunch of minor issues like changed Hadoop configuration
> > names and different packaging of Maven artifacts.
> >
> > For details please refer to this ticket and the attached review request:
> > https://issues.apache.org/jira/browse/SQOOP-3305
> >
> > Remaining work:
> >  - Parquet importing doesn't work. It was broken by a standalone-metastore
> > change in Hive and fixing would require a new Kite version to be built
> > against Hive 3.
> >  - Hive 3 is going to enable ACID tables by default. We should support
> > importing into these. Details:
> > https://issues.apache.org/jira/browse/SQOOP-3311
> >
> > Other blocking issues:
> >  - There's no Hive 3 release (no alpha/beta) yet.
> >
> > I'd like to kindly ask you all to share any other tasks/issues you know of
> > that we should address to support the latest versions. Also, there are a
> > couple of open questions:
> >  1) How to get a new Kite release? Maybe we should remove the Kite
> > dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
> >  2) Should we drop support for Hadoop 2?
> >  3) What version number should we use? To avoid confusion with Sqoop2 I'd
> > go with 3.0.
> >  4) Does (should?) this affect the 1.5 release?
> >
> > Regards,
> > Daniel
> >
