sqoop-dev mailing list archives

From Boglarka Egyed <b...@apache.org>
Subject Re: Release to support Hadoop 3
Date Thu, 10 May 2018 14:00:20 GMT
Hi All,

Thank you Daniel for the update! I was also writing one when your email
arrived so I'm just adding a couple of comments to that.

New major version in JIRA:
Version 3.0.0 has been created in JIRA
<https://issues.apache.org/jira/projects/SQOOP/summary>, please feel free
to use it on the corresponding JIRAs from now on. As per my previous email, I
see no point in doing a 1.5.0 release currently, so I'm OK with moving all
the JIRAs having a fix/target version of 1.5.0 to 3.0.0. Any objections?

Update on the dependencies of the release:
* Gradle patch needs some finalization and can be committed soon:
https://reviews.apache.org/r/66067/
* Kite removal effort has been started: SQOOP-3313
<https://issues.apache.org/jira/browse/SQOOP-3313>
* Hive 3.0.0 release is still in an early phase based on this email thread
<https://mail-archives.apache.org/mod_mbox/hive-dev/201804.mbox/%3C2EC60DA6-0A2E-4F3A-92F2-E3CE9D49762A@hortonworks.com%3E>
and has no ETA yet

Thanks Daniel for looking into the Hadoop compatibility question, please
let us know your findings.

Cheers,
Bogi



On Thu, May 10, 2018 at 3:27 PM, Dániel Vörös <daniel.voros@gmail.com>
wrote:

> Dear All,
>
> After Bogi has created the 3.0.0 version in Jira I've applied it to a
> couple of tickets that don't make sense on the 1.x line (without
> Hadoop3/Hive3).
>
> However, as Bogi has mentioned in her previous email, it probably doesn't
> make sense to work on a 1.5 release in parallel with 3.0.0. How would you
> feel if we were to move all 1.5 issues [1] to 3.0.0?
>
> In the meantime I've experimented with running Sqoop 1.4.7 against Hadoop
> 3.1.0, and I'm planning to do the opposite, running Sqoop 3.0.0-SNAPSHOT
> against Hadoop 2.x. That way we'd be able to better assess Attila's
> question about backward compatibility. Please note that the hard part will
> be Hive integration, I'm afraid, and until there's a Hive 3.0 release it's
> hard to test. If anyone's interested in this topic, check out [2].
>
> Regards,
> Daniel
>
> [1]
> https://issues.apache.org/jira/issues?jql=project%20%3D%20SQOOP%20and%20fixVersion%20%3D%201.5.0%20and%20resolutionDate%20is%20not%20%20empty%20order%20by%20resolutiondate%20desc
> [2] https://github.com/dvoros/docker-sqoop
>
> On Mon, Apr 16, 2018 at 2:20 PM Szabolcs Vasas <vasas@apache.org> wrote:
>
> > Hi All,
> >
> > Sqoop NG/Sqoop 3:
> > As far as I remember Sqoop NG was an alternative name suggested for Sqoop 2,
> > which has a totally different architecture than Sqoop 1. I would not use it
> > now, since in this release we do not include changes affecting the
> > architecture, only bumps of the dependency versions. However, since the
> > dependencies are bumped to new major releases, I think we should also
> > change the major version number of Sqoop.
> >
> > Hadoop 2 support:
> > I agree with Daniel that we should not introduce extra complexity to
> > support Hadoop 2 as well. However, even if we don't support Hadoop 2 in our
> > next major Sqoop release, some features which do not require Hadoop 3 could
> > be backported by the vendors to their earlier releases as well. I think
> > introducing a 1.x branch upstream would increase the complexity of
> > committing bug fixes, and I am not sure the community wants to make a
> > release in the Sqoop 1.x branch. Even if at some point somebody wants to do
> > this, they could cut the branch and cherry-pick the necessary bug fixes
> > right before the release.
> >
> > Kite removal:
> > I agree that this is quite a complex task on its own, but we can't bump the
> > Hadoop/Hive/HBase dependencies without deciding what to do with Kite. One
> > option is to bump these dependencies in Kite too, create a new Kite release
> > and bump Sqoop's Kite dependency to this new release. Another option is to
> > get rid of the Kite dependency before we bump the Hadoop/Hive/HBase versions.
> > In my opinion the latter makes more sense, since we wanted to eliminate the
> > Kite dependency anyway and the Kite project seems to be dead, so bumping the
> > dependencies, making the necessary code changes, fixing tests and creating
> > the release might be overkill.
> >
> > Szabolcs
> >
> > On Mon, Apr 16, 2018 at 11:50 AM, Dániel Vörös <daniel.voros@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > I believe we're all on the same page on removing Kite, so I've opened
> > > SQOOP-3313 to track that. @Attila I'm glad to see your interest in the
> > > ORC part. It would be highly appreciated if you could take a look at this
> > > review request [1].
> > >
> > > I'm not that familiar with Flume, but it seems they've added NG after
> > > architectural changes and released FlumeNG 1.0 after Flume 0.9.4 [2]. Even
> > > if we go with NG, I'd suggest calling it 3.0, to avoid confusion with
> > > earlier releases.
> > >
> > > I think the biggest part of keeping Hadoop 2 (and previous versions of
> > > downstream projects like Hive) supported would be testing against those. It
> > > would also require at least another build profile to build against them,
> > > and probably another layer of abstraction in the code (like Hadoop shims in
> > > Hive).
> > > Not sure about vendors, but I think they're usually not adding new features
> > > to older release lines. In my opinion we should branch off from current
> > > trunk to track the 1.x release line (where we keep supporting Hadoop 2) and
> > > keep adding bugfixes there, but add new features to trunk only and don't
> > > worry about Hadoop 2 there.
> > >
> > > I agree with Attila on the dependencies. We shouldn't release based on
> > > non-final releases. We might bump the dependencies to some alpha/beta
> > > during development, but don't forget to move to the final version in the
> > > end.
> > >
> > > +1 for Bogi as release manager.
> > >
> > > Regards,
> > > Daniel
> > >
> > > [1] https://reviews.apache.org/r/66548/
> > > [2] https://blogs.apache.org/flume/entry/flume_ng_architecture
> > >
> > > On Fri, Apr 13, 2018 at 5:24 PM Szabó Attila <maugli@inf.elte.hu> wrote:
> > >
> > > >
> > > >
> > > > Hello everyone,
> > > >
> > > >
> > > > I'd like to also attach my thoughts:
> > > >
> > > >
> > > > New Sqoop version: Last time I had the chance to talk about this with
> > > > some of the PMC members (e.g. Jarcec, Kate), we were leaning towards
> > > > creating Sqoop-NG (NG == Next Generation), much like what the Flume
> > > > community did (and AFAIK from Mike Percy it's been quite a successful move
> > > > from their POV). Don't get me wrong, I'm totally NOT against 3.0, though
> > > > IMHO Sqoop-NG 1.0 would be a better choice.
> > > >
> > > >
> > > > Kite: I would totally split this effort into two subtasks. First I would
> > > > get in contact with the Parquet team, and would create a Kite-independent
> > > > execution path in Sqoop for the Parquet-backed tables (Hive/Impala/etc.).
> > > > As a part of this effort I would also add direct support for the ORC format
> > > > (in the past few years I've found it very useful in several different
> > > > situations, and usually it's quite inconvenient that Sqoop does not support
> > > > it "out of the box").
> > > >
> > > > As the second subtask I would start to remove every Kite-based dependency
> > > > (but according to my gut feeling it could break the codebase in too many
> > > > places, and might not be that easy to succeed on that front).
> > > >
> > > >
> > > > Hadoop 2:
> > > >
> > > > Could anyone please highlight for me what the pros/cons would be on this
> > > > front? AFAIK several vendors (including Cloudera, Hortonworks, MapR, EMR,
> > > > etc.) are still supporting Hadoop 2, and to the best of my knowledge
> > > > most of the userbase is tied to their releases, so I'd like to
> > > > give those users the chance to use the newest features of Sqoop;
> > > > thus I would vote for keeping compatibility for a bit more time/versions.
> > > >
> > > >
> > > > Dependencies:
> > > >
> > > > I'd like to cast my very direct and LOUD vote against any alpha
> > > > dependencies (including HBase or anything else!). IMHO Sqoop is already a
> > > > stable component of the Apache Foundation, and the users can depend on it,
> > > > thus I'd like to avoid any kind of "immature" dependency related issues. Of
> > > > course this is also just my solo opinion, but as a community I think we
> > > > must not undermine our stability.
> > > >
> > > > On the other fronts I totally agree and +1 with the planned efforts,
> > > >
> > > > Best regards,
> > > > Attila
> > > >
> > > > ________________________________
> > > > From: Szabolcs Vasas <vasas@apache.org>
> > > > Sent: Friday, April 13, 2018 3:43 PM
> > > > To: dev@sqoop.apache.org
> > > > Subject: Re: Release to support Hadoop 3
> > > >
> > > > Hi all,
> > > >
> > > > I also think that completely eliminating the Kite dependency from Sqoop
> > > > would be the easiest way of going forward. I will try to analyze this topic
> > > > a bit more next week and come up with subtasks so we could potentially work
> > > > on it in parallel.
> > > >
> > > > I am happy with the Sqoop 3.0 scope proposal too and Bogi being the release
> > > > manager of it.
> > > >
> > > > Szabolcs
> > > >
> > > >
> > > > On Fri, Apr 13, 2018 at 2:37 PM, Boglarka Egyed <bogi@apache.org> wrote:
> > > >
> > > > > Hi Daniel et al,
> > > > >
> > > > > Thanks for bringing up this topic and the detailed status update.
> > > > >
> > > > > I am sharing my thoughts point by point, please find them below.
> > > > >
> > > > > > > 1) How to get a new Kite release? Maybe we should remove the Kite
> > > > > > > dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
> > > > >
> > > > >
> > > > > > I think making a new Kite release would be a huge effort, as it would
> > > > > > require upgrading the versions, making the necessary code modifications,
> > > > > > testing it thoroughly, etc., and then making the release itself. Meanwhile,
> > > > > > Kite is a very passively maintained tool with minimal activity on it, so
> > > > > > it would definitely take a lot of effort to get it done. It would have a
> > > > > > dependency on the Solr community too, as the Morphlines module of Kite is
> > > > > > heavily used and somewhat actively developed by them. Also, there is indeed
> > > > > > a shorter/longer term goal to get rid of the Kite dependency in Sqoop
> > > > > > entirely, i.e. all release efforts would become throw-away very soon.
> > > > >
> > > > > > Focusing on the Kite removal seems more reasonable to me. However, it
> > > > > > would be great to see an estimate for this effort. @Szabolcs, could
> > > > > > you maybe share your thoughts on this?
> > > > >
> > > > > > > 2) Should we drop support for Hadoop 2?
> > > > > >
> > > > >
> > > > > I think we can drop support for Hadoop 2 especially if we use
> > > > > straightforward versioning with the new release.
> > > > >
> > > > >
> > > > > > > 3) What version number should we use? To avoid confusion with Sqoop2 I'd
> > > > > > > go with 3.0.
> > > > > >
> > > > >
> > > > > > I like this idea, +1 for making a 3.0 release containing these changes.
> > > > >
> > > > >
> > > > > > 4) Does (should?) this affect the 1.5 release?
> > > > >
> > > > >
> > > > > > I think the answer is yes. Currently the following breaking changes are on
> > > > > > the horizon which could be part of a next Sqoop release:
> > > > > > * com.cloudera package removal (done)
> > > > > > * Gradle introduction (in progress)
> > > > > > * Hadoop/Hive/HBase version upgrade (in progress)
> > > > > > * Kite deprecation/removal (planned)
> > > > > > * Bump Java version to 8 (planned)
> > > > >
> > > > > > Looking at this list, I would say that making a Sqoop 1.5 release containing
> > > > > > only the com.cloudera package removal, the Gradle introduction and the Java
> > > > > > version bump would mean a somewhat small and irrelevant scope from a user
> > > > > > perspective, so maybe having two releases (1.5 and 3.0) would be a little
> > > > > > bit of an overkill. I would instead suggest going with a Sqoop 3.0 release
> > > > > > containing all the changes listed above. What do you think?
> > > > >
> > > > > > To summarize, I currently see the following dependencies for the next Sqoop
> > > > > > release:
> > > > > > * Finishing up the Gradle patch
> > > > > > * Hive 3 release
> > > > > > * Kite removal - this could be the next common effort in the community
> > > > >
> > > > > > Anyhow I would be happy to take the Release Manager role for the next
> > > > > > release, please let me know if everyone would be OK with that.
> > > > > >
> > > > > > I am looking forward to seeing others' thoughts on this too.
> > > > >
> > > > > Many thanks,
> > > > > Bogi
> > > > >
> > > > > > On Thu, Apr 12, 2018 at 5:17 PM, Dániel Vörös <daniel.voros@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Dear All,
> > > > > >
> > > > > > > After some development towards supporting Hadoop 3 (and the latest versions
> > > > > > > of downstream components) I'd like to summarize the current state of the
> > > > > > > upgrade and start the conversation about releasing a new version of Sqoop
> > > > > > > with Hadoop 3 support.
> > > > > >
> > > > > > Here's what happened so far:
> > > > > >  - Upgraded Hadoop dependency to 3.0.0
> > > > > > >  - Hive had to be upgraded, since old Hive didn't work with Hadoop 3.
> > > > > > >  - HBase had to be upgraded since Hive 3 depends on HBase 2 (alpha)
> > > > > > >  - Dealt with a bunch of minor issues like changed Hadoop configuration
> > > > > > > names and different packaging of Maven artifacts.
> > > > > >
> > > > > > > For details please refer to this ticket and the attached review request:
> > > > > > > https://issues.apache.org/jira/browse/SQOOP-3305
> > > > > >
> > > > > > Remaining work:
> > > > > > >  - Parquet importing doesn't work. It was broken by a standalone-metastore
> > > > > > > change in Hive and fixing would require a new Kite version to be built
> > > > > > > against Hive 3.
> > > > > > >  - Hive 3 is going to enable ACID tables by default. We should support
> > > > > > > importing into these. Details:
> > > > > > > https://issues.apache.org/jira/browse/SQOOP-3311
> > > > > >
> > > > > > Other blocking issues:
> > > > > >  - There's no Hive 3 release (no alpha/beta) yet.
> > > > > >
> > > > > > > I'd like to kindly ask you all to share any other tasks/issues you know of
> > > > > > > that we should address to support the latest versions. Also, there are a
> > > > > > > couple of open questions:
> > > > > > >  1) How to get a new Kite release? Maybe we should remove the Kite
> > > > > > > dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
> > > > > > >  2) Should we drop support for Hadoop 2?
> > > > > > >  3) What version number should we use? To avoid confusion with Sqoop2 I'd
> > > > > > > go with 3.0.
> > > > > > >  4) Does (should?) this affect the 1.5 release?
> > > > > >
> > > > > > Regards,
> > > > > > Daniel
> > > > > >
> > > > >
> > > >
> > >
> >
>
