sqoop-dev mailing list archives

From Boglarka Egyed <b...@apache.org>
Subject Re: Release to support Hadoop 3
Date Fri, 11 May 2018 09:37:47 GMT
Hi All,

We are currently doing the scoping of the next release(s) and no official
release process has started yet.

I got JIRA admin rights to take the load off the PMC members on this front.
Creating version 3.0 in JIRA was a purely administrative step, just as
creating 1.5 was before.

Discussions are still ongoing about
* the version(s): 1.5 and/or 3.0
* to drop/keep support for Hadoop2

No explicit decisions were made and no PMC members have expressed their
concerns on these yet.

With that in mind, thank you for your thoughts, Attila. I think we are on
the path to a compromise on these questions and to a decision that everyone
can accept.

Regards,
Bogi

On Fri, May 11, 2018 at 2:51 AM, Attila Szabó <maugli@apache.org> wrote:

> Dear Sqoop community,
>
> Am I the only one who is missing some formal decision making, announcement
> and process here?
>
>  - When did the PMC make the decision that we're going for version 3.0
> (instead of any other version alternative)? When and where was it
> announced? How is it possible that some of the contributors knew this
> earlier than the rest of the community?
>
>  - When did Bogi become a PMC member, and why was that fact not announced
> to the community as it used to be (and why is it still not visible on the
> project site)? Of course this is just an assumption, but according to some
> PMC chair emails from this February, JIRA administrator privileges are
> reserved for PMC members. I guess we still follow this rule, as no other
> information was announced, so the fact that Bogi was able to administer the
> available versions in JIRA suggests she has been promoted (and here I'd
> like to send my congrats, if this is true and my assumption is valid).
>
>  - When did the PMC make the decision to drop Hadoop 2 compatibility?
> When and where was it announced?
>
> If, as a community which is officially part of the ASF, we do have rules,
> why does it look like we are not following them?
>
> Regards,
> Attila
>
> ps:
> My objection against dropping 1.5 is that having 1.5 was decided as a
> community, and we're still not sure whether those changes would only be
> delivered in a next major version, how they would be backported,
> cherry-picked, etc. And as the majority of the users are still on 2.x, I
> think we cannot force people to upgrade just to be able to use some of the
> changes originally planned for 1.5.
> So this item is absolutely -1 on my side!
>
> ps2:
> Daniel! I've provided some comments on your ORC Jira.
>
> On Thu, May 10, 2018 at 4:00 PM, Boglarka Egyed <bogi@apache.org> wrote:
>
> > Hi All,
> >
> > Thank you Daniel for the update! I was also writing one when your email
> > arrived so I'm just adding a couple of comments to that.
> >
> > New major version in JIRA:
> > Version 3.0.0 has been created in JIRA
> > <https://issues.apache.org/jira/projects/SQOOP/summary>, please feel free
> > to use it on the corresponding JIRAs from now. As per my previous email I
> > see no point in doing a 1.5.0 release currently so I'm OK with moving all
> > the JIRAs having fix/target version of 1.5.0 to 3.0.0. Any objections?
> >
> > Update on the dependencies of the release:
> > * Gradle patch needs some finalization and can be committed soon:
> > https://reviews.apache.org/r/66067/
> > * Kite removal effort has been started: SQOOP-3313
> > <https://issues.apache.org/jira/browse/SQOOP-3313>
> > * Hive 3.0.0 release is still in an early phase based on this email thread
> > <https://mail-archives.apache.org/mod_mbox/hive-dev/201804.mbox/%3C2EC60DA6-0A2E-4F3A-92F2-E3CE9D49762A@hortonworks.com%3E>
> > and has no ETA yet
> >
> > Thanks Daniel for looking into the Hadoop compatibility question, please
> > let us know your findings.
> >
> > Cheers,
> > Bogi
> >
> >
> >
> > On Thu, May 10, 2018 at 3:27 PM, Dániel Vörös <daniel.voros@gmail.com>
> > wrote:
> >
> > > Dear All,
> > >
> > > After Bogi has created the 3.0.0 version in Jira I've applied it to a
> > > couple of tickets that don't make sense on the 1.x line (without
> > > Hadoop3/Hive3).
> > >
> > > However, as Bogi has mentioned in her previous email, it probably
> > > doesn't make sense to work on a 1.5 release in parallel with 3.0.0. How
> > > would you feel if we were to move all 1.5 issues [1] to 3.0.0?
> > >
> > > In the meantime I've experimented with running Sqoop 1.4.7 against
> > > Hadoop 3.1.0, and I'm planning to do the opposite, running Sqoop
> > > 3.0.0-SNAPSHOT against Hadoop 2.x. That way we'd be able to better
> > > assess Attila's question about backward compatibility. Please note that
> > > the hard part will be Hive integration, I'm afraid, and as long as
> > > there's no Hive 3.0 release it's hard to test. If anyone's interested in
> > > this topic, check out [2].
> > >
> > > Regards,
> > > Daniel
> > >
> > > [1]
> > > https://issues.apache.org/jira/issues?jql=project%20%3D%20SQOOP%20and%20fixVersion%20%3D%201.5.0%20and%20resolutionDate%20is%20not%20%20empty%20order%20by%20resolutiondate%20desc
> > > [2] https://github.com/dvoros/docker-sqoop
> > >
> > > On Mon, Apr 16, 2018 at 2:20 PM Szabolcs Vasas <vasas@apache.org> wrote:
> > >
> > > > Hi All,
> > > >
> > > > Sqoop NG/Sqoop 3:
> > > > As far as I remember, Sqoop NG was an alternative name suggested for
> > > > Sqoop 2, which has a totally different architecture than Sqoop 1. I
> > > > would not use it now, since this release does not include changes
> > > > affecting the architecture, only version bumps of the dependencies.
> > > > However, since the dependencies are bumped to new major releases, I
> > > > think we should also change the major version number of Sqoop.
> > > >
> > > > Hadoop 2 support:
> > > > I agree with Daniel that we should not introduce extra complexity to
> > > > support Hadoop 2 as well. However, even if we don't support Hadoop 2
> > > > in our next major Sqoop release, some features which do not require
> > > > Hadoop 3 could be backported by the vendors to their earlier releases
> > > > as well. I think introducing a 1.x branch upstream would make
> > > > committing bug fixes more complex, and I am not sure the community
> > > > wants to make a release in the Sqoop 1.x branch. Even if at some point
> > > > somebody wants to do this, they could cut the branch and cherry-pick
> > > > the necessary bug fixes right before the release.
> > > >
> > > > Kite removal:
> > > > I agree that this is quite a complex task on its own, but we can't
> > > > bump the Hadoop/Hive/HBase dependencies without deciding what to do
> > > > with Kite. One option is to bump these dependencies in Kite too,
> > > > create a new Kite release and bump Sqoop's Kite dependency to this new
> > > > release. Another option is to get rid of the Kite dependency before we
> > > > bump the Hadoop/Hive/HBase versions. In my opinion the latter makes
> > > > more sense, since we wanted to eliminate the Kite dependency anyway
> > > > and the Kite project seems to be dead, so bumping the dependencies,
> > > > making the necessary code changes, fixing tests and creating the
> > > > release might be overkill.
> > > >
> > > > Szabolcs
> > > >
> > > > On Mon, Apr 16, 2018 at 11:50 AM, Dániel Vörös <daniel.voros@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I believe we're all on the same page on removing Kite, so I've
> > > > > opened SQOOP-3313 to track that. @Attila I'm glad to see your
> > > > > interest in the ORC part. It would be highly appreciated if you
> > > > > could take a look at this review request [1].
> > > > >
> > > > > I'm not that familiar with Flume, but it seems they've added NG
> > > > > after architectural changes and released FlumeNG 1.0 after Flume
> > > > > 0.9.4 [2]. Even if we go with NG, I'd suggest calling it 3.0, to
> > > > > avoid confusion with earlier releases.
> > > > >
> > > > > I think the biggest part of keeping Hadoop 2 (and previous versions
> > > > > of downstream projects like Hive) supported would be testing against
> > > > > those. It would also require at least another build profile to build
> > > > > against them, and probably another layer of abstraction in the code
> > > > > (like Hadoop shims in Hive).
> > > > > Not sure about vendors, but I think they're usually not adding new
> > > > > features to older release lines. In my opinion we should branch off
> > > > > from current trunk to track the 1.x release line (where we keep
> > > > > supporting Hadoop 2) and keep adding bugfixes there, but add new
> > > > > features to trunk only and not worry about Hadoop 2 there.
> > > > >
> > > > > I agree with Attila on the dependencies. We shouldn't release based
> > > > > on non-final releases. We might bump the dependencies to some
> > > > > alpha/beta during development, but we must not forget to move to the
> > > > > final version in the end.
> > > > >
> > > > > +1 for Bogi as release manager.
> > > > >
> > > > > Regards,
> > > > > Daniel
> > > > >
> > > > > [1] https://reviews.apache.org/r/66548/
> > > > > [2] https://blogs.apache.org/flume/entry/flume_ng_architecture
> > > > >
> > > > > On Fri, Apr 13, 2018 at 5:24 PM Szabó Attila <maugli@inf.elte.hu>
> > > > > wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > Hello everyone,
> > > > > >
> > > > > >
> > > > > > I'd like to also attach my thoughts:
> > > > > >
> > > > > >
> > > > > > New Sqoop version: The last time I had the chance to talk about
> > > > > > this with some of the PMC members (e.g. Jarcec, Kate), we were
> > > > > > leaning towards creating Sqoop-NG (NG == Next Generation), much
> > > > > > like what the Flume community did (and AFAIK from Mike Percy it
> > > > > > was quite a successful move from their POV). Don't get me wrong,
> > > > > > I'm totally NOT against 3.0, though IMHO Sqoop-NG 1.0 would be a
> > > > > > better choice.
> > > > > >
> > > > > >
> > > > > > Kite: I would split this effort into two subtasks. First I would
> > > > > > get in contact with the Parquet team and create a Kite-independent
> > > > > > execution path in Sqoop for Parquet-backed tables
> > > > > > (Hive/Impala/etc.). As part of this effort I would also add direct
> > > > > > support for the ORC format (in the past few years I've found it
> > > > > > very useful in several different situations, and it's usually
> > > > > > quite inconvenient that Sqoop does not support it "out of the
> > > > > > box").
> > > > > >
> > > > > > As the second subtask I would start to remove every Kite-based
> > > > > > dependency (but my gut feeling is that it could break the codebase
> > > > > > in too many places, and might not be that easy to succeed on that
> > > > > > front).
> > > > > >
> > > > > >
> > > > > > Hadoop 2:
> > > > > >
> > > > > > Could anyone please highlight the pros/cons on this front? AFAIK
> > > > > > several vendors (including Cloudera, Hortonworks, MapR, EMR, etc.)
> > > > > > are still supporting Hadoop 2, and to the best of my knowledge
> > > > > > most of the userbase is tied to their releases, so I'd like to
> > > > > > give those users the chance to use the newest features of Sqoop.
> > > > > > Thus I would vote for keeping compatibility for a bit more
> > > > > > time/versions.
> > > > > >
> > > > > >
> > > > > > Dependencies:
> > > > > >
> > > > > > I'd like to cast my very direct and LOUD vote against any alpha
> > > > > > dependencies (including HBase or anything else!). IMHO Sqoop is
> > > > > > already a stable component of the Apache Foundation, and the users
> > > > > > can depend on it, thus I'd like to avoid any kind of "immature"
> > > > > > dependency related issues. Of course this is also just my solo
> > > > > > opinion, but as a community I think we must not undermine our
> > > > > > stability.
> > > > > >
> > > > > > On the other fronts I totally agree and +1 with the planned
> > > > > > efforts,
> > > > > >
> > > > > > Best regards,
> > > > > > Attila
> > > > > >
> > > > > > ________________________________
> > > > > > From: Szabolcs Vasas <vasas@apache.org>
> > > > > > Sent: Friday, April 13, 2018 3:43 PM
> > > > > > To: dev@sqoop.apache.org
> > > > > > Subject: Re: Release to support Hadoop 3
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I also think that completely eliminating the Kite dependency from
> > > > > > Sqoop would be the easiest way of going forward. I will try to
> > > > > > analyze this topic a bit more next week and come up with subtasks
> > > > > > so we could potentially work on it in parallel.
> > > > > >
> > > > > > I am happy with the Sqoop 3.0 scope proposal too and Bogi being
> > > > > > the release manager of it.
> > > > > >
> > > > > > Szabolcs
> > > > > >
> > > > > >
> > > > > > On Fri, Apr 13, 2018 at 2:37 PM, Boglarka Egyed <bogi@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Daniel et al,
> > > > > > >
> > > > > > > Thanks for bringing up this topic and the detailed status update.
> > > > > > >
> > > > > > > I am sharing my thoughts point by point, please find them below.
> > > > > > >
> > > > > > > > 1) How to get a new Kite release? Maybe we should remove the
> > > > > > > > Kite dependency altogether (as Szabolcs hinted in comments of
> > > > > > > > SQOOP-3171)?
> > > > > > >
> > > > > > >
> > > > > > > I think making a new Kite release would be a huge effort, as it
> > > > > > > would require upgrading the versions, making the necessary code
> > > > > > > modifications, testing it thoroughly, etc., and then making the
> > > > > > > release itself. Meanwhile, Kite is a very passively maintained
> > > > > > > tool with minimal activity, so getting this done would
> > > > > > > definitely take a lot of effort. It would also depend on the
> > > > > > > Solr community, as the Morphlines module of Kite is heavily used
> > > > > > > and somewhat actively developed by them. There is also a
> > > > > > > shorter/longer term goal to get rid of the Kite dependency in
> > > > > > > Sqoop entirely, i.e. all release efforts would become throw-away
> > > > > > > very soon.
> > > > > > >
> > > > > > > Focusing on the Kite removal seems to be more reasonable to me.
> > > > > > > However, it would be great to see an estimate for this effort.
> > > > > > > @Szabolcs could you maybe share your thoughts on this?
> > > > > > >
> > > > > > > > 2) Should we drop support for Hadoop 2?
> > > > > > >
> > > > > > > I think we can drop support for Hadoop 2 especially if we use
> > > > > > > straightforward versioning with the new release.
> > > > > > >
> > > > > > > > 3) What version number should we use? To avoid confusion with
> > > > > > > > Sqoop2 I'd go with 3.0.
> > > > > > >
> > > > > > > I like this idea, +1 for making a 3.0 release containing these
> > > > > > > changes.
> > > > > > >
> > > > > > > > 4) Does (should?) this affect the 1.5 release?
> > > > > > >
> > > > > > >
> > > > > > > I think the answer is yes. Currently the following breaking
> > > > > > > changes are on the horizon which could be part of a next Sqoop
> > > > > > > release:
> > > > > > > * com.cloudera package removal (done)
> > > > > > > * Gradle introduction (in progress)
> > > > > > > * Hadoop/Hive/HBase version upgrade (in progress)
> > > > > > > * Kite deprecation/removal (planned)
> > > > > > > * Bump Java version to 8 (planned)
> > > > > > >
> > > > > > > Looking at this list, I would say that a Sqoop 1.5 release
> > > > > > > containing only the com.cloudera package removal, the Gradle
> > > > > > > introduction and the Java version bump would have a rather small
> > > > > > > scope with little relevance from a user perspective, so having
> > > > > > > two releases (1.5 and 3.0) would probably be overkill. I would
> > > > > > > instead suggest going with a Sqoop 3.0 release containing all
> > > > > > > the changes listed above. What do you think?
> > > > > > >
> > > > > > > To summarize, I currently see the following dependencies for a
> > > > > > > next Sqoop release:
> > > > > > > * Finishing up the Gradle patch
> > > > > > > * Hive 3 release
> > > > > > > * Kite removal - this could be the next common effort in the
> > > > > > > community
> > > > > > >
> > > > > > > Anyhow, I would be happy to take the Release Manager role for
> > > > > > > the next release. Please let me know if everyone would be OK
> > > > > > > with that.
> > > > > > >
> > > > > > > I am looking forward to seeing others' thoughts on this too.
> > > > > > >
> > > > > > > Many thanks,
> > > > > > > Bogi
> > > > > > >
> > > > > > > On Thu, Apr 12, 2018 at 5:17 PM, Dániel Vörös <daniel.voros@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Dear All,
> > > > > > > >
> > > > > > > > After some development towards supporting Hadoop 3 (and latest
> > > > > > > > version of downstream components) I'd like to summarize the
> > > > > > > > current state of the upgrade and start the conversation about
> > > > > > > > releasing a new version of Sqoop with Hadoop 3 support.
> > > > > > > >
> > > > > > > > Here's what happened so far:
> > > > > > > >  - Upgraded Hadoop dependency to 3.0.0
> > > > > > > >  - Hive had to be upgraded, since old Hive didn't work with
> > > > > > > > Hadoop 3.
> > > > > > > >  - HBase had to be upgraded since Hive 3 depends on HBase
> > > > > > > > 2(alpha)
> > > > > > > >  - Dealt with a bunch of minor issues like changed Hadoop
> > > > > > > > configuration names and different packaging of Maven artifacts.
> > > > > > > >
> > > > > > > > For details please refer to this ticket and the attached review
> > > > > > > > request:
> > > > > > > > https://issues.apache.org/jira/browse/SQOOP-3305
> > > > > > > >
> > > > > > > > Remaining work:
> > > > > > > >  - Parquet importing doesn't work. It was broken by a
> > > > > > > > standalone-metastore change in Hive and fixing would require a
> > > > > > > > new Kite version to be built against Hive 3.
> > > > > > > >  - Hive 3 is going to enable ACID tables by default. We should
> > > > > > > > support importing into these. Details:
> > > > > > > > https://issues.apache.org/jira/browse/SQOOP-3311
> > > > > > > >
> > > > > > > > Other blocking issues:
> > > > > > > >  - There's no Hive 3 release (no alpha/beta) yet.
> > > > > > > >
> > > > > > > > I'd like to kindly ask you all to share any other tasks/issues
> > > > > > > > you know of that we should address to support the latest
> > > > > > > > versions. Also, there are a couple open questions:
> > > > > > > >  1) How to get a new Kite release? Maybe we should remove the
> > > > > > > > Kite dependency altogether (as Szabolcs hinted in comments of
> > > > > > > > SQOOP-3171)?
> > > > > > > >  2) Should we drop support for Hadoop 2?
> > > > > > > >  3) What version number should we use? To avoid confusion with
> > > > > > > > Sqoop2 I'd go with 3.0.
> > > > > > > >  4) Does (should?) this affect the 1.5 release?
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Daniel
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
