sqoop-dev mailing list archives

From Attila Szabó <mau...@apache.org>
Subject Fwd: Release to support Hadoop 3
Date Fri, 11 May 2018 00:51:12 GMT
Dear Sqoop community,

Am I the only one who is missing some formal decision making, announcement
and process here?

 - When did the PMC make the decision that we're going for version 3.0
(instead of any of the alternative versions)? When and where was it
announced? How is it possible that some of the contributors knew this
earlier than the rest of the community?

 - When did Bogi become a PMC member, and why was that fact not announced
to the community as it used to be (and why is it still not visible on the
project site)? Of course this is just an assumption, but according to some
PMC chair emails from this February, only PMC members have JIRA administrator
privileges. I guess we still follow this rule, as nothing else was
announced, so I assume the reason Bogi was able to administer the available
versions in JIRA is that she has been promoted (and if this is true and my
assumption is valid, I'd like to send my congratulations).

 - When did the PMC make the decision about dropping Hadoop 2 compatibility?
When and where was it announced?

If we, as a community that is officially part of the ASF, do have rules, why
does it look like we are not following them?

Regards,
Attila

ps:
My objection to dropping 1.5 is that having a 1.5 release was decided as a
community, and we're still not sure whether those changes would only be
delivered in the next major version, or how they would be backported,
cherry-picked, etc. And as the majority of users are still on Hadoop 2.x, I
think we cannot force people to upgrade just to be able to use some of the
changes originally planned for 1.5.
So this item is absolutely -1 on my side!

ps2:
Daniel! I've provided some comments on your ORC Jira.

On Thu, May 10, 2018 at 4:00 PM, Boglarka Egyed <bogi@apache.org> wrote:

> Hi All,
>
> Thank you Daniel for the update! I was also writing one when your email
> arrived so I'm just adding a couple of comments to that.
>
> New major version in JIRA:
> Version 3.0.0 has been created in JIRA
> <https://issues.apache.org/jira/projects/SQOOP/summary>, please feel free
> to use it on the corresponding JIRAs from now on. As per my previous email, I
> see no point in doing a 1.5.0 release currently, so I'm OK with moving all
> the JIRAs having fix/target version of 1.5.0 to 3.0.0. Any objections?
>
> Update on the dependencies of the release:
> * Gradle patch needs some finalization and can be committed soon:
> https://reviews.apache.org/r/66067/
> * Kite removal effort has been started: SQOOP-3313
> <https://issues.apache.org/jira/browse/SQOOP-3313>
> * Hive 3.0.0 release is still in an early phase based on this email thread
> <https://mail-archives.apache.org/mod_mbox/hive-dev/201804.mbox/%3C2EC60DA6-0A2E-4F3A-92F2-E3CE9D49762A@hortonworks.com%3E>
> and has no ETA yet
>
> Thanks Daniel for looking into the Hadoop compatibility question, please
> let us know your findings.
>
> Cheers,
> Bogi
>
>
>
> On Thu, May 10, 2018 at 3:27 PM, Dániel Vörös <daniel.voros@gmail.com>
> wrote:
>
> > Dear All,
> >
> > After Bogi created the 3.0.0 version in Jira I've applied it to a
> > couple of tickets that don't make sense on the 1.x line (without
> > Hadoop3/Hive3).
> >
> > However, as Bogi has mentioned in her previous email, it probably doesn't
> > make sense to work on a 1.5 release in parallel with 3.0.0. How would you
> > feel if we were to move all 1.5 issues [1] to 3.0.0?
> >
> > In the meantime I've experimented with running Sqoop 1.4.7 against Hadoop
> > 3.1.0, and I'm planning to do the opposite, running Sqoop 3.0.0-SNAPSHOT
> > against Hadoop 2.x. That way we'd be able to better assess Attila's
> > question about backward compatibility. Please note that the hard part will
> > be Hive integration, I'm afraid, and until there's a Hive 3.0 release it's
> > hard to test. If anyone's interested in this topic, check out [2].
> >
> > Regards,
> > Daniel
> >
> > [1] https://issues.apache.org/jira/issues?jql=project%20%3D%20SQOOP%20and%20fixVersion%20%3D%201.5.0%20and%20resolutionDate%20is%20not%20%20empty%20order%20by%20resolutiondate%20desc
> > [2] https://github.com/dvoros/docker-sqoop
> >
> > On Mon, Apr 16, 2018 at 2:20 PM Szabolcs Vasas <vasas@apache.org> wrote:
> >
> > > Hi All,
> > >
> > > Sqoop NG/Sqoop 3:
> > > As far as I remember, Sqoop NG was an alternative name suggested for
> > > Sqoop 2, which has a totally different architecture than Sqoop 1. I would
> > > not use it now, since this release does not include changes affecting the
> > > architecture, only bumps of the dependency versions. However, since the
> > > dependencies are bumped to new major releases, I think we should also
> > > change the major version number of Sqoop.
> > >
> > > Hadoop 2 support:
> > > I agree with Daniel that we should not introduce extra complexity to
> > > support Hadoop 2 as well. However, even if we don't support Hadoop 2 in
> > > our next major Sqoop release, some features which do not require Hadoop 3
> > > could be backported by the vendors to their earlier releases as well. I
> > > think introducing a 1.x branch upstream would increase the complexity of
> > > committing bug fixes, and I am not sure the community wants to make a
> > > release in the Sqoop 1.x branch. Even if at some point somebody wants to
> > > do this, they could cut the branch and cherry-pick the necessary bug fixes
> > > right before the release.
> > >
> > > Kite removal:
> > > I agree that this is quite a complex task on its own, but we can't bump
> > > the Hadoop/Hive/HBase dependencies without deciding what to do with Kite.
> > > One option is to bump these dependencies in Kite too, create a new Kite
> > > release and bump Sqoop's Kite dependency to this new release. Another
> > > option is to get rid of the Kite dependency before we bump the
> > > Hadoop/Hive/HBase versions. In my opinion the latter makes more sense,
> > > since we wanted to eliminate the Kite dependency anyway, and the Kite
> > > project seems to be dead, so bumping the dependencies, making the
> > > necessary code changes, fixing tests and creating the release might be
> > > overkill.
> > >
> > > Szabolcs
> > >
> > > On Mon, Apr 16, 2018 at 11:50 AM, Dániel Vörös <daniel.voros@gmail.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I believe we're all on the same page on removing Kite, so I've opened
> > > > SQOOP-3313 to track that. @Attila I'm glad to see your interest in the
> > > > ORC part. It would be highly appreciated if you could take a look at this
> > > > review request [1].
> > > >
> > > > I'm not that familiar with Flume, but it seems they added NG after
> > > > architectural changes and released FlumeNG 1.0 after Flume 0.9.4 [2].
> > > > Even if we go with NG, I'd suggest calling it 3.0 to avoid confusion with
> > > > earlier releases.
> > > >
> > > > I think the biggest part of keeping Hadoop 2 (and previous versions of
> > > > downstream projects like Hive) supported would be testing against those.
> > > > It would also require at least one more build profile to build against
> > > > them, and probably another layer of abstraction in the code (like the
> > > > Hadoop shims in Hive).
> > > >
> > > > Not sure about vendors, but I think they usually don't add new features
> > > > to older release lines. In my opinion we should branch off from the
> > > > current trunk to track the 1.x release line (where we keep supporting
> > > > Hadoop 2) and keep adding bugfixes there, but add new features to trunk
> > > > only and not worry about Hadoop 2 there.
> > > >
> > > > I agree with Attila on the dependencies. We shouldn't release based on
> > > > non-final releases. We might bump the dependencies to some alpha/beta
> > > > during development, but we must not forget to move to the final versions
> > > > in the end.
> > > >
> > > > +1 for Bogi as release manager.
> > > >
> > > > Regards,
> > > > Daniel
> > > >
> > > > [1] https://reviews.apache.org/r/66548/
> > > > [2] https://blogs.apache.org/flume/entry/flume_ng_architecture
> > > >
> > > > On Fri, Apr 13, 2018 at 5:24 PM Szabó Attila <maugli@inf.elte.hu>
> > > > wrote:
> > > >
> > > > >
> > > > > Hello everyone,
> > > > >
> > > > > I'd also like to add my thoughts:
> > > > >
> > > > > New Sqoop version: The last time I had the chance to talk about this
> > > > > with some of the PMC members (e.g. Jarcec, Kate), we were in favor of
> > > > > creating Sqoop-NG (NG == Next Generation), much like what the Flume
> > > > > community did (and AFAIK, according to Mike Percy, it was quite a
> > > > > successful move from their POV). Don't get me wrong, I'm totally NOT
> > > > > against 3.0, though IMHO Sqoop-NG 1.0 would be a better choice.
> > > > >
> > > > > Kite: I would totally split this effort into two subtasks. First, I
> > > > > would get in contact with the Parquet team and create a Kite-independent
> > > > > execution path in Sqoop for the Parquet-backed tables
> > > > > (Hive/Impala/etc.). As part of this effort I would also add direct
> > > > > support for the ORC format (in the past few years I've found it very
> > > > > useful in several different situations, and it's usually quite
> > > > > inconvenient that Sqoop does not support it "out of the box").
> > > > >
> > > > > As the second subtask I would start removing every Kite-based
> > > > > dependency (but my gut feeling is that this could break the codebase in
> > > > > too many places, and it might not be that easy to succeed on that
> > > > > front).
> > > > >
> > > > > Hadoop 2:
> > > > >
> > > > > Could anyone please point out the pros/cons on this front? AFAIK
> > > > > several vendors (including Cloudera, Hortonworks, MapR, EMR, etc.)
> > > > > still support Hadoop 2, and to the best of my knowledge most of the
> > > > > user base is tied to their releases, so I'd like to give those users
> > > > > the chance to use the newest features of Sqoop. Thus I would vote for
> > > > > keeping the compatibility for a bit more time/a few more versions.
> > > > >
> > > > > Dependencies:
> > > > >
> > > > > I'd like to cast my very direct and LOUD vote against any alpha
> > > > > dependencies (including HBase or anything else!). IMHO Sqoop is already
> > > > > a stable component of the Apache Software Foundation, and users can
> > > > > depend on it, so I'd like to avoid any kind of "immature"
> > > > > dependency-related issues. Of course this is also just my personal
> > > > > opinion, but as a community I think we must not undermine our
> > > > > stability.
> > > > >
> > > > > On the other fronts I totally agree, and +1 for the planned efforts.
> > > > >
> > > > > Best regards,
> > > > > Attila
> > > > >
> > > > > ________________________________
> > > > > From: Szabolcs Vasas <vasas@apache.org>
> > > > > Sent: Friday, April 13, 2018 3:43 PM
> > > > > To: dev@sqoop.apache.org
> > > > > Subject: Re: Release to support Hadoop 3
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I also think that completely eliminating the Kite dependency from
> > > > > Sqoop would be the easiest way forward. I will try to analyze this
> > > > > topic a bit more next week and come up with subtasks so that we could
> > > > > potentially work on it in parallel.
> > > > >
> > > > > I am also happy with the Sqoop 3.0 scope proposal and with Bogi being
> > > > > its release manager.
> > > > >
> > > > > Szabolcs
> > > > >
> > > > >
> > > > > On Fri, Apr 13, 2018 at 2:37 PM, Boglarka Egyed <bogi@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi Daniel et al,
> > > > > >
> > > > > > Thanks for bringing up this topic and the detailed status update.
> > > > > >
> > > > > > I am sharing my thoughts point by point, please find them below.
> > > > > >
> > > > > > > 1) How to get a new Kite release? Maybe we should remove the Kite
> > > > > > > dependency altogether (as Szabolcs hinted in comments of
> > > > > > > SQOOP-3171)?
> > > > > >
> > > > > > I think making a new Kite release would be a huge effort, as it would
> > > > > > require upgrading the versions, making the necessary code
> > > > > > modifications, testing it thoroughly, etc., and then making the release
> > > > > > itself. Meanwhile, Kite is a very passively maintained tool with
> > > > > > minimal activity, so it would definitely take a lot of effort to get
> > > > > > it done. It would also depend on the Solr community, as the
> > > > > > Morphlines module of Kite is heavily used and somewhat actively
> > > > > > developed by them. Also, there is indeed a short- or long-term goal to
> > > > > > get rid of the Kite dependency in Sqoop entirely, i.e. all release
> > > > > > efforts would very soon become throw-away work.
> > > > > >
> > > > > > Focusing on the Kite removal seems more reasonable to me. However, it
> > > > > > would be great to see an estimate of this effort. @Szabolcs, could you
> > > > > > maybe share your thoughts on this?
> > > > > >
> > > > > > > 2) Should we drop support for Hadoop 2?
> > > > > > >
> > > > > >
> > > > > > I think we can drop support for Hadoop 2, especially if we use
> > > > > > straightforward versioning with the new release.
> > > > > >
> > > > > >
> > > > > > > 3) What version number should we use? To avoid confusion with Sqoop2
> > > > > > > I'd go with 3.0.
> > > > > > >
> > > > > >
> > > > > > I like this idea, +1 for making a 3.0 release containing these changes.
> > > > > >
> > > > > >
> > > > > > > 4) Does (should?) this affect the 1.5 release?
> > > > > >
> > > > > >
> > > > > > I think the answer is yes. Currently the following breaking changes
> > > > > > are on the horizon, which could be part of the next Sqoop release:
> > > > > > * com.cloudera package removal (done)
> > > > > > * Gradle introduction (in progress)
> > > > > > * Hadoop/Hive/HBase version upgrade (in progress)
> > > > > > * Kite deprecation/removal (planned)
> > > > > > * Bump Java version to 8 (planned)
> > > > > >
> > > > > > Looking at this list, I would say that making a Sqoop 1.5 release
> > > > > > containing only the com.cloudera package removal, the Gradle
> > > > > > introduction and the Java version bump would mean a rather small and
> > > > > > insignificant scope from a user perspective, so having two releases
> > > > > > (1.5 and 3.0) might be a little bit of an overkill. Instead, I would
> > > > > > suggest going with a Sqoop 3.0 release containing all the changes
> > > > > > listed above. What do you think?
> > > > > >
> > > > > > To sum up, I currently see the following dependencies for the next
> > > > > > Sqoop release:
> > > > > > * Finishing up the Gradle patch
> > > > > > * Hive 3 release
> > > > > > * Kite removal - this could be the next common effort in the community
> > > > > >
> > > > > > Anyhow, I would be happy to take the Release Manager role for the next
> > > > > > release; please let me know if everyone would be OK with that.
> > > > > >
> > > > > > I am looking forward to seeing others' thoughts on this too.
> > > > > >
> > > > > > Many thanks,
> > > > > > Bogi
> > > > > >
> > > > > > On Thu, Apr 12, 2018 at 5:17 PM, Dániel Vörös <daniel.voros@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Dear All,
> > > > > > >
> > > > > > > After some development towards supporting Hadoop 3 (and the latest
> > > > > > > versions of downstream components) I'd like to summarize the current
> > > > > > > state of the upgrade and start the conversation about releasing a new
> > > > > > > version of Sqoop with Hadoop 3 support.
> > > > > > >
> > > > > > > Here's what happened so far:
> > > > > > >  - Upgraded Hadoop dependency to 3.0.0
> > > > > > >  - Hive had to be upgraded, since old Hive didn't work with Hadoop 3.
> > > > > > >  - HBase had to be upgraded, since Hive 3 depends on HBase 2 (alpha).
> > > > > > >  - Dealt with a bunch of minor issues like changed Hadoop configuration
> > > > > > > names and different packaging of Maven artifacts.
> > > > > > >
> > > > > > > For details please refer to this ticket and the attached review request:
> > > > > > > https://issues.apache.org/jira/browse/SQOOP-3305
> > > > > > >
> > > > > > > Remaining work:
> > > > > > >  - Parquet importing doesn't work. It was broken by a standalone-metastore
> > > > > > > change in Hive, and fixing it would require a new Kite version to be
> > > > > > > built against Hive 3.
> > > > > > >  - Hive 3 is going to enable ACID tables by default. We should support
> > > > > > > importing into these. Details:
> > > > > > > https://issues.apache.org/jira/browse/SQOOP-3311
> > > > > > >
> > > > > > > Other blocking issues:
> > > > > > >  - There's no Hive 3 release (no alpha/beta) yet.
> > > > > > >
> > > > > > > I'd like to kindly ask you all to share any other tasks/issues you know
> > > > > > > of that we should address to support the latest versions. Also, there
> > > > > > > are a couple of open questions:
> > > > > > >  1) How to get a new Kite release? Maybe we should remove the Kite
> > > > > > > dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
> > > > > > >  2) Should we drop support for Hadoop 2?
> > > > > > >  3) What version number should we use? To avoid confusion with Sqoop2
> > > > > > > I'd go with 3.0.
> > > > > > >  4) Does (should?) this affect the 1.5 release?
> > > > > > >
> > > > > > > Regards,
> > > > > > > Daniel
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
