sqoop-dev mailing list archives

From David Robson <David.Rob...@software.dell.com>
Subject RE: Discussing solutions to Sqoop1 and Sqoop2 confusion (was: Code name for Sqoop 2)
Date Sat, 02 Aug 2014 01:39:59 GMT
So it seems the problem we are trying to solve is that a new user downloads Sqoop
1.99.3 and has a bad experience because it is still experimental (based on recent mail threads,
this may put them off Sqoop for good). So we should make it as easy as possible for them to
download the correct version of Sqoop.

I believe that for a new user, codenames cause more confusion. Assuming a user knew nothing about
Sqoop and was given the choice of Sqoop 1.4.5 or Sqoop Pelican - how would they know which
one to choose? Now if they were given the choice of Sqoop 1.4.5 or Sqoop 1.99.3-alpha - it
would be much more obvious. Of course, either way we should put some text on the homepage /
download page explaining the two releases.

I don't think we should add to the confusion by bringing in codenames - and instead stick
with the industry standard alpha / beta / stable terminology as Arvind suggested.

So I would vote on option 2 - and we should put a warning like "not intended for production
deployment" on the link to download Sqoop 1.99.3-alpha.

-----Original Message-----
From: Abraham Elmahrek [mailto:abe@cloudera.com] 
Sent: Saturday, 2 August 2014 6:01 AM
To: dev@sqoop.apache.org
Subject: Re: Discussing solutions to Sqoop1 and Sqoop2 confusion (was: Code name for Sqoop 2)

+1 for proposal 1 as well.


On Fri, Aug 1, 2014 at 11:46 AM, Venkat Ranganathan < vranganathan@hortonworks.com>
wrote:

> +1 for proposal 1 also
>
> Thanks
>
> Venkat
>
> On Fri, Aug 1, 2014 at 9:38 AM, Jarek Jarcec Cecho <jarcec@apache.org> wrote:
> > I don’t have any other suggestion either, so let’s discuss which one
> > people would prefer.
> >
> > I’m personally in favor of proposal 1).
> >
> > Jarcec
> >
> > On Jul 28, 2014, at 10:04 AM, Gwen Shapira <gshapira@cloudera.com> wrote:
> >
> >> Thanks for the great summary. I don't have additional suggestions.
> >>
> >> Gwen
> >>
> >> On Sun, Jul 27, 2014 at 11:03 AM, Arvind Prabhakar <arvind@apache.org> wrote:
> >>> Thanks Gwen and Jarcec. It appears that we all agree on the few
> >>> basic points below:
> >>>
> >>> a) Sqoop2 is a promising effort although not near completion. We agree that
> >>> there is no need to discuss shutting that down at this time.
> >>> b) The naming of Sqoop2 is such that it raises expectations in
> >>> users/adopters to be better than Sqoop(1) and thus leads to confusion.
> >>>
> >>> The second point (b) above is the key issue that needs resolution. 
> >>> The options discussed thus far are as follows:
> >>>
> >>> 1. Put a code name for Sqoop2 so that it is not confused with Sqoop(1).
> >>> This seems to have good community support.
> >>> 2. Use explicit labels such as "stable" vs "beta/alpha/experimental" for
> >>> various Sqoop releases.
> >>> 3. Use explicit UI messaging to warn Sqoop2 users that it is not the same
> >>> as Sqoop(1) and is far behind on feature completeness and stability. There
> >>> seem to be some concerns around how this can be done given the
> >>> client/server architecture of Sqoop2.
> >>> 4. A combination of options 2 and 3 above.
> >>>
> >>> Are there any suggestions to mitigate this problem? Perhaps we
> >>> should cross-post this thread to user list as well to see if they
> >>> agree with the options here and/or if they have any other suggestions.
> >>>
> >>> Regards,
> >>> Arvind Prabhakar
> >>>
> >>>
> >>>
> >>> On Sat, Jul 26, 2014 at 6:50 PM, Jarek Jarcec Cecho <jarcec@apache.org> wrote:
> >>>
> >>>> Hi Arvind,
> >>>> thank you very much for sharing your concerns! You’ve raised
> >>>> very good points.
> >>>>
> >>>> I personally see value in Sqoop 2 as the new architecture will allow us to
> >>>> cover many more use cases and various compliance regulations, and
> >>>> will eventually simplify users’ lives. Based on the recent
> >>>> increase in dev activity, it seems that I’m not the only one who
> >>>> believes in that, and
> >>>> hence I strongly believe that discontinuing the effort is not the
> >>>> right way to go. I’m more than happy to discuss this topic further if you
> >>>> believe that it’s the right way though.
> >>>>
> >>>> Having said that I do believe in Sqoop 2, I have to second Gwen
> >>>> that the current situation is very confusing to our users. I’ve seen a significant
> >>>> number of users confused about why 1.99.4 does not have
> >>>> Avro/HBase/Hive integration when Sqoop 1 already has that. I was
> >>>> anticipating the confusion and hence I suggested using
> >>>> version number 1.99.x instead of
> >>>> 2.0.0 back when we were working on getting the first cut out of the door. I
> >>>> hoped that version 1.99.x would make it obvious to everybody that
> >>>> it’s not “2.0.0” quite yet. Sadly it seems that this alone did
> >>>> not help as much as
> >>>> I hoped.
> >>>>
> >>>> Hence I do see value in changing our public messaging as you’ve suggested
> >>>> to make the message clearer. I personally like the idea of a code name
> >>>> as that is quite popular in other projects and companies (remember Windows
> >>>> Longhorn?) and it seems to convey the message. I do see a
> >>>> lot of variability in using that code name though - I don’t think
> >>>> that we necessarily have to remove every possible reference to
> >>>> “Sqoop 2” from the face of the earth. I believe that the code name is
> >>>> very well suited for our
> >>>> webpage, wiki and perhaps blog posts to make it obvious that there is just
> >>>> one single stable Sqoop version and then some ongoing effort that has
> >>>> several cuts available. I believe that just by doing that we will decrease
> >>>> confusion about what version a user should download and use. We can discuss
> >>>> to what extent we want to push the code name and to what extent we will
> >>>> keep the reference to “Sqoop 2”. After all, I’m confident that in the not too
> >>>> distant future we will have a Sqoop 2 that offers
> >>>> capability and features comparable to Sqoop 1.
> >>>>
> >>>> I don’t think that the code name is the only idea that will decrease the
> >>>> immediate user confusion and hence I’m happy to hear others’ thoughts!
> >>>>
> >>>> Jarcec
> >>>>
> >>>> On Jul 26, 2014, at 6:00 PM, Gwen Shapira <gshapira@cloudera.com> wrote:
> >>>>
> >>>>> Thanks Arvind for your detailed explanation.
> >>>>>
> >>>>> I agree that naming releases stable and alpha is a good idea. I
> >>>>> don't agree that it will solve the issue, but we can't know until
> >>>>> we try.
> >>>>>
> >>>>> Considering that Sqoop2 is intentionally a client-server
> >>>>> architecture with multiple clients and a REST API as an
> >>>>> additional access point, I believe that it is not feasible to mark
> >>>>> UI as beta.
> >>>>>
> >>>>> I want to stress that I personally believe that Sqoop2 is a very
> >>>>> viable branch effort, to the extent that I am actively
> >>>>> contributing to it.
> >>>>> As Sqoop2 becomes a more and more viable alternative to Sqoop1, we
> >>>>> need to prepare, as a community, to support both versions.
> >>>>>
> >>>>> Considering the number of features currently in Sqoop1 and the
> >>>>> number of production Sqoop1 users, I can see us supporting both
> >>>>> versions for the next 2 years. Making it easy for the community
> >>>>> to support both is my top concern here. Having to resolve
> >>>>> endless confusions for two years doesn't seem like a happy
> >>>>> future to me. I see the Python community fighting the same issue
> >>>>> since they broke compatibility between versions 2 and 3. I'd
> >>>>> like to see us learn from those mistakes and do better.
> >>>>>
> >>>>> I agree that a discussion would have been better than a vote. I
> >>>>> was under the mistaken impression that there is a consensus on
> >>>>> the matter.
> >>>>> I renamed the thread to make it clear that we are interested in
> >>>>> hearing opinions from the rest of the community on this subject.
> >>>>>
> >>>>>
> >>>>> Bike-sheddingly yours,
> >>>>>
> >>>>> Gwen Shapira
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sat, Jul 26, 2014 at 4:44 PM, Arvind Prabhakar <arvind@apache.org> wrote:
> >>>>>> Thanks for the detailed pointers Gwen. I understand your concerns better
> >>>>>> now. My understanding from these threads as well as what you have described
> >>>>>> is that the confusion you refer to stems from the fact that Sqoop2 is not
> >>>>>> at feature parity with Sqoop(1) yet.
> >>>>>>
> >>>>>> It will be great to *discuss* what are the various ways to address this and
> >>>>>> then call a vote to decide upon the approach to use. Some other approaches
> >>>>>> that I can suggest are:
> >>>>>>
> >>>>>> 1. Calling Sqoop1 explicitly as "stable" in our downloads section, or even
> >>>>>> within the release label. So instead of Sqoop-1.4.5, it would
> >>>>>> be Sqoop-1.4.5-stable.
> >>>>>>
> >>>>>> 2. Alternatively calling Sqoop2 explicitly "alpha", "beta" or
> >>>>>> "experimental". Eg - Sqoop-1.99.4 would become Sqoop-1.99.4-beta.
> >>>>>>
> >>>>>> 3. Or perhaps a combination of both of these.
> >>>>>>
> >>>>>> 4. Plus good UI messaging that clearly outlines the critical differences
> >>>>>> between these two products.
> >>>>>>
> >>>>>> Personally, I do not believe that having a code name will solve the issue
> >>>>>> at hand, and may even make it worse. If the motivation is to call out
> >>>>>> Sqoop2 as something "not-Sqoop", then perhaps we should discuss
> >>>>>> the viability of this branch effort. If we conclude that it is not going to
> >>>>>> progress any further, we could call a vote on discontinuing this effort and
> >>>>>> instead focusing on the main Sqoop1 branch alone.
> >>>>>>
> >>>>>> Hope you understand my point of view on this.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Arvind Prabhakar
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Jul 25, 2014 at 10:53 AM, Gwen Shapira <gshapira@cloudera.com> wrote:
> >>>>>>
> >>>>>>> Hi Arvind,
> >>>>>>>
> >>>>>>> Here are a few more threads from the last month where we had to explain
> >>>>>>> Sqoop2 status or explain that you can't use "sqoop import"
> >>>>>>> with Sqoop2, etc:
> >>>>>>>
> >>>>>>> http://mail-archives.apache.org/mod_mbox/sqoop-user/201407.mbox/%3CCA%2BP7NPNTFuPYqf74M5OFw4e9xKZm2ns%3DZ0ydkkuJ06Jcg31hnw%40mail.gmail.com%3E
> >>>>>>>
> >>>>>>> http://mail-archives.apache.org/mod_mbox/sqoop-user/201407.mbox/%3CCAAJ8D%3D9Ho%3DYH7jdavNAb1gwz19Z5dapmS98yR71KmM5LsjCEVw%40mail.gmail.com%3E
> >>>>>>>
> >>>>>>> http://mail-archives.apache.org/mod_mbox/sqoop-user/201407.mbox/%3CCAPwc21YtdgAm9jO3%2Bs0asBZ2JkG%3DVCp5PLpkO5xQuuBPKQGuTw%40mail.gmail.com%3E
> >>>>>>>
> >>>>>>> http://mail-archives.apache.org/mod_mbox/sqoop-user/201406.mbox/%3CCAOrS3pxWuxL1X9Sb816N_o1Jd==gs9Ww6UjE2PO+FPaw7VHw1Q@mail.gmail.com%3E
> >>>>>>>
> >>>>>>> In addition, I noticed the problem when talking to users in
> >>>>>>> conferences, customers, members of support team, etc (not to mention
> >>>>>>> that I got confused personally when I started out.) I didn't
> >>>>>>> bring much evidence in my first email because I thought there
> >>>>>>> was a wide consensus about the problem.
> >>>>>>>
> >>>>>>> I have several goals with the code-name:
> >>>>>>>
> >>>>>>> * We need to remove the impression that the new version is like Sqoop
> >>>>>>> only better. It is only somewhat like Sqoop and will not be strictly
> >>>>>>> better for many months yet.
> >>>>>>> * We need to clarify that this project is not even close to production
> >>>>>>> quality.
> >>>>>>> * We need to make it easy for us to quickly figure out which version
> >>>>>>> the user is talking about. We also need to make it easy for the users
> >>>>>>> to describe what they are using.
> >>>>>>> * We want to have fun :)
> >>>>>>>
> >>>>>>> I think the name Pelican Project will help with all goals:
> >>>>>>> - It is clearly not the same as Sqoop. So there are no existing
> >>>>>>> expectations on what will be supported.
> >>>>>>> - It is a "Project" and not a product yet.
> >>>>>>> - Sqoop and Pelican don't look or sound similar. No one can expect to
> >>>>>>> use Sqoop by running "pelican-shell" or to use Pelican by
> >>>>>>> calling "sqoop import".
> >>>>>>> - And a cute mascot will make every future presentation and blog post
> >>>>>>> on the topic much more fun.
> >>>>>>>
> >>>>>>> You do bring up good points of concern:
> >>>>>>>
> >>>>>>> a) Existing releases: I disagree that code names for in-progress
> >>>>>>> development cause too much confusion. They seem fairly common in the
> >>>>>>> software world.
> >>>>>>>
> >>>>>>>
> >>>>>>> http://royal.pingdom.com/2010/05/27/the-developer-obsession-with-code-names-114-interesting-examples/
> >>>>>>>
> >>>>>>> b) "could impact the reproducibility of previous release builds which
> >>>>>>> is not very good for the project."
> >>>>>>> This sounds fairly serious. Can you elaborate what you mean by
> >>>>>>> reproducibility of release build?
> >>>>>>>
> >>>>>>> Gwen
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Jul 25, 2014 at 8:02 AM, Arvind Prabhakar <arvind@apache.org> wrote:
> >>>>>>>> Hi Gwen,
> >>>>>>>>
> >>>>>>>> Other than the recent thread [1] on our user list, is there any other
> >>>>>>>> precedent regarding the confusion this issue has caused? If so, I would
> >>>>>>>> appreciate if you could point it out.
> >>>>>>>>
> >>>>>>>> Personally, I do agree that we ought to have a better
> >>>>>>>> mechanism to communicate the completeness (or incompleteness)
> >>>>>>>> of a release in order to
> >>>>>>>> ensure the users understand what benefits or drawbacks they may get.
> >>>>>>>> Incidentally, this was the primary reason for numbering the Sqoop2 release
> >>>>>>>> as 1.99.x, thereby indicating that the release is not quite 2.0 yet, which
> >>>>>>>> seems to be not working as well as expected.
> >>>>>>>>
> >>>>>>>> One traditional way to alleviate this issue would be to label the release
> >>>>>>>> alpha/beta etc. I prefer doing that instead of putting a code name for the
> >>>>>>>> release for a couple of reasons - a) we have already made releases of
> >>>>>>>> Sqoop2 with the previous versioning scheme and hence changing the name
> >>>>>>>> could cause more confusion; and b) renaming the branches to the new name
> >>>>>>>> could impact the reproducibility of previous release builds which is not
> >>>>>>>> very good for the project.
> >>>>>>>>
> >>>>>>>> Another alternative to consider would be to have very clear messaging in
> >>>>>>>> the user-interface of Sqoop2 that it is still work in progress and not
> >>>>>>>> considered at par with Sqoop1.
> >>>>>>>>
> >>>>>>>> [1] http://s.apache.org/TvD
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Arvind Prabhakar
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Jul 25, 2014 at 7:30 AM, Venkat Ranganathan <vranganathan@hortonworks.com> wrote:
> >>>>>>>>
> >>>>>>>>> +1 for Pelican. But documentation should not be called The Pelican Brief :)
> >>>>>>>>>
> >>>>>>>>> Venkat
> >>>>>>>>>
> >>>>>>>>> On Thu, Jul 24, 2014 at 8:12 PM, Abraham Elmahrek <abe@cloudera.com> wrote:
> >>>>>>>>>> There's something about schlep (or schlepper) that I'm having trouble
> >>>>>>>>>> resisting... but... +1 to Pelican.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jul 24, 2014 at 7:18 PM, Jarek Jarcec Cecho <jarcec@apache.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I’m obviously biased, but +1 to Pelican.
> >>>>>>>>>>>
> >>>>>>>>>>> Jarcec
> >>>>>>>>>>>
> >>>>>>>>>>> On Jul 24, 2014, at 7:06 PM, Martin, Nick <NiMartin@pssd.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> +1 Pelican
> >>>>>>>>>>>>
> >>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>> From: Gwen Shapira [mailto:gshapira@cloudera.com]
> >>>>>>>>>>>> Sent: Thursday, July 24, 2014 9:51 PM
> >>>>>>>>>>>> To: dev@sqoop.apache.org
> >>>>>>>>>>>> Subject: Code name for Sqoop 2 (please vote!)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> As you may have noticed on the user list, Sqoop2 confuses the hell out
> >>>>>>>>>>>> of everyone.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Part of the problem is the name - Sqoop2 sounds newer and therefore
> >>>>>>>>>>>> better. People expect better quality and more features - which we don't
> >>>>>>>>>>>> deliver :(
> >>>>>>>>>>>>
> >>>>>>>>>>>> Therefore, I propose finding Sqoop2 a project code name. This way it
> >>>>>>>>>>>> will sound experimental and will not have the number "2" next to it.
> >>>>>>>>>>>> We can use the code name to mark the branches in the repo, the
> >>>>>>>>>>>> documentation, the Hue frontend, etc. This will prevent confusion as the
> >>>>>>>>>>>> name Sqoop will go back to refer to just one project, and one that actually
> >>>>>>>>>>>> works.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Suggested names:
> >>>>>>>>>>>> Project Pelican (Based on the animal on O'Reilly's Sqoop book)
> >>>>>>>>>>>> Project Schlep (Yiddish for "moving heavy package")
> >>>>>>>>>>>>
> >>>>>>>>>>>> Friends, contributors, committers and PMC members - please respond with
> >>>>>>>>>>>> either:
> >>>>>>>>>>>> * Vote (+1) on one of the names above
> >>>>>>>>>>>> * Your own suggestion
> >>>>>>>>>>>>
> >>>>>>>>>>>> We'll be looking to close the vote by August 1st (Next week).
> >>>>>>>>>>>>
> >>>>>>>>>>>> Gwen
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> CONFIDENTIALITY NOTICE
> >>>>>>>>> NOTICE: This message is intended for the use of the individual or
> >>>>>>>>> entity to which it is addressed and may contain information that is
> >>>>>>>>> confidential, privileged and exempt from disclosure under applicable
> >>>>>>>>> law. If the reader of this message is not the intended recipient, you
> >>>>>>>>> are hereby notified that any printing, copying, dissemination,
> >>>>>>>>> distribution, disclosure or forwarding of this communication is
> >>>>>>>>> strictly prohibited. If you have received this communication in error,
> >>>>>>>>> please contact the sender immediately and delete it from your system.
> >>>>>>>>> Thank You.
> >>>>>>>>>
> >>>>>>>
> >>>>
> >>>>
> >
>
>