sqoop-dev mailing list archives

From Jarek Jarcec Cecho <jar...@apache.org>
Subject Re: Discussing solutions to Sqoop1 and Sqoop2 confusion (was: Code name for Sqoop 2)
Date Fri, 01 Aug 2014 16:38:00 GMT
I don’t have any other suggestions either, so let’s discuss which one people would prefer.

I’m personally in favor of proposal 1).

Jarcec

On Jul 28, 2014, at 10:04 AM, Gwen Shapira <gshapira@cloudera.com> wrote:

> Thanks for the great summary. I don't have additional suggestions.
> 
> Gwen
> 
> On Sun, Jul 27, 2014 at 11:03 AM, Arvind Prabhakar <arvind@apache.org> wrote:
>> Thanks Gwen and Jarcec. It appears that we all agree on the few basic
>> points below:
>> 
>> a) Sqoop2 is a promising effort, although not near completion. We agree that
>> there is no need to discuss shutting that down at this time.
>> b) The naming of Sqoop2 is such that it raises expectations in
>> users/adopters to be better than Sqoop(1) and thus leads to confusion.
>> 
>> The second point (b) above is the key issue that needs resolution. The
>> options discussed thus far are as follows:
>> 
>> 1. Put a code name for Sqoop2 so that it is not confused with Sqoop(1).
>> This seems to have good community support.
>> 2. Use explicit labels such as "stable" vs "beta/alpha/experimental" for
>> various Sqoop releases.
>> 3. Use explicit UI messaging to warn Sqoop2 users that it is not the same
>> as Sqoop(1) and is far behind on feature completeness and stability. There
>> seem to be some concerns around how this can be done given the
>> client/server architecture of Sqoop2.
>> 4. A combination of options 2 and 3 above.
>> 
>> Are there any suggestions to mitigate this problem? Perhaps we should
>> cross-post this thread to user list as well to see if they agree with the
>> options here and/or if they have any other suggestions.
>> 
>> Regards,
>> Arvind Prabhakar
>> 
>> 
>> 
>> On Sat, Jul 26, 2014 at 6:50 PM, Jarek Jarcec Cecho <jarcec@apache.org> wrote:
>> 
>>> Hi Arvind,
>>> thank you very much for sharing your concerns! You’ve raised some very good
>>> points.
>>> 
>>> I personally see value in Sqoop 2, as the new architecture will allow us to
>>> cover many more use cases and various compliance regulations, and will
>>> eventually simplify users’ lives. Based on the recent increase in dev
>>> activity, it seems that I’m not the only one who believes in that, and
>>> hence I strongly believe that discontinuing the effort isn’t the
>>> right way to go. I’m more than happy to discuss this topic further if you
>>> believe that it’s the right way, though.
>>> 
>>> Having said that, while I do believe in Sqoop 2, I have to second Gwen that
>>> the current situation is very confusing to our users. I’ve seen a significant
>>> number of users confused about why 1.99.4 does not have Avro/HBase/Hive
>>> integration when Sqoop 1 already has it. I anticipated the
>>> confusion, and hence I suggested using version number 1.99.x instead of
>>> 2.0.0 back when we were working on getting the first cut out of the door. I
>>> hoped that version 1.99.x would make it obvious to everybody that it’s not
>>> “2.0.0” quite yet. Sadly, it seems that this alone did not help as much as
>>> I hoped.
>>> 
>>> Hence I do see value in changing our public messaging as you’ve suggested
>>> to make the message clearer. I personally like the idea of a code name,
>>> as that is quite popular in other projects and companies (remember Windows
>>> Longhorn?) and it seems to convey the message. I do see a lot of
>>> flexibility in how we use that code name, though - I don’t think that we
>>> necessarily have to remove every possible reference to “Sqoop 2” from the
>>> face of the earth. I believe that the code name is very well suited for our
>>> webpage, wiki and perhaps blog posts, to make it obvious that there is just
>>> one single stable Sqoop version and then an ongoing effort that has
>>> several cuts available. I believe that just by doing that we will decrease
>>> confusion about which version users should download and use. We can discuss
>>> to what extent we want to push the code name and to what extent we will
>>> keep the reference to “Sqoop 2”. After all, I’m confident that in the not too
>>> distant future we will have a Sqoop 2 that offers capabilities and features
>>> comparable to Sqoop 1.
>>> 
>>> I don’t think that the code name is the only idea that will decrease the
>>> immediate user confusion, and hence I’m happy to hear others’ thoughts!
>>> 
>>> Jarcec
>>> 
>>> On Jul 26, 2014, at 6:00 PM, Gwen Shapira <gshapira@cloudera.com> wrote:
>>> 
>>>> Thanks Arvind for your detailed explanation.
>>>> 
>>>> I agree that naming releases stable and alpha is a good idea. I don't
>>>> agree that it will solve the issue, but we can't know until we try.
>>>> 
>>>> Considering that Sqoop2 is intentionally a client-server architecture
>>>> with multiple clients and a REST API as an additional access point, I
>>>> believe that it is not feasible to mark the UI as beta.
>>>> 
>>>> I want to stress that I personally believe that Sqoop2 is a very
>>>> viable branch effort, to the extent that I am actively contributing to
>>>> it.
>>>> As Sqoop2 becomes an increasingly viable alternative to Sqoop1, we need
>>>> to prepare, as a community, to support both versions.
>>>> 
>>>> Considering the number of features currently in Sqoop1 and the number
>>>> of production Sqoop1 users, I can see us supporting both versions for
>>>> the next 2 years. Making it easy for the community to support both is
>>>> my top concern here. Having to resolve endless confusion for two
>>>> years doesn't seem like a happy future to me. I see the Python
>>>> community fighting the same issue ever since they broke compatibility
>>>> between versions 2 and 3. I'd like to see us learn from those mistakes
>>>> and do better.
>>>> 
>>>> I agree that a discussion would have been better than a vote. I was
>>>> under the mistaken impression that there is a consensus on the matter.
>>>> I renamed the thread to make it clear that we are interested in
>>>> hearing opinions from the rest of the community on this subject.
>>>> 
>>>> 
>>>> Bike-sheddingly yours,
>>>> 
>>>> Gwen Shapira
>>>> 
>>>> 
>>>> 
>>>> On Sat, Jul 26, 2014 at 4:44 PM, Arvind Prabhakar <arvind@apache.org> wrote:
>>>>> Thanks for the detailed pointers Gwen. I understand your concerns better
>>>>> now. My understanding from these threads as well as what you have
>>>>> described is that the confusion you refer to stems from the fact that
>>>>> Sqoop2 is not at feature parity with Sqoop(1) yet.
>>>>> 
>>>>> It would be great to *discuss* the various ways to address this and
>>>>> then call a vote to decide upon the approach to use. Some other
>>>>> approaches that I can suggest are:
>>>>> 
>>>>> 1. Explicitly labeling Sqoop1 as "stable" in our downloads section, or
>>>>> even within the release label. So instead of Sqoop-1.4.5, it would be
>>>>> Sqoop-1.4.5-stable.
>>>>> 
>>>>> 2. Alternatively calling Sqoop2 explicitly "alpha", "beta" or
>>>>> "experimental". Eg - Sqoop-1.99.4 would become Sqoop-1.99.4-beta.
>>>>> 
>>>>> 3. Or perhaps a combination of both of these.
>>>>> 
>>>>> 4. Plus good UI messaging that clearly outlines the critical differences
>>>>> between these two products.
>>>>> 
>>>>> Personally, I do not believe that having a code name will solve the issue
>>>>> at hand, and it may even make it worse. If the motivation is to call out
>>>>> Sqoop2 as something "not-Sqoop", then perhaps we should discuss the
>>>>> viability of this branch effort. If we conclude that it is not going to
>>>>> progress any further, we could call a vote on discontinuing this effort and
>>>>> instead focus on the main Sqoop1 branch alone.
>>>>> 
>>>>> Hope you understand my point of view on this.
>>>>> 
>>>>> Regards,
>>>>> Arvind Prabhakar
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, Jul 25, 2014 at 10:53 AM, Gwen Shapira <gshapira@cloudera.com> wrote:
>>>>> 
>>>>>> Hi Arvind,
>>>>>> 
>>>>>> Here are a few more threads from the last month where we had to explain
>>>>>> Sqoop2 status or explain that you can't use "sqoop import" with
>>>>>> Sqoop2, etc:
>>>>>> 
>>>>>> http://mail-archives.apache.org/mod_mbox/sqoop-user/201407.mbox/%3CCA%2BP7NPNTFuPYqf74M5OFw4e9xKZm2ns%3DZ0ydkkuJ06Jcg31hnw%40mail.gmail.com%3E
>>>>>> http://mail-archives.apache.org/mod_mbox/sqoop-user/201407.mbox/%3CCAAJ8D%3D9Ho%3DYH7jdavNAb1gwz19Z5dapmS98yR71KmM5LsjCEVw%40mail.gmail.com%3E
>>>>>> http://mail-archives.apache.org/mod_mbox/sqoop-user/201407.mbox/%3CCAPwc21YtdgAm9jO3%2Bs0asBZ2JkG%3DVCp5PLpkO5xQuuBPKQGuTw%40mail.gmail.com%3E
>>>>>> http://mail-archives.apache.org/mod_mbox/sqoop-user/201406.mbox/%3CCAOrS3pxWuxL1X9Sb816N_o1Jd==gs9Ww6UjE2PO+FPaw7VHw1Q@mail.gmail.com%3E
>>>>>> 
>>>>>> In addition, I noticed the problem when talking to users at
>>>>>> conferences, customers, members of the support team, etc. (not to mention
>>>>>> that I got confused personally when I started out).
>>>>>> I didn't bring much evidence in my first email because I thought there
>>>>>> was a wide consensus about the problem.
>>>>>> 
>>>>>> I have several goals with the code-name:
>>>>>> 
>>>>>> * We need to remove the impression that the new version is like Sqoop,
>>>>>> only better. It is only somewhat like Sqoop and will not be strictly
>>>>>> better for many months yet.
>>>>>> * We need to clarify that this project is not even close to production
>>>>>> quality.
>>>>>> * We need to make it easy for us to quickly figure out which version
>>>>>> the user is talking about. We also need to make it easy for the users
>>>>>> to describe what they are using.
>>>>>> * We want to have fun :)
>>>>>> 
>>>>>> I think the name Pelican Project will help with all goals:
>>>>>> - It is clearly not the same as Sqoop, so there are no existing
>>>>>> expectations on what will be supported.
>>>>>> - It is a "Project" and not a product yet.
>>>>>> - Sqoop and Pelican don't look or sound similar. No one can expect to
>>>>>> use Sqoop by running "pelican-shell" or to use Pelican by calling
>>>>>> "sqoop import".
>>>>>> - And a cute mascot will make every future presentation and blog post
>>>>>> on the topic much more fun.
>>>>>> 
>>>>>> You do bring up good points of concern:
>>>>>> 
>>>>>> a) Existing releases: I disagree that code names for in-progress
>>>>>> development cause too much confusion. They seem fairly common in the
>>>>>> software world.
>>>>>> 
>>>>>> 
>>>>>> http://royal.pingdom.com/2010/05/27/the-developer-obsession-with-code-names-114-interesting-examples/
>>>>>> 
>>>>>> b) "could impact the reproducibility of previous release builds which
>>>>>> is not very good for the project."
>>>>>> This sounds fairly serious. Can you elaborate on what you mean by
>>>>>> reproducibility of release builds?
>>>>>> 
>>>>>> Gwen
>>>>>> 
>>>>>> On Fri, Jul 25, 2014 at 8:02 AM, Arvind Prabhakar <arvind@apache.org> wrote:
>>>>>>> Hi Gwen,
>>>>>>> 
>>>>>>> Other than the recent thread [1] on our user list, is there any other
>>>>>>> precedent regarding the confusion this issue has caused? If so, I would
>>>>>>> appreciate it if you could point it out.
>>>>>>> 
>>>>>>> Personally, I do agree that we ought to have a better mechanism to
>>>>>>> communicate the completeness (or incompleteness) of a release in order to
>>>>>>> ensure the users understand what benefits or drawbacks they may get.
>>>>>>> Incidentally, this was the primary reason for numbering the Sqoop2 release
>>>>>>> as 1.99.x, thereby indicating that the release is not quite 2.0 yet,
>>>>>>> which does not seem to be working as well as expected.
>>>>>>> 
>>>>>>> One traditional way to alleviate this issue would be to label the release
>>>>>>> alpha/beta etc. I prefer doing that instead of giving the release a code
>>>>>>> name for a couple of reasons - a) we have already made releases of
>>>>>>> Sqoop2 with the previous versioning scheme and hence changing the name
>>>>>>> could cause more confusion; and b) renaming the branches to the new name
>>>>>>> could impact the reproducibility of previous release builds which is not
>>>>>>> very good for the project.
>>>>>>> 
>>>>>>> Another alternative to consider would be to have very clear messaging in
>>>>>>> the user interface of Sqoop2 that it is still a work in progress and not
>>>>>>> considered on par with Sqoop1.
>>>>>>> 
>>>>>>> [1] http://s.apache.org/TvD
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Arvind Prabhakar
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Jul 25, 2014 at 7:30 AM, Venkat Ranganathan <vranganathan@hortonworks.com> wrote:
>>>>>>> 
>>>>>>>> +1 for Pelican. But documentation should not be called The Pelican
>>>>>>>> Brief :)
>>>>>>>> 
>>>>>>>> Venkat
>>>>>>>> 
>>>>>>>> On Thu, Jul 24, 2014 at 8:12 PM, Abraham Elmahrek <abe@cloudera.com> wrote:
>>>>>>>>> There's something about schlep (or schlepper) that I'm having trouble
>>>>>>>>> resisting... but... +1 to Pelican.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Thu, Jul 24, 2014 at 7:18 PM, Jarek Jarcec Cecho <jarcec@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>>> I’m obviously biased, but +1 to Pelican.
>>>>>>>>>> 
>>>>>>>>>> Jarcec
>>>>>>>>>> 
>>>>>>>>>> On Jul 24, 2014, at 7:06 PM, Martin, Nick <NiMartin@pssd.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> +1 Pelican
>>>>>>>>>>> 
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Gwen Shapira [mailto:gshapira@cloudera.com]
>>>>>>>>>>> Sent: Thursday, July 24, 2014 9:51 PM
>>>>>>>>>>> To: dev@sqoop.apache.org
>>>>>>>>>>> Subject: Code name for Sqoop 2 (please vote!)
>>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> As you may have noticed on the user list, Sqoop2 confuses the hell out
>>>>>>>>>>> of everyone.
>>>>>>>>>>> 
>>>>>>>>>>> Part of the problem is the name - Sqoop2 sounds newer and therefore
>>>>>>>>>>> better. People expect better quality and more features - which we don't
>>>>>>>>>>> deliver :(
>>>>>>>>>>> 
>>>>>>>>>>> Therefore, I propose finding Sqoop2 a project code name. This way it
>>>>>>>>>>> will sound experimental and will not have the number "2" next to it.
>>>>>>>>>>> We can use the code name to mark the branches in the repo, the
>>>>>>>>>>> documentation, the Hue frontend, etc. This will prevent confusion, as the
>>>>>>>>>>> name Sqoop will go back to referring to just one project, and one that
>>>>>>>>>>> actually works.
>>>>>>>>>>> 
>>>>>>>>>>> Suggested names:
>>>>>>>>>>> Project Pelican (based on the animal on O'Reilly's Sqoop book)
>>>>>>>>>>> Project Schlep (Yiddish for "moving heavy package")
>>>>>>>>>>> 
>>>>>>>>>>> Friends, contributors, committers and PMC members - please respond with
>>>>>>>>>>> either:
>>>>>>>>>>> * Vote (+1) on one of the names above
>>>>>>>>>>> * Your own suggestion
>>>>>>>>>>> 
>>>>>>>>>>> We'll be looking to close the vote by August 1st (next week).
>>>>>>>>>>> 
>>>>>>>>>>> Gwen
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> 
>>> 

