sqoop-dev mailing list archives

From Venkat Ranganathan <vranganat...@hortonworks.com>
Subject Re: Discussing solutions to Sqoop1 and Sqoop2 confusion (was: Code name for Sqoop 2)
Date Fri, 01 Aug 2014 18:46:45 GMT
+1 for proposal 1 also



On Fri, Aug 1, 2014 at 9:38 AM, Jarek Jarcec Cecho <jarcec@apache.org> wrote:
> I don’t have any other suggestion either, so let’s discuss which one people would prefer.
> I’m personally in favor of proposal 1).
> Jarcec
> On Jul 28, 2014, at 10:04 AM, Gwen Shapira <gshapira@cloudera.com> wrote:
>> Thanks for the great summary. I don't have additional suggestions.
>> Gwen
>> On Sun, Jul 27, 2014 at 11:03 AM, Arvind Prabhakar <arvind@apache.org> wrote:
>>> Thanks Gwen and Jarcec. It appears that we all agree on a few basic
>>> points below:
>>> a) Sqoop2 is a promising effort although not near completion. We agree that
>>> there is no need to discuss shutting that down at this time.
>>> b) The naming of Sqoop2 is such that it raises expectations in
>>> users/adopters to be better than Sqoop(1) and thus leads to confusion.
>>> The second point (b) above is the key issue that needs resolution. The
>>> options discussed thus far are as follows:
>>> 1. Put a code name for Sqoop2 so that it is not confused with Sqoop(1).
>>> This seems to have good community support.
>>> 2. Use explicit labels such as "stable" vs "beta/alpha/experimental" for
>>> various Sqoop releases.
>>> 3. Use explicit UI messaging to warn Sqoop2 users that it is not the same
>>> as Sqoop(1) and is far behind on feature completeness and stability. There
>>> seem to be some concerns around how this can be done given the
>>> client/server architecture of Sqoop2.
>>> 4. A combination of options 2 and 3 above.
>>> Are there any suggestions to mitigate this problem? Perhaps we should
>>> cross-post this thread to user list as well to see if they agree with the
>>> options here and/or if they have any other suggestions.
>>> Regards,
>>> Arvind Prabhakar
>>> On Sat, Jul 26, 2014 at 6:50 PM, Jarek Jarcec Cecho <jarcec@apache.org>
>>> wrote:
>>>> Hi Arvind,
>>>> thank you very much for sharing your concerns! You’ve raised very good
>>>> points.
>>>> I personally see value in Sqoop 2 as the new architecture will allow us to
>>>> cover many more use cases and various compliance regulations, and will
>>>> eventually simplify users’ lives. Based on the recent increase in dev
>>>> activity, it seems that I’m not the only one who believes in that and
>>>> hence I strongly believe that discontinuing the effort is not the
>>>> right way to go. I’m more than happy to discuss this topic further if you
>>>> believe that it’s the right way though.
>>>> Having said that I do believe in Sqoop 2, I have to second Gwen that the
>>>> current situation is very confusing to our users. I’ve seen a significant
>>>> number of users confused about why 1.99.4 does not have Avro/HBase/Hive
>>>> integration when Sqoop 1 already has that. I was anticipating the
>>>> confusion and hence I suggested using version number 1.99.x instead of
>>>> 2.0.0 back when we were working on getting the first cut out of the door.
>>>> I hoped that version 1.99.x would make it obvious to everybody that it’s not
>>>> “2.0.0” quite yet. Sadly it seems that this alone did not help as much as
>>>> I hoped.
>>>> Hence I do see value in changing our public messaging as you’ve suggested
>>>> to make the message clearer. I personally like the idea of a code name
>>>> as that is quite popular in other projects and companies (remember Windows
>>>> Longhorn?) and it seems to convey the message. I do see a lot of
>>>> variability in how we use that code name though - I don’t think that we
>>>> necessarily have to remove every possible reference to “Sqoop 2” from the
>>>> face of the earth. I believe that the code name is very well suited for our
>>>> webpage, wiki and perhaps blog posts to make it obvious that there is just
>>>> one single stable Sqoop version and then an ongoing effort that has
>>>> several cuts available. I believe that just by doing that we will decrease
>>>> confusion about which version a user should download and use. We can discuss
>>>> to what extent we want to push the code name and to what extent we will
>>>> keep the reference to “Sqoop 2”. After all I’m confident that in the not
>>>> too distant future, we will have a Sqoop 2 that offers capability and
>>>> features comparable to Sqoop 1.
>>>> I don’t think that the code name is the only idea that will decrease the
>>>> immediate user confusion and hence I’m happy to hear others’ thoughts!
>>>> Jarcec
>>>> On Jul 26, 2014, at 6:00 PM, Gwen Shapira <gshapira@cloudera.com> wrote:
>>>>> Thanks Arvind for your detailed explanation.
>>>>> I agree that naming releases stable and alpha is a good idea. I don't
>>>>> agree that it will solve the issue, but we can't know until we try.
>>>>> Considering that Sqoop2 is intentionally a client-server architecture
>>>>> with multiple clients and a REST API as an additional access point, I
>>>>> believe that it is not feasible to mark UI as beta.
>>>>> I want to stress that I personally believe that Sqoop2 is a very
>>>>> viable branch effort, to the extent that I am actively contributing to
>>>>> it.
>>>>> As Sqoop2 becomes a more and more viable alternative to Sqoop1, we need
>>>>> to prepare, as a community, to support both versions.
>>>>> Considering the number of features currently in Sqoop1 and the number
>>>>> of production Sqoop1 users, I can see us supporting both versions for
>>>>> the next 2 years. Making it easy for the community to support both is
>>>>> my top concern here. Having to resolve endless confusions for two
>>>>> years doesn't seem like a happy future to me. I see the Python
>>>>> community fighting the same issue since they broke compatibility
>>>>> between versions 2 and 3. I'd like to see us learn from those mistakes
>>>>> and do better.
>>>>> I agree that a discussion would have been better than a vote. I was
>>>>> under the mistaken impression that there is a consensus on the matter.
>>>>> I renamed the thread to make it clear that we are interested in
>>>>> hearing opinions from the rest of the community on this subject.
>>>>> Bike-sheddingly yours,
>>>>> Gwen Shapira
>>>>> On Sat, Jul 26, 2014 at 4:44 PM, Arvind Prabhakar <arvind@apache.org> wrote:
>>>>>> Thanks for the detailed pointers Gwen. I understand your concerns
>>>>>> now. My understanding from these threads as well as what you have described
>>>>>> is that the confusion you refer to stems from the fact that Sqoop2 is not
>>>>>> at feature parity with Sqoop(1) yet.
>>>>>> It will be great to *discuss* what are the various ways to address this and
>>>>>> then call a vote to decide upon the approach to use. Some other approaches
>>>>>> that I can suggest are:
>>>>>> 1. Calling Sqoop1 explicitly as "stable" in our downloads section, even
>>>>>> within the release label. So instead of Sqoop-1.4.5, it would be
>>>>>> Sqoop-1.4.5-stable.
>>>>>> 2. Alternatively calling Sqoop2 explicitly "alpha", "beta" or
>>>>>> "experimental". Eg - Sqoop-1.99.4 would become Sqoop-1.99.4-beta.
>>>>>> 3. Or perhaps a combination of both of these.
>>>>>> 4. Plus good UI messaging that clearly outlines the critical differences
>>>>>> between these two products.
>>>>>> Personally, I do not believe that having a code name will solve the issue
>>>>>> at hand, and may even make it worse. If the motivation is to call
>>>>>> Sqoop2 as something "not-Sqoop", then perhaps we should discuss the
>>>>>> viability of this branch effort. If we conclude that it is not going to
>>>>>> progress any further, we could call a vote on discontinuing this and
>>>>>> instead focusing on the main Sqoop1 branch alone.
>>>>>> Hope you understand my point of view on this.
>>>>>> Regards,
>>>>>> Arvind Prabhakar
>>>>>> On Fri, Jul 25, 2014 at 10:53 AM, Gwen Shapira <gshapira@cloudera.com>
>>>>>> wrote:
>>>>>>> Hi Arvind,
>>>>>>> Here are a few more threads from the last month where we had to
>>>>>>> Sqoop2 status or explain that you can't use "sqoop import" with
>>>>>>> Sqoop2, etc:
>>>>>>> http://mail-archives.apache.org/mod_mbox/sqoop-user/201407.mbox/%3CCA%2BP7NPNTFuPYqf74M5OFw4e9xKZm2ns%3DZ0ydkkuJ06Jcg31hnw%40mail.gmail.com%3E
>>>>>>> http://mail-archives.apache.org/mod_mbox/sqoop-user/201407.mbox/%3CCAAJ8D%3D9Ho%3DYH7jdavNAb1gwz19Z5dapmS98yR71KmM5LsjCEVw%40mail.gmail.com%3E
>>>>>>> http://mail-archives.apache.org/mod_mbox/sqoop-user/201407.mbox/%3CCAPwc21YtdgAm9jO3%2Bs0asBZ2JkG%3DVCp5PLpkO5xQuuBPKQGuTw%40mail.gmail.com%3E
>>>>>>> http://mail-archives.apache.org/mod_mbox/sqoop-user/201406.mbox/%3CCAOrS3pxWuxL1X9Sb816N_o1Jd==gs9Ww6UjE2PO+FPaw7VHw1Q@mail.gmail.com%3E
>>>>>>> In addition, I noticed the problem when talking to users at
>>>>>>> conferences, customers, members of the support team, etc (not to mention
>>>>>>> that I got confused personally when I started out.)
>>>>>>> I didn't bring much evidence in my first email because I thought there
>>>>>>> was a wide consensus about the problem.
>>>>>>> I have several goals with the code-name:
>>>>>>> * We need to remove the impression that the new version is like Sqoop,
>>>>>>> only better. It is only somewhat like Sqoop and will not be strictly
>>>>>>> better for many months yet.
>>>>>>> * We need to clarify that this project is not even close to production
>>>>>>> quality.
>>>>>>> * We need to make it easy for us to quickly figure out which version
>>>>>>> the user is talking about. We also need to make it easy for the users
>>>>>>> to describe what they are using.
>>>>>>> * We want to have fun :)
>>>>>>> I think the name Pelican Project will help with all goals:
>>>>>>> - It is clearly not the same as Sqoop. So there's no existing
>>>>>>> expectations on what will be supported.
>>>>>>> - It is a "Project" and not a product yet.
>>>>>>> - Sqoop and Pelican don't look or sound similar. No one can expect to
>>>>>>> use Sqoop by running "pelican-shell" or to use Pelican by calling
>>>>>>> "sqoop import".
>>>>>>> - And a cute mascot will make every future presentation and blog post
>>>>>>> on the topic much more fun.
>>>>>>> You do bring up good points of concern:
>>>>>>> a) Existing releases: I disagree that code-names for in-progress
>>>>>>> development cause too much confusion. They seem fairly common in the
>>>>>>> software world.
>>>>>>> http://royal.pingdom.com/2010/05/27/the-developer-obsession-with-code-names-114-interesting-examples/
>>>>>>> b) "could impact the reproducibility of previous release builds which
>>>>>>> is not very good for the project."
>>>>>>> This sounds fairly serious. Can you elaborate what you mean by
>>>>>>> reproducibility of release build?
>>>>>>> Gwen
>>>>>>> On Fri, Jul 25, 2014 at 8:02 AM, Arvind Prabhakar <arvind@apache.org> wrote:
>>>>>>>> Hi Gwen,
>>>>>>>> Other than the recent thread [1] on our user list, is there any other
>>>>>>>> precedent regarding the confusion this issue has caused? If so, I would
>>>>>>>> appreciate if you could point it out.
>>>>>>>> Personally, I do agree that we ought to have a better mechanism to
>>>>>>>> communicate the completeness (or incompleteness) of a release in order to
>>>>>>>> ensure the users understand what benefits or drawbacks they may get.
>>>>>>>> Incidentally, this was the primary reason for numbering the release
>>>>>>>> as 1.99.x, thereby indicating that the release is not quite 2.0 yet, which
>>>>>>>> seems to be not working as well as expected.
>>>>>>>> One traditional way to alleviate this issue would be to label the release
>>>>>>>> alpha/beta etc. I prefer doing that instead of putting a code name for the
>>>>>>>> release for a couple of reasons - a) we have already made releases of
>>>>>>>> Sqoop2 with the previous versioning scheme and hence changing the name
>>>>>>>> could cause more confusion; and b) renaming the branches to the new name
>>>>>>>> could impact the reproducibility of previous release builds which is not
>>>>>>>> very good for the project.
>>>>>>>> Another alternative to consider would be to have very clear messaging in
>>>>>>>> the user-interface of Sqoop2 that it is still work in progress and not
>>>>>>>> considered at par with Sqoop1.
>>>>>>>> [1] http://s.apache.org/TvD
>>>>>>>> Regards,
>>>>>>>> Arvind Prabhakar
>>>>>>>> On Fri, Jul 25, 2014 at 7:30 AM, Venkat Ranganathan <vranganathan@hortonworks.com> wrote:
>>>>>>>>> +1 for Pelican. But documentation should not be called The Pelican Brief :)
>>>>>>>>> Venkat
>>>>>>>>> On Thu, Jul 24, 2014 at 8:12 PM, Abraham Elmahrek <abe@cloudera.com> wrote:
>>>>>>>>>> There's something about schlep (or schlepper) that I'm having trouble
>>>>>>>>>> resisting... but... +1 to Pelican.
>>>>>>>>>> On Thu, Jul 24, 2014 at 7:18 PM, Jarek Jarcec Cecho <jarcec@apache.org> wrote:
>>>>>>>>>>> I’m obviously biased, but +1 to Pelican.
>>>>>>>>>>> Jarcec
>>>>>>>>>>> On Jul 24, 2014, at 7:06 PM, Martin, Nick <NiMartin@pssd.com> wrote:
>>>>>>>>>>>> +1 Pelican
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Gwen Shapira [mailto:gshapira@cloudera.com]
>>>>>>>>>>>> Sent: Thursday, July 24, 2014 9:51 PM
>>>>>>>>>>>> To: dev@sqoop.apache.org
>>>>>>>>>>>> Subject: Code name for Sqoop 2 (please vote!)
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> As you may have noticed on the user list, Sqoop2 confuses the hell out
>>>>>>>>>>>> of everyone.
>>>>>>>>>>>> Part of the problem is the name - Sqoop2 sounds newer and therefore
>>>>>>>>>>>> better. People expect better quality and more features - which we don't
>>>>>>>>>>>> deliver :(
>>>>>>>>>>>> Therefore, I propose finding Sqoop2 a project code name. This way it
>>>>>>>>>>>> will sound experimental and will not have the number "2" next to it.
>>>>>>>>>>>> We can use the code name to mark the branches in the repo, the
>>>>>>>>>>>> documentation, the Hue frontend, etc. This will prevent confusion as the
>>>>>>>>>>>> name Sqoop will go back to refer to just one project, and one that
>>>>>>>>>>>> actually works.
>>>>>>>>>>>> Suggested names:
>>>>>>>>>>>> Project Pelican (Based on the animal on O'Reilly's Sqoop book)
>>>>>>>>>>>> Project Schlep (Yiddish for "moving heavy package")
>>>>>>>>>>>> Friends, contributors, committers and PMC members - please respond with
>>>>>>>>>>>> either:
>>>>>>>>>>>> * Vote (+1) on one of the names above
>>>>>>>>>>>> * Your own suggestion
>>>>>>>>>>>> We'll be looking to close the vote by August 1st (Next week).
>>>>>>>>>>>> Gwen
>>>>>>>>> --
>>>>>>>>> NOTICE: This message is intended for the use of the individual or entity to
>>>>>>>>> which it is addressed and may contain information that is confidential,
>>>>>>>>> privileged and exempt from disclosure under applicable law. If the reader
>>>>>>>>> of this message is not the intended recipient, you are hereby notified that
>>>>>>>>> any printing, copying, dissemination, distribution, disclosure or
>>>>>>>>> forwarding of this communication is strictly prohibited. If you have
>>>>>>>>> received this communication in error, please contact the sender immediately
>>>>>>>>> and delete it from your system. Thank You.

