hawq-dev mailing list archives

From Ruilong Huo <r...@pivotal.io>
Subject Re: libhdfs3 development is still going on outside of ASF
Date Fri, 16 Sep 2016 01:53:40 GMT
Hi All,

It is great to have comments and concerns from different stakeholders discussed here.

@Zhanwei, from the maintenance perspective, is it possible to keep a single version of libhdfs3
that is compatible across all of its users? This would help reduce customization of it
and make it more widely adopted by a broader audience in the community.

Secondly, if that single compatible version is "always" stable (in most cases it should be),
we won't need to spend maintenance effort picking a stable version of it for HAWQ
as both HAWQ and libhdfs3 grow.

Any comments from you would be appreciated :)

@Roman, regarding ASF governance, from your perspective, does that mean we need to
keep them in one ASF project and repo? Or is it OK to make it a separate ASF repo rather than
keeping it under Pivotal Data Fabric? Your advice is also appreciated. Thanks.

> On Sep 15, 2016, at 20:16, Matthew Rocklin <mrocklin@continuum.io> wrote:
> 
> Hi All,
> 
> I joined this e-mail list in order to chime in on this discussion.  I'm not
> part of Apache HAWQ but *do* use libhdfs3 and know a number of other people
> who do as well.
> 
> I maintain a library for parallel programming, Dask
> <http://dask.pydata.org/en/latest/>, which is commonly used within the
> PyData software ecosystem.  We often interact with data on HDFS and found
> libhdfs3 to be an excellent solution, particularly because it doesn't
> require JVM interaction, which is rare among our users.   To assist Python
> users we made the wrapper library hdfs3
> <http://hdfs3.readthedocs.io/en/latest/>, which has gotten some traction
> both within Dask and outside.
> 
> We intentionally released and maintain hdfs3 separately from Dask because
> it's a more general and releasable component.  This turns out to have been
> a good move.  There are lots of people who use hdfs3 who have no interest
> in using Dask at all.  They appreciate this separation because they're not
> forced to grab all of Dask in order to just get the single component they
> want, hdfs3.  These are great users.  They come from a wide range of
> organizations, from universities to small and large businesses.  They contribute back to hdfs3
> readily and are also, today, trying to contribute back to libhdfs3.  By not
> tying hdfs3 into Dask we increased both community engagement and social
> impact.
> 
> So my initial bias is "Please, keep libhdfs3 separate.  It will make my
> life (and the lives of many others) much more convenient."  However I also
> recognize the need for Apache's strict-for-a-reason policies.  No matter
> what you all decide, the PyData community will find a way to make things
> work.  I just wanted to make it clear that there are several other
> stakeholders out there using this library so that this decision wasn't made
> in a vacuum.
> 
> Best,
> -matthew rocklin
> 
> 
> 
> 
>> On Thu, Sep 15, 2016 at 2:38 AM, Zhanwei Wang <wangzw@apache.org> wrote:
>> 
>> Hi Roman
>> 
>> I think I have discussed the benefits and drawbacks of merging two
>> independent projects together enough.
>> Let me propose a way that might make both ASF and libhdfs3’s users
>> happy. I need your advice.
>> 
>> 
>> Is it possible to have two git repositories in ASF for the HAWQ incubator
>> project? If it is possible, I propose to solve the libhdfs3 issue like this:
>> 
>> 1) Create a new git repository in ASF and push all of libhdfs3’s code and
>> branches from GitHub to ASF.
>> 2) Make libhdfs3’s GitHub repository a read-only mirror of the ASF
>> repository. We may need to transfer ownership of the current GitHub
>> repository from Pivotal to ASF on GitHub.
>> 3) HAWQ keeps the stable version of libhdfs3’s code, or just a Git reference.
>> 
>> 
>> In this way, we keep libhdfs3 independent and keep all its pull requests,
>> wiki, issues and history. Most importantly, libhdfs3 can follow ASF
>> rules and processes. People can file pull requests on GitHub, which are
>> committed to the ASF repository and eventually mirrored to GitHub.
>> 
>> 
>> Any comments?
>> 
>> 
>> Best Regards
>> 
>> Zhanwei Wang
>> wangzw@apache.org
>> 
>> 
>> 
>>>> On Sep 15, 2016, at 2:19 PM, Zhanwei Wang <wangzw@apache.org> wrote:
>>>> 
>>>> Open source is about community first.
>>> 
>>> Good point Kyle. I strongly agree with you!
>>> 
>>> But unfortunately it seems no one in this thread cares about libhdfs3’s
>>> community (users) except me. People are actively ignoring the frustration
>>> of libhdfs3 users and are about to delete its repository.
>>> 
>>> 
>>> So let’s set the tone of this thread.
>>> 
>>> If we remove libhdfs3’s repository or make it read-only:
>>> a. What benefits can we get for BOTH HAWQ and libhdfs3’s users?
>>> b. What are the drawbacks for BOTH HAWQ and libhdfs3’s users?
>>> 
>>> 
>>> 
>>> The following is my answer.
>>> 
>>> a. Benefit: For HAWQ, it seems ASF can govern its property with ASF rules.
>>> For libhdfs3’s users, none.
>>> 
>>> b. Drawback: For HAWQ, irrelevant commits will come into HAWQ’s commit
>>> log. JIRAs and pull requests unrelated to HAWQ will be filed against HAWQ.
>>> Furthermore, commits in libhdfs3 may break HAWQ in ways that are hard to
>>> debug; I have experienced this enough. It is important to use a stable
>>> version of libhdfs3, so HAWQ’s code should only keep the stable version of
>>> libhdfs3.
>>> 
>>>   For libhdfs3’s users, they have to ask questions in HAWQ’s community.
>>> They have to clone the entire HAWQ repository to build libhdfs3 and contribute.
>>> 
>>> Let’s think about it more. How do we schedule a release of libhdfs3 while
>>> HAWQ is under development? Should we branch HAWQ for libhdfs3’s releases?
>>> Should we merge libhdfs3’s pull requests while we are releasing HAWQ? Do we
>>> have to sync the release processes of HAWQ and libhdfs3, and if so, how?
>>> 
>>> Maybe we should involve libhdfs3’s users in this thread. But
>>> unfortunately they are not on HAWQ’s mailing list. See, this is another big
>>> issue. We are discussing dropping libhdfs3’s repository on HAWQ’s mailing
>>> list without libhdfs3’s users involved, which seems odd. Imagine this: one
>>> day the repository you are working with is gone, and you did not even know
>>> about this discussion.
>>> 
>>> If anyone wants to discuss whether we should drop libhdfs3’s repository,
>>> the better place is libhdfs3’s repository.
>>> 
>>> In general, merging two independent projects introduces more trouble
>>> than benefit.
>>> 
>>> To be clear, I’m not against ASF rules. I deeply understand their
>>> importance. Is there any way to keep HAWQ and libhdfs3 separate and
>>> make both ASF and libhdfs3’s users happy? Just like Kyle said, “HOW” is
>>> more important.
>>> 
>>> @Roman, your mentoring is important.
>>> 
>>> 
>>> Any comments?
>>> 
>>> 
>>> Best Regards
>>> 
>>> Zhanwei Wang
>>> wangzw@apache.org
>>> 
>>> 
>>> 
>>>> On Sep 15, 2016, at 12:54 PM, Kyle Dunn <kdunn@pivotal.io> wrote:
>>>> 
>>>> Chiming in here only as a casual but concerned observer.
>>>> 
>>>> Open source is about community first. If the logistics around "where"
>>>> libhdfs3 lives, rather than the much more important issue of "how" it
>>>> lives, are the focus here, I think we've missed the real issue.
>>>> 
>>>> For what it's worth, I concur with others: let's move it to HAWQ
>>>> exclusively and move on to addressing the community, starting with the
>>>> decision being made and how/where future contributions can be made.
>>>> 
>>>> My brief scan of libhdfs3 shows numerous open pull requests (with
>>>> apparently useful contributions) and several loose-end issues. We need
>>>> to communicate effectively to these contributors whether those PRs and
>>>> issues are valuable and relevant. This type of engagement is what OSS
>>>> projects live and die by. We need to be better, starting with libhdfs3,
>>>> into HAWQ, and beyond.
>>>> 
>>>> "Open source isn't someone else's job" - it's everyone's job. I'm
>>>> challenging everyone with commit responsibility on repos to value community
>>>> input (both code and issues) as highly as your own backlog. Pay it forward,
>>>> and maybe the community will start shrinking your backlog unexpectedly.
>>>> 
>>>> 
>>>> -Kyle
>>>> 
>>>>> On Wed, Sep 14, 2016, 21:33 Lei Chang <chang.lei.cn@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> There was a short discussion before, when we moved libhdfs3 to the HAWQ
>>>>> repo.
>>>>> 
>>>>> http://mail-archives.apache.org/mod_mbox/incubator-hawq-dev/201602.mbox/%3cCAE44UQe1xgcVOC76T_mgVbgGbR=Lx=XUBPVw18ZK4iZ3euCH+g@mail.gmail.com%3e
>>>>> I think it makes sense to keep libhdfs3 only in the HAWQ repo to simplify
>>>>> Apache builds and releases in the current phase. This is what we have done
>>>>> in the past. But it looks like not everyone is on the same page.
>>>>> Cheers
>>>>> Lei
>>>>> 
>>>>> On Thu, Sep 15, 2016 at 11:12 AM +0800, "Greg Chase" <greg@gregchase.com> wrote:
>>>>> 
>>>>> It's fine if libhdfs3 is under a third-party license and is treated that way.
>>>>> 
>>>>> However, why does Apache HAWQ want to be dependent on some strange
>>>>> third-party library with no transparency?
>>>>> 
>>>>> We are having enough difficulties just getting our first release out.
>>>>> 
>>>>> Is there a compelling reason why we need to keep up with the independently
>>>>> developed libhdfs3 project?  Are they willing to make the necessary changes
>>>>> so that they are compatible with ASF's strict-for-a-good-reason policies?
>>>>> 
>>>>> Can we fork libhdfs3 for Apache HAWQ's purposes in Apache?
>>>>> 
>>>>> If any libhdfs3 committers are also part of Apache HAWQ, perhaps you can
>>>>> shed some light on the viability of this as an independent project, since
>>>>> I only see 4 contributors.
>>>>> 
>>>>> -Greg
>>>>> 
>>>>>> On Wed, Sep 14, 2016 at 7:54 PM, Hong Wu  wrote:
>>>>>> 
>>>>>> In my opinion, it is reasonable to transfer the third-party libhdfs3 repo
>>>>>> entirely into HAWQ, not only for the convenience of the HAWQ build, but
>>>>>> also out of consideration for HAWQ as an ASF project. So for the HAWQ
>>>>>> project, I am with Roman.
>>>>>> 
>>>>>> But my concern is the current users of libhdfs3 and all the pull requests,
>>>>>> wiki docs and issues. Another uncertain aspect, from my perspective, is
>>>>>> that although HAWQ could not run without libhdfs3, libhdfs3 could be used
>>>>>> in other open source projects; that might be the true purpose of making
>>>>>> libhdfs3 open source in the first place.
>>>>>> 
>>>>>> In summary, if it is really against the spirit of an ASF project for HAWQ,
>>>>>> one suggested approach might be marking the original libhdfs3 repo as a
>>>>>> legacy repo instead of removing it.
>>>>>> 
>>>>>> Best
>>>>>> Hong
>>>>>> 
>>>>>> 2016-09-15 10:04 GMT+08:00 Zhanwei Wang :
>>>>>> 
>>>>>>> Currently libhdfs3’s official code is not the same as the code in HAWQ.
>>>>>>> Some new code has not been copied into HAWQ. I do not think code changes
>>>>>>> to libhdfs3 should follow HAWQ’s commit process, because many changes are
>>>>>>> not related to HAWQ.
>>>>>>> 
>>>>>>> From the HAWQ side, I suggest keeping the stable version of its
>>>>>>> third-party libraries and copying new libhdfs3 code only when it is
>>>>>>> necessary.
>>>>>>> 
>>>>>>> libhdfs3 was open source years before HAWQ entered incubation, under its
>>>>>>> own separate authority. So in my opinion it is a third party, and it
>>>>>>> actually was a third party before HAWQ entered incubation. And HAWQ is
>>>>>>> not the only user.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Best Regards
>>>>>>> 
>>>>>>> Zhanwei Wang
>>>>>>> wangzw@apache.org
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Sep 15, 2016, at 9:35 AM, Roman Shaposhnik wrote:
>>>>>>>> 
>>>>>>>> On Wed, Sep 14, 2016 at 6:29 PM, Zhanwei Wang wrote:
>>>>>>>>> Hi Roman
>>>>>>>>> 
>>>>>>>>> libhdfs3 works as a third-party library of HAWQ. Just for the
>>>>>>>>> convenience of HAWQ’s release process, we copy its code into HAWQ. The
>>>>>>>>> reason is that HAWQ used to depend on a specific version of libhdfs3,
>>>>>>>>> libhdfs3 is only distributed as source code, and its build process is
>>>>>>>>> complicated.
>>>>>>>> 
>>>>>>>> I actually don't buy this argument. libhdfs3 is not an optional
>>>>>>>> dependency for HAWQ like ORCA is (for example). Without libhdfs3 it's
>>>>>>>> pretty tough to imagine HAWQ. As such, the code base needs to be
>>>>>>>> governed as part of the ASF project, not as a random GitHub dependency.
>>>>>>>> 
>>>>>>>> IOW, let me ask you this: were all the changes that went into the
>>>>>>>> libhdfs3 that is part of HAWQ discussed and reviewed via the ASF
>>>>>>>> development process, or did you just import them from time to time, as
>>>>>>>> this comment suggests:
>>>>>>>> https://issues.apache.org/jira/browse/HAWQ-1046?focusedCommentId=15489669&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15489669
>>>>>>>> ?
>>>>>>>> 
>>>>>>>>> I do not think we have any reason to shut down a third party’s
>>>>>>>>> official repository.
>>>>>>>> 
>>>>>>>> You say 3rd party as though it's not just you guys maintaining it on
>>>>>>>> the side.
>>>>>>>> 
>>>>>>>>> We also copy the Google Test source code into HAWQ, just as we did
>>>>>>>>> for libhdfs3.
>>>>>>>> 
>>>>>>>> But this is very different. You don't do any development (certainly
>>>>>>>> not any non-trivial development) of that code.
>>>>>>>> 
>>>>>>>>> libhdfs3 is open source under the Apache License version 2, just the
>>>>>>>>> same as HAWQ. So I believe there is no license issue.
>>>>>>>> 
>>>>>>>> You're correct. There's no licensing issue, but there's a pretty
>>>>>>>> significant governance issue.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Roman.
>>>>> 
>>>>> --
>>>> *Kyle Dunn | Data Engineering | Pivotal*
>>>> Direct: 303.905.3171 | Email: kdunn@pivotal.io
>> 
>> 
