asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Carey <dtab...@gmail.com>
Subject Re: Adapting TimSort into AsterixDB/Hyracks
Date Sat, 28 Oct 2017 23:49:05 GMT
Josh Bloch is a long-time software rock star who originated as a PhD 
alumnus of the transactions group at CMU back when they were really 
strong in that area.  (Not that that helps answer any of the legal 
questions. :-))


On 10/28/17 4:30 PM, Till Westmann wrote:
> I largely agree with Xikui.
>
> This one [2] seems to clearly state in its header that it’s published
> by Oracle under the GPL (which is category X [3] and cannot be included
> in Apache products).
> It would be interesting to understand the provenance of the Spark
> version to understand if/why it is not derived from the Oracle
> implementation. If they are essentially all the same implementation,
> I’d like to understand who the original copyright owner is (Josh
> Bloch?) and when the decision was taken to license the same
> implementation under both Apache and GPL licenses before we add it to
> the Hyracks code base.
> Alternatively, if the Android/Spark implementation is independent of
> the Oracle implementation, we should be able to base our implementaion
> on that.
>
> Cheers,
> Till
>
> [2] 
> https://github.com/retrostreams/android-retrostreams/blob/master/src/main/java/java9/util/TimSort.java
> [3] https://www.apache.org/legal/resolved.html#category-x
>
> On 28 Oct 2017, at 10:54, Chen Luo wrote:
>
>> Thanks guys! I think the one I'm using is identical to the one 
>> pointed by
>> Wail 
>> (https://github.com/retrostreams/android-retrostreams/blob/master/src/
>> main/java/java9/util/TimSort.java), and also identical to the one 
>> used by
>> Spark. All these three implementations are credit to the same person 
>> Josh
>> Bloch.
>>
>> Then we're good!
>>
>> Best regards,
>> Chen Luo
>>
>> On Sat, Oct 28, 2017 at 9:52 AM, Xikui Wang <xikuiw@uci.edu> wrote:
>>
>>> I think probably it's not okay to use JDK's implementation due to the
>>> license issue [1]. (Search for BCL)
>>>
>>> Alternatively, you could use the one that Spark used from the Android
>>> Opensource Project, as Wail pointed out. That one is under Apache 
>>> License.
>>>
>>> [1] https://www.apache.org/legal/resolved.html
>>>
>>> On Sat, Oct 28, 2017 at 9:44 AM, Wail Alkowaileet <wael.y.k@gmail.com>
>>> wrote:
>>>
>>>> P.S Spark implementation is under Apache.
>>>>
>>>> On Sat, Oct 28, 2017 at 9:41 AM, Wail Alkowaileet <wael.y.k@gmail.com>
>>>> wrote:
>>>>
>>>>> Android has an implementation:
>>>>> https://github.com/retrostreams/android-retrostreams/blob/master/src/
>>>>> main/java/java9/util/TimSort.java
>>>>>
>>>>> Spark has ported it
>>>>> https://github.com/apache/spark/blob/master/core/src/
>>>>> main/java/org/apache/spark/util/collection/TimSort.java
>>>>>
>>>>> We can customize it for AsterixDB comparators.
>>>>>
>>>>> On Sat, Oct 28, 2017 at 9:28 AM, Chen Luo <cluo8@uci.edu> wrote:
>>>>>
>>>>>> I don't know whether there is an easy way for us to directly reuse
>>>> TimSort
>>>>>> in JDK since it's design for sorting objects in an Array, while in
>>>> Hyracks
>>>>>> we don't have explicit object creation and sort everything in main
>>>> memory.
>>>>>> So what I did is that I copied it's source code, and replaced all
>>> object
>>>>>> assignments/swaps/comparisons using Hyracks in-memory operations.
>>>>>>
>>>>>> Best regards,
>>>>>> Chen Luo
>>>>>>
>>>>>> On Sat, Oct 28, 2017 at 12:07 AM, 李文海 <00008749@whu.edu.cn>

>>>>>> wrote:
>>>>>>
>>>>>>> I believe reusing jdk afap could be better. btw, timsort is better
>>>> than
>>>>>>> others by 1x when records are locally ordered .
>>>>>>> best
>>>>>>>
>>>>>>> 在 2017-10-28 14:38:21,"abdullah alamoudi" <bamousaa@gmail.com>

>>>>>>> 写道:
>>>>>>>
>>>>>>>> While I have no answer to the question of legality, this
sounds
>>>> great.
>>>>>>>>
>>>>>>>> ~Abdullah.
>>>>>>>>
>>>>>>>>> On Oct 27, 2017, at 9:20 PM, Chen Luo <cluo8@uci.edu>
wrote:
>>>>>>>>>
>>>>>>>>> Hi devs,
>>>>>>>>>
>>>>>>>>> I have adapted the TimSort algorithm used in JDK
>>>> (java.util.TimSort)
>>>>>>> into
>>>>>>>>> Hyracks, which gives 10-20% performance improvements
on random
>>>> data.
>>>>>> It
>>>>>>>>> will be more useful if the input data is partially sorted,
e.g.,
>>>>>> primary
>>>>>>>>> keys fetched from secondary index scan, which I haven't
got time
>>> to
>>>>>>>>> experiment with.
>>>>>>>>>
>>>>>>>>> *Before going any further, is it legal to adapt some
algorithm
>>>>>>>>> implementation from JDK into our codebase? *I saw the
JDK
>>>>>> implementation
>>>>>>>>> itself is adopted from
>>>>>>>>> http://svn.python.org/projects/python/trunk/Objects/listsort.txt
>>>> as
>>>>>>> well.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Chen Luo
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>>
>>>>> *Regards,*
>>>>> Wail Alkowaileet
>>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>>
>>>> *Regards,*
>>>> Wail Alkowaileet
>>>>
>>>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message