commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matt Benson <gudnabr...@gmail.com>
Subject Re: Commons sorting?
Date Fri, 03 May 2013 14:30:45 GMT
Without putting too much importance on what I, personally, think, know that
any ASF committer can have access to hack on a Commons-style library in our
sandbox, just by asking for it.  From my personal POV, I'm not sure if what
we're talking about sounds like it has enough "surface area" to warrant a
separate component, but getting everything in place would allow the
community to make the decision of whether a separate component is in order,
or if perhaps this code might live comfortably in e.g. Commons [lang].

Regards,
Matt


On Fri, May 3, 2013 at 8:26 AM, Adrien Grand <jpountz@gmail.com> wrote:

> Hi,
>
> I'm working on an abstraction to sort random-access data-structures
> for Lucene[1] and would like to know if you think this is interesting
> enough for inclusion in Apache Commons.
>
> The reason for such an abstraction is that we often need to sort
> parallel arrays and the JDK provides no way to do it easily. This is
> why we first borrowed SorterTemplate[2] from CGlib: provided the
> definition of some primitive methods (compare and swap), it can sort
> any sub-sequence of any random-access data-structure.
>
> We recently faced the need to implement TimSort (LUCENE-4839[3]) for
> some use-cases, but unfortunately it requires a different set of
> primitive operations to be properly implemented. This is why I started
> working on LUCENE-4946[1] to be able to have different primitive
> operations depending on the sorting algorithm.
>
> A few more notes about this sorting abstraction:
>  - since it only assumes the data supports random-access, it can be
> used to sort off-heap or commons-primitives collections,
>  - it provides some useful sorting algorithms, in particular
> introsort[4] and a modified Timsort[5],
>  - performance is comparable to the JDK's Arrays.sort in spite of the
> abstraction,
>  - it is faster and more memory-efficient than Collections.sort for
> random-access lists, since the latter dumps the content of the list
> into an array to sort it with Arrays.sort.
>
> What do you think?
>
> [1] https://issues.apache.org/jira/browse/LUCENE-4946
> [2]
> http://grepcode.com/file/repo1.maven.org/maven2/cglib/cglib/2.2.2/net/sf/cglib/util/SorterTemplate.java#SorterTemplate
> [3] https://issues.apache.org/jira/browse/LUCENE-4839
> [4] http://fr.wikipedia.org/wiki/Introsort
> [5] http://en.wikipedia.org/wiki/Timsort
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message