commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From venkatesha murthy <venkateshamurth...@gmail.com>
Subject Re: [math] use case for NaNStrategy.FIXED?
Date Wed, 25 Jun 2014 04:05:04 GMT
On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <luc@spaceroots.org> wrote:

> Hi Venkat,
>
> Le 23/06/2014 21:08, venkatesha murthy a écrit :
> > On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <luc@spaceroots.org>
> wrote:
> >> Hi all,
> >>
> >> While looking further in Percentile class for MATH-1120, I have found
> >> another problem in the current implementation. NaNStrategy.FIXED should
> >> leave the NaNs in place, but at the end of the KthSelector.select
> >> method, a call to Arrays.sort moves the NaNs to the end of the small
> >> sub-array. What is really implemented here is therefore closer to
> >> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
> >> test cases because they use very short arrays, and we directly switch to
> >> this part of the select method.
> > Are NaNs considered higher than +Inf ?
> > If MAXIMAL represent replacing for +inf ; you need something to
> > indicate beyond this for NaN.
>
> Well, we can just keep the NaN themselves and handled them
> appropriately, hoping not to trash performances too much.
>
Agreed.

>
> > What is the test input you see an issue and what is the actual error
> > you are seeing. Please share the test case.
>
> Just look at PercentileTest.testReplaceNanInRange(). The first check in
> the test corresponds to a Percentile configuration at 50% percentile,
> and NaNStrategy.FIXED. The array has an odd number of entries, so the
> 50% percentile is exactly one of the entries: the one at index 5 in the
> final array.
>
> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So
> for the NaNStrategy.FIXED setting, it should not be modified at all in
> the selection algorithm and the result for 50% should be the first NaN
> of the array, at index 5. In fact, due to the Arrays.sort, we *do*
> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
> the result is 5.
>
> Agreed. just verified by putting back the earlier insertionSort function.


> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a
> result Double.POSITIVE_INFINITY instead of Double.NaN.
>
> If we agree to leave FIXED as unchanged behaviour with your insertionSort
code; then atleast MAXIMAL/MINIMAL should be allowed for transformation of
NaN to +/-Inf

> >>
> >> When I implemented the method, I explicitly avoided calling Arrays.sort
> >> because it is a general purpose method that is know to be efficient only
> >> for arrays of at least medium size. In most cases, when the array is
> >> small one falls back to a non-recursive method like a very simple
> >> insertion sort, which is faster for smaller arrays.
> >
> > Please help me understand here; even java primitive Arrays.sort does
> > an insertion sort for less than 7 elements
> > (Refer sort1(double x[], int off, int len))
> > So what is it that the custom insertion sort that does differently or
> > is advantageous. Is it the value 15 elements?
>
> I don't see a reference to 7 elements, neither in the Java6 nor in the
> Java 7 doc

Please take a look at the sort1 method where there is a first block in the
code which clearly mentions len < 7
    /**
     * Sorts the specified sub-array of doubles into ascending order.
     */
    private static void sort1(double x[], int off, int len) {
    // Insertion sort on smallest arrays
    if (len < 7) {
        for (int i=off; i<len+off; i++)
        for (int j=i; j>off && x[j-1]>x[j]; j--)
            swap(x, j, j-1);
        return;
    }
    :
    :
    : code continues for the else part

Also the grepcode url
<http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29>
indicates the same

(and in any case the doc explicitly states the algorithms
> explained are only implementation notes and are not part of the
> specification).
>
Yes its a part of comments anyways.

>
> However, the most important part for now is the fact that we control it
> and may be able to implement different NaN strategies. What we have
> currently fails.
>
> I agree on this  and hence here is my take:
Leave FIXED as-is and use the earlier insertionSort code (just change the
name to sort rather than hardcoding it as insertionsort) to handle the case
you were mentioning
Continue to use MAXIMAL/MINIMAL for +/-Inf transformation and that way we
have covered both Inf and nan cases.
Use REMOVED as default for all Percentile Estimation Types. (mostly
influenced by R here perhaps)

best regards,
> Luc
>
> >
> >> In the select
> >> operation, we know we have small sub-arrays at the call point. Going
> >> back to the former insertionSort would recover the good behavior for
> >> small arrays, but would in fact not be sufficient to really implement a
> >> NaNStrategy.FIXED. I guess it would be simple to make it behave like
> >> NaNStrategy.MAXIMAL but I did not try yet.
> >>
> >> My point here is rather: can we really and should we really implement
> >> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
> >> store the original position of the NaNs. It is quite cumbersome.
> >>
> >> I wonder what is the use case for NaNStrategy.FIXED at all.
> >>
> >> Going further and looking at the use cases for other NaNStrategy values,
> >> the NaNs are replaced by +/- infinity before sorting, which is OK for
> >> ranking as we only rely on the indices, but we use the values themselves
> >> in Percentile. So sometimes, we end up with computing interpolations
> >> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and
> >> x[k+1] has been changed to +infinity, we get +infinity, instead of the
> >> NaN we should have retrieved without replacement. So here again, I'm not
> >> sure we are doing the right thing.
> >>
> >> What do you think?
> >>
> >> best regards,
> >> Luc
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >> For additional commands, e-mail: dev-help@commons.apache.org
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message