commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Luc Maisonobe <...@spaceroots.org>
Subject Re: [Math] MATH-1129
Date Wed, 18 Jun 2014 13:02:41 GMT
Le 18/06/2014 14:32, Gilles a écrit :
> Hello Luc.
> 
>>>
>>> See
>>>   https://issues.apache.org/jira/browse/MATH-1129
>>>
>>> The problem reported was due to the sorting procedure not behaving
>>> correctly in the presence of NaN.
>>> This procedure could be replaced by an equivalent method from the JDK:
>>>   java.util.Arrays.sort(double[],int,int)
>>>
>>> Any objection?
>>
>> If you imply removing the select, medianOf3 and partition methods, yes I
>> would object. If you imply replacing only the insertionSort method used
>> for small sub-arrays, then I almost agree.
> 
> Issue 1129 concerns the latter: See the comment in "insertionSort" in the
> current code.
> 
> However, for the former, you should really have a look at MATH-1120
>   https://issues.apache.org/jira/browse/MATH-1120
> The proposed patch indeed touches those methods.

So I think it is worth fixing MATH-1120 first, with the NaNStrategy and
go back to MATH-1129 afterwards, to see if it still applies of if
MATH-1120 also fixed it.

I will have a look at the proposed patch, including your comments.

> 
>>
>> However, I'm not sure this would be sufficient as the sort is done
>> partially in all the methods above. The former ones are used to split
>> the big array into smaller ones and sorting only the sub-arrays that are
>> needed to compute the percentile (it is really a selection algorithm,
>> not a full-sort algorithm). So I guess we should look at all the methods
>> at once to ensure proper handling of NaNs. The trick is that all
>> comparisons involving NaN return false (<, <=, >, >=, ==, !=),
>> regardless of the NaN being at the left hand side or right hand side of
>> the comparison (so we get the funny consequence that if a is a NaN, the
>> test a == a evaluates to false ...).
> 
> As of r1603217, "insertionSort" seems to behave correctly (at least on the
> example reported in MATH-1129).

Yes, but it would be fed with garbage as the former methods fail
(typically medianOf3 does not handle NaNs properly), so I guess in the
general case it would clearly fail.

> 
>>
>> The fast an insertion sort is used at the end is because it is very
>> simply and efficient for small arrays, which is exactly what happens
>> here: the array part has MIN_SELECT_SIZE elements or less, i.e. 15
>> elements or less.
> 
> That was my question, implicitly: is it worth not using the JDK, or IOW,
> is "insertionSort" so much faster than "java.util.Arrays.sort" that it is
> worth maintaining a custom implementation?

Yes it is as long as it is kept simple. Quicksort is not that good at
small arrays, and this method can be called a huge number of times on
numerous small arrays, to it should really consider this.

> 
>>
>> As I wrote this selection algorithm, I could do the check if you want.
> 
> What check?

Checking the algorithms and fixing them using the strategy "NaN is
larger everything", as already stated in the javadoc. But as MATH-1120
proposes a more complete solution and allows other strategies, I'll look
at it first.

best regards,
Luc

> 
> 
> Regards,
> Gilles
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message