commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gilles <gil...@harfang.homelinux.org>
Subject Re: [math] use case for NaNStrategy.FIXED?
Date Sun, 29 Jun 2014 21:30:34 GMT
On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz <phil.steitz@gmail.com> 
>> wrote:
>>
>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe 
>>>>> <luc@spaceroots.org>
>>> wrote:
>>>>>> Hi Venkat,
>>>>>>
>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe 
>>>>>>> <luc@spaceroots.org>
>>>>>> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> While looking further in Percentile class for MATH-1120,
I 
>>>>>>>> have found
>>>>>>>> another problem in the current implementation. 
>>>>>>>> NaNStrategy.FIXED
>>> should
>>>>>>>> leave the NaNs in place, but at the end of the 
>>>>>>>> KthSelector.select
>>>>>>>> method, a call to Arrays.sort moves the NaNs to the end of
the 
>>>>>>>> small
>>>>>>>> sub-array. What is really implemented here is therefore closer

>>>>>>>> to
>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur

>>>>>>>> in the
>>>>>>>> test cases because they use very short arrays, and we directly
>>> switch to
>>>>>>>> this part of the select method.
>>>>>>> Are NaNs considered higher than +Inf ?
>>>>>>> If MAXIMAL represent replacing for +inf ; you need something
to
>>>>>>> indicate beyond this for NaN.
>>>>>> Well, we can just keep the NaN themselves and handled them
>>>>>> appropriately, hoping not to trash performances too much.
>>>>>>
>>>>> Agreed.
>>>>>
>>>>>>> What is the test input you see an issue and what is the actual

>>>>>>> error
>>>>>>> you are seeing. Please share the test case.
>>>>>> Just look at PercentileTest.testReplaceNanInRange(). The first 
>>>>>> check in
>>>>>> the test corresponds to a Percentile configuration at 50% 
>>>>>> percentile,
>>>>>> and NaNStrategy.FIXED. The array has an odd number of entries, 
>>>>>> so the
>>>>>> 50% percentile is exactly one of the entries: the one at index 5

>>>>>> in the
>>>>>> final array.
>>>>>>
>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 
>>>>>> }. So
>>>>>> for the NaNStrategy.FIXED setting, it should not be modified at 
>>>>>> all in
>>>>>> the selection algorithm and the result for 50% should be the 
>>>>>> first NaN
>>>>>> of the array, at index 5. In fact, due to the Arrays.sort, we 
>>>>>> *do*
>>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN 
>>>>>> }, so
>>>>>> the result is 5.
>>>>>>
>>>>>> Agreed. just verified by putting back the earlier insertionSort
>>> function.
>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get

>>>>>> as a
>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>>
>>>>>> If we agree to leave FIXED as unchanged behaviour with your
>>> insertionSort
>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for 
>>>>> transformation
>>> of
>>>>> NaN to +/-Inf
>>>> I'm fine with it, as long as we clearly state it in the 
>>>> NaNStrategy
>>>> documentation, saying somethig like:
>>>>
>>>>  When MAXIMAL/MINIMAL are used, the NaNs are effecively *replaced* 
>>>> by
>>>>  +/- infinity, hence the results will be computed as if infinities 
>>>> were
>>>>  there in the first place.
>>> -1 - this is mixing concerns.  NaNStrategy exists for one specific
>>> purpose - to turn extend the ordering on doubles a total ordering.
>>> Strategies prescribe only how NaNs are to be treated in the
>>> ordering.  Side effects on the input array don't make sense in this
>>> context.  The use case for which this class was created was rank
>>> transformations.  Returning infinities in rank transformations 
>>> would
>>> blow up computations in these classes.  If a floating point
>>> computations need to transform NaNs, the specific stats / other
>>> classes that do this transformation should perform and document the
>>> behavior.
>>>
>>> Phil
>>>
>> OK. Agreed realized it later.  Hence i have not touched NaNStrategy 
>> in my
>> patch(MATH-1132) . Now i have added a separate transformer to 
>> transform NaNs
>> but it uses NanStrategy.  Please let me know on this as this 
>> trasnformation
>> itself
>> can be used in different places.
>
> Honestly, I don't see the value of this.  I would be more favorable
> to an addition to MathArrays supporting NaN (or infinity) filtering
> / transformation.  Several years back we made the decision to just
> let NaNs propagate in computations, which basically means that if
> you feed NaNs to [math] you are going to get NaNs out in results.
> If we do get into the business of filtering NaNs (or infinities for
> that matter), I think it is probably best to do that in MathArrays
> or somesuch - i.e., don't decorate APIs with NaN or
> infinity-handling handling descriptions.  That would be a huge task
> and would likely end up bleeding implementation detail into the
> APIs.  As I said above, NaNStrategy has to exist for rank
> transformations to be well-defined.  NaN-handling in floating point
> computations is defined and as long as we fully document what our
> algorithms do, I think it is fair enough to "let the NaNs fall where
> they may" - which basically means users need to do the filtering /
> transformation themselves.

So, do I understand correctly that it's OK to add the "remove" and
"replace" utility methods (cf. MATH-1130) in "MathArrays"?
Or does this thread conclude that we don't have a use for this within
Commons Math?

Gilles

>
> Phil
>>
>>>> [...]


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message