commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gilles <gil...@harfang.homelinux.org>
Subject Re: [math] use case for NaNStrategy.FIXED?
Date Sun, 29 Jun 2014 22:41:30 GMT
On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
> On 6/29/14, 2:30 PM, Gilles wrote:
>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
>>>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz
>>>> <phil.steitz@gmail.com> wrote:
>>>>
>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe
>>>>>>> <luc@spaceroots.org>
>>>>> wrote:
>>>>>>>> Hi Venkat,
>>>>>>>>
>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe
>>>>>>>>> <luc@spaceroots.org>
>>>>>>>> wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> While looking further in Percentile class for MATH-1120,
I
>>>>>>>>>> have found
>>>>>>>>>> another problem in the current implementation.
>>>>>>>>>> NaNStrategy.FIXED
>>>>> should
>>>>>>>>>> leave the NaNs in place, but at the end of the
>>>>>>>>>> KthSelector.select
>>>>>>>>>> method, a call to Arrays.sort moves the NaNs to the
end of
>>>>>>>>>> the small
>>>>>>>>>> sub-array. What is really implemented here is therefore
>>>>>>>>>> closer to
>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This
always
>>>>>>>>>> occur in the
>>>>>>>>>> test cases because they use very short arrays, and
we
>>>>>>>>>> directly
>>>>> switch to
>>>>>>>>>> this part of the select method.
>>>>>>>>> Are NaNs considered higher than +Inf ?
>>>>>>>>> If MAXIMAL represent replacing for +inf ; you need
>>>>>>>>> something to
>>>>>>>>> indicate beyond this for NaN.
>>>>>>>> Well, we can just keep the NaN themselves and handled them
>>>>>>>> appropriately, hoping not to trash performances too much.
>>>>>>>>
>>>>>>> Agreed.
>>>>>>>
>>>>>>>>> What is the test input you see an issue and what is the
>>>>>>>>> actual error
>>>>>>>>> you are seeing. Please share the test case.
>>>>>>>> Just look at PercentileTest.testReplaceNanInRange(). The
>>>>>>>> first check in
>>>>>>>> the test corresponds to a Percentile configuration at 50%
>>>>>>>> percentile,
>>>>>>>> and NaNStrategy.FIXED. The array has an odd number of
>>>>>>>> entries, so the
>>>>>>>> 50% percentile is exactly one of the entries: the one at
>>>>>>>> index 5 in the
>>>>>>>> final array.
>>>>>>>>
>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7,
>>>>>>>> NaN, 8 }. So
>>>>>>>> for the NaNStrategy.FIXED setting, it should not be modified
>>>>>>>> at all in
>>>>>>>> the selection algorithm and the result for 50% should be
the
>>>>>>>> first NaN
>>>>>>>> of the array, at index 5. In fact, due to the Arrays.sort,
>>>>>>>> we *do*
>>>>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN,
>>>>>>>> NaN }, so
>>>>>>>> the result is 5.
>>>>>>>>
>>>>>>>> Agreed. just verified by putting back the earlier 
>>>>>>>> insertionSort
>>>>> function.
>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%,
we
>>>>>>>> get as a
>>>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>>>>
>>>>>>>> If we agree to leave FIXED as unchanged behaviour with your
>>>>> insertionSort
>>>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for
>>>>>>> transformation
>>>>> of
>>>>>>> NaN to +/-Inf
>>>>>> I'm fine with it, as long as we clearly state it in the
>>>>>> NaNStrategy
>>>>>> documentation, saying somethig like:
>>>>>>
>>>>>>  When MAXIMAL/MINIMAL are used, the NaNs are effecively
>>>>>> *replaced* by
>>>>>>  +/- infinity, hence the results will be computed as if
>>>>>> infinities were
>>>>>>  there in the first place.
>>>>> -1 - this is mixing concerns.  NaNStrategy exists for one 
>>>>> specific
>>>>> purpose - to turn extend the ordering on doubles a total 
>>>>> ordering.
>>>>> Strategies prescribe only how NaNs are to be treated in the
>>>>> ordering.  Side effects on the input array don't make sense in
>>>>> this
>>>>> context.  The use case for which this class was created was rank
>>>>> transformations.  Returning infinities in rank transformations
>>>>> would
>>>>> blow up computations in these classes.  If a floating point
>>>>> computations need to transform NaNs, the specific stats / other
>>>>> classes that do this transformation should perform and document
>>>>> the
>>>>> behavior.
>>>>>
>>>>> Phil
>>>>>
>>>> OK. Agreed realized it later.  Hence i have not touched
>>>> NaNStrategy in my
>>>> patch(MATH-1132) . Now i have added a separate transformer to
>>>> transform NaNs
>>>> but it uses NanStrategy.  Please let me know on this as this
>>>> trasnformation
>>>> itself
>>>> can be used in different places.
>>>
>>> Honestly, I don't see the value of this.  I would be more favorable
>>> to an addition to MathArrays supporting NaN (or infinity) filtering
>>> / transformation.  Several years back we made the decision to just
>>> let NaNs propagate in computations, which basically means that if
>>> you feed NaNs to [math] you are going to get NaNs out in results.
>>> If we do get into the business of filtering NaNs (or infinities for
>>> that matter), I think it is probably best to do that in MathArrays
>>> or somesuch - i.e., don't decorate APIs with NaN or
>>> infinity-handling handling descriptions.  That would be a huge task
>>> and would likely end up bleeding implementation detail into the
>>> APIs.  As I said above, NaNStrategy has to exist for rank
>>> transformations to be well-defined.  NaN-handling in floating point
>>> computations is defined and as long as we fully document what our
>>> algorithms do, I think it is fair enough to "let the NaNs fall 
>>> where
>>> they may" - which basically means users need to do the filtering /
>>> transformation themselves.
>>
>> So, do I understand correctly that it's OK to add the "remove" and
>> "replace" utility methods (cf. MATH-1130) in "MathArrays"?
>> Or does this thread conclude that we don't have a use for this 
>> within
>> Commons Math?
>
> You raise a good point here.  Personally, I don't see an internal
> use and if we are sticking with MathArrays including only things we
> have internal use for, I would say no, don't include them.

A third way to look at it would be that since some algorithms require
being passed "clean" data (e.g. without NaNs), we could provide some
simple means to do so. A user could then write, e.g.
---
double[] data = { 1, Double.NaN, 3, 6, 5 };

Percentile p = new Percentile();
double q = p.evaluate(MathArrays.replace(data, Double.NaN, 
Double.POSITIVE_INFINITY), 0.1);
---

Then, if we provide that utility, is it still necessary to complicate 
the Percentile
code with a NaNStrategy parameter?


Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message