commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From venkatesha murthy <venkateshamurth...@gmail.com>
Subject Re: [math] use case for NaNStrategy.FIXED?
Date Mon, 30 Jun 2014 13:18:02 GMT
On Mon, Jun 30, 2014 at 4:11 AM, Gilles <gilles@harfang.homelinux.org>
wrote:

> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>
>> On 6/29/14, 2:30 PM, Gilles wrote:
>>
>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
>>>
>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
>>>>
>>>>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz
>>>>> <phil.steitz@gmail.com> wrote:
>>>>>
>>>>>  On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>>>>>
>>>>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>
>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe
>>>>>>>> <luc@spaceroots.org>
>>>>>>>>
>>>>>>> wrote:
>>>>>>
>>>>>>> Hi Venkat,
>>>>>>>>>
>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>>>>>
>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe
>>>>>>>>>> <luc@spaceroots.org>
>>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> While looking further in Percentile class for
MATH-1120, I
>>>>>>>>>>> have found
>>>>>>>>>>> another problem in the current implementation.
>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>>>>>
>>>>>>>>>> should
>>>>>>
>>>>>>> leave the NaNs in place, but at the end of the
>>>>>>>>>>> KthSelector.select
>>>>>>>>>>> method, a call to Arrays.sort moves the NaNs
to the end of
>>>>>>>>>>> the small
>>>>>>>>>>> sub-array. What is really implemented here is
therefore
>>>>>>>>>>> closer to
>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This
always
>>>>>>>>>>> occur in the
>>>>>>>>>>> test cases because they use very short arrays,
and we
>>>>>>>>>>> directly
>>>>>>>>>>>
>>>>>>>>>> switch to
>>>>>>
>>>>>>> this part of the select method.
>>>>>>>>>>>
>>>>>>>>>> Are NaNs considered higher than +Inf ?
>>>>>>>>>> If MAXIMAL represent replacing for +inf ; you need
>>>>>>>>>> something to
>>>>>>>>>> indicate beyond this for NaN.
>>>>>>>>>>
>>>>>>>>> Well, we can just keep the NaN themselves and handled
them
>>>>>>>>> appropriately, hoping not to trash performances too much.
>>>>>>>>>
>>>>>>>>>  Agreed.
>>>>>>>>
>>>>>>>>  What is the test input you see an issue and what is the
>>>>>>>>>> actual error
>>>>>>>>>> you are seeing. Please share the test case.
>>>>>>>>>>
>>>>>>>>> Just look at PercentileTest.testReplaceNanInRange().
The
>>>>>>>>> first check in
>>>>>>>>> the test corresponds to a Percentile configuration at
50%
>>>>>>>>> percentile,
>>>>>>>>> and NaNStrategy.FIXED. The array has an odd number of
>>>>>>>>> entries, so the
>>>>>>>>> 50% percentile is exactly one of the entries: the one
at
>>>>>>>>> index 5 in the
>>>>>>>>> final array.
>>>>>>>>>
>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5,
7,
>>>>>>>>> NaN, 8 }. So
>>>>>>>>> for the NaNStrategy.FIXED setting, it should not be modified
>>>>>>>>> at all in
>>>>>>>>> the selection algorithm and the result for 50% should
be the
>>>>>>>>> first NaN
>>>>>>>>> of the array, at index 5. In fact, due to the Arrays.sort,
>>>>>>>>> we *do*
>>>>>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN,
NaN,
>>>>>>>>> NaN }, so
>>>>>>>>> the result is 5.
>>>>>>>>>
>>>>>>>>> Agreed. just verified by putting back the earlier insertionSort
>>>>>>>>>
>>>>>>>> function.
>>>>>>
>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we
>>>>>>>>> get as a
>>>>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>>>>>
>>>>>>>>> If we agree to leave FIXED as unchanged behaviour with
your
>>>>>>>>>
>>>>>>>> insertionSort
>>>>>>
>>>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for
>>>>>>>> transformation
>>>>>>>>
>>>>>>> of
>>>>>>
>>>>>>> NaN to +/-Inf
>>>>>>>>
>>>>>>> I'm fine with it, as long as we clearly state it in the
>>>>>>> NaNStrategy
>>>>>>> documentation, saying somethig like:
>>>>>>>
>>>>>>>  When MAXIMAL/MINIMAL are used, the NaNs are effecively
>>>>>>> *replaced* by
>>>>>>>  +/- infinity, hence the results will be computed as if
>>>>>>> infinities were
>>>>>>>  there in the first place.
>>>>>>>
>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for one specific
>>>>>> purpose - to turn extend the ordering on doubles a total ordering.
>>>>>> Strategies prescribe only how NaNs are to be treated in the
>>>>>> ordering.  Side effects on the input array don't make sense in
>>>>>> this
>>>>>> context.  The use case for which this class was created was rank
>>>>>> transformations.  Returning infinities in rank transformations
>>>>>> would
>>>>>> blow up computations in these classes.  If a floating point
>>>>>> computations need to transform NaNs, the specific stats / other
>>>>>> classes that do this transformation should perform and document
>>>>>> the
>>>>>> behavior.
>>>>>>
>>>>>> Phil
>>>>>>
>>>>>>  OK. Agreed realized it later.  Hence i have not touched
>>>>> NaNStrategy in my
>>>>> patch(MATH-1132) . Now i have added a separate transformer to
>>>>> transform NaNs
>>>>> but it uses NanStrategy.  Please let me know on this as this
>>>>> trasnformation
>>>>> itself
>>>>> can be used in different places.
>>>>>
>>>>
>>>> Honestly, I don't see the value of this.  I would be more favorable
>>>> to an addition to MathArrays supporting NaN (or infinity) filtering
>>>> / transformation.  Several years back we made the decision to just
>>>> let NaNs propagate in computations, which basically means that if
>>>> you feed NaNs to [math] you are going to get NaNs out in results.
>>>> If we do get into the business of filtering NaNs (or infinities for
>>>> that matter), I think it is probably best to do that in MathArrays
>>>> or somesuch - i.e., don't decorate APIs with NaN or
>>>> infinity-handling handling descriptions.  That would be a huge task
>>>> and would likely end up bleeding implementation detail into the
>>>> APIs.  As I said above, NaNStrategy has to exist for rank
>>>> transformations to be well-defined.  NaN-handling in floating point
>>>> computations is defined and as long as we fully document what our
>>>> algorithms do, I think it is fair enough to "let the NaNs fall where
>>>> they may" - which basically means users need to do the filtering /
>>>> transformation themselves.
>>>>
>>>
>>> So, do I understand correctly that it's OK to add the "remove" and
>>> "replace" utility methods (cf. MATH-1130) in "MathArrays"?
>>> Or does this thread conclude that we don't have a use for this within
>>> Commons Math?
>>>
>>
>> You raise a good point here.  Personally, I don't see an internal
>> use and if we are sticking with MathArrays including only things we
>> have internal use for, I would say no, don't include them.
>>
>
I feel IMO, replace and remove are too generic to be holed in Percentile
and see that they can have uses similar to copy or equals or any other
MathArrays function..
(if not MathArrays; then can DoubleArray be an alternate home for these?).

>
> A third way to look at it would be that since some algorithms require
> being passed "clean" data (e.g. without NaNs), we could provide some
> simple means to do so. A user could then write, e.g.
> ---
> double[] data = { 1, Double.NaN, 3, 6, 5 };
>
> Percentile p = new Percentile();
> double q = p.evaluate(MathArrays.replace(data, Double.NaN,
> Double.POSITIVE_INFINITY), 0.1);
> ---
>
> Then, if we provide that utility, is it still necessary to complicate the
> Percentile
> code with a NaNStrategy parameter?
>
>
> Gilles
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message