On 6/30/14, 12:33 PM, Luc Maisonobe wrote:
> Le 30/06/2014 19:15, Phil Steitz a écrit :
>>
>>> On Jun 30, 2014, at 9:59 AM, Gilles <gilles@harfang.homelinux.org>
>>> wrote:
>>>
>>> On Mon, 30 Jun 2014 09:08:00 0700, Phil Steitz wrote:
>>>>> On Jun 29, 2014, at 3:41 PM, Gilles
>>>>> <gilles@harfang.homelinux.org> wrote:
>>>>>
>>>>>>> On Sun, 29 Jun 2014 14:39:51 0700, Phil Steitz wrote:
>>>>>>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>>>>>>> On Sun, 29 Jun 2014 10:25:58 0700, Phil Steitz wrote:
>>>>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote: On Sun,
>>>>>>>>> Jun 29, 2014 at 1:25 AM, Phil Steitz
>>>>>>>>> <phil.steitz@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: Le
>>>>>>>>>>> 25/06/2014 06:05, venkatesha murthy a écrit
:
>>>>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe
>>>>>>>>>>>> <luc@spaceroots.org>
>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi Venkat,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy
a écrit
>>>>>>>>>>>>> :
>>>>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM,
Luc
>>>>>>>>>>>>>> Maisonobe <luc@spaceroots.org>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> While looking further in Percentile
class
>>>>>>>>>>>>>>> for MATH1120, I have found another
problem
>>>>>>>>>>>>>>> in the current implementation.
>>>>>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>>>> should
>>>>>>>>>>>>>>> leave the NaNs in place, but
at the end of
>>>>>>>>>>>>>>> the KthSelector.select method,
a call to
>>>>>>>>>>>>>>> Arrays.sort moves the NaNs to
the end of
>>>>>>>>>>>>>>> the small subarray. What is
really
>>>>>>>>>>>>>>> implemented here is therefore
closer to
>>>>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED.
>>>>>>>>>>>>>>> This always occur in the test
cases because
>>>>>>>>>>>>>>> they use very short arrays, and
we
>>>>>>>>>>>>>>> directly
>>>>>>>>>> switch to
>>>>>>>>>>>>>>> this part of the select method.
>>>>>>>>>>>>>> Are NaNs considered higher than +Inf
? If
>>>>>>>>>>>>>> MAXIMAL represent replacing for +inf
; you
>>>>>>>>>>>>>> need something to indicate beyond
this for
>>>>>>>>>>>>>> NaN.
>>>>>>>>>>>>> Well, we can just keep the NaN themselves
and
>>>>>>>>>>>>> handled them appropriately, hoping not
to trash
>>>>>>>>>>>>> performances too much.
>>>>>>>>>>>> Agreed.
>>>>>>>>>>>>
>>>>>>>>>>>>>> What is the test input you see an
issue and
>>>>>>>>>>>>>> what is the actual error you are
seeing.
>>>>>>>>>>>>>> Please share the test case.
>>>>>>>>>>>>> Just look at
>>>>>>>>>>>>> PercentileTest.testReplaceNanInRange().
The
>>>>>>>>>>>>> first check in the test corresponds to
a
>>>>>>>>>>>>> Percentile configuration at 50% percentile,
and
>>>>>>>>>>>>> NaNStrategy.FIXED. The array has an odd
number
>>>>>>>>>>>>> of entries, so the 50% percentile is
exactly
>>>>>>>>>>>>> one of the entries: the one at index
5 in the
>>>>>>>>>>>>> final array.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The initial ordering is { 0, 1, 2, 3,
4, NaN,
>>>>>>>>>>>>> NaN, 5, 7, NaN, 8 }. So for the
>>>>>>>>>>>>> NaNStrategy.FIXED setting, it should
not be
>>>>>>>>>>>>> modified at all in the selection algorithm
and
>>>>>>>>>>>>> the result for 50% should be the first
NaN of
>>>>>>>>>>>>> the array, at index 5. In fact, due to
the
>>>>>>>>>>>>> Arrays.sort, we *do* reorder the array
into {
>>>>>>>>>>>>> 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN
}, so the
>>>>>>>>>>>>> result is 5.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Agreed. just verified by putting back
the
>>>>>>>>>>>>> earlier insertionSort
>>>>>>>>>> function.
>>>>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any
quantile
>>>>>>>>>>>>> above 67%, we get as a result
>>>>>>>>>>>>> Double.POSITIVE_INFINITY instead of
>>>>>>>>>>>>> Double.NaN.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we agree to leave FIXED as unchanged
>>>>>>>>>>>>> behaviour with your
>>>>>>>>>> insertionSort
>>>>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should
be
>>>>>>>>>>>> allowed for transformation
>>>>>>>>>> of
>>>>>>>>>>>> NaN to +/Inf
>>>>>>>>>>> I'm fine with it, as long as we clearly state
it in
>>>>>>>>>>> the NaNStrategy documentation, saying somethig
>>>>>>>>>>> like:
>>>>>>>>>>>
>>>>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are
>>>>>>>>>>> effecively *replaced* by +/ infinity, hence
the
>>>>>>>>>>> results will be computed as if infinities were
>>>>>>>>>>> there in the first place.
>>>>>>>>>> 1  this is mixing concerns. NaNStrategy exists
for
>>>>>>>>>> one specific purpose  to turn extend the ordering
on
>>>>>>>>>> doubles a total ordering. Strategies prescribe only
>>>>>>>>>> how NaNs are to be treated in the ordering. Side
>>>>>>>>>> effects on the input array don't make sense in this
>>>>>>>>>> context. The use case for which this class was
>>>>>>>>>> created was rank transformations. Returning
>>>>>>>>>> infinities in rank transformations would blow up
>>>>>>>>>> computations in these classes. If a floating point
>>>>>>>>>> computations need to transform NaNs, the specific
>>>>>>>>>> stats / other classes that do this transformation
>>>>>>>>>> should perform and document the behavior.
>>>>>>>>>>
>>>>>>>>>> Phil
>>>>>>>>> OK. Agreed realized it later. Hence i have not
>>>>>>>>> touched NaNStrategy in my patch(MATH1132) . Now i have
>>>>>>>>> added a separate transformer to transform NaNs but it
>>>>>>>>> uses NanStrategy. Please let me know on this as this
>>>>>>>>> trasnformation itself can be used in different places.
>>>>>>>> Honestly, I don't see the value of this. I would be more
>>>>>>>> favorable to an addition to MathArrays supporting NaN (or
>>>>>>>> infinity) filtering / transformation. Several years back
>>>>>>>> we made the decision to just let NaNs propagate in
>>>>>>>> computations, which basically means that if you feed NaNs
>>>>>>>> to [math] you are going to get NaNs out in results. If we
>>>>>>>> do get into the business of filtering NaNs (or infinities
>>>>>>>> for that matter), I think it is probably best to do that
>>>>>>>> in MathArrays or somesuch  i.e., don't decorate APIs
>>>>>>>> with NaN or infinityhandling handling descriptions.
>>>>>>>> That would be a huge task and would likely end up
>>>>>>>> bleeding implementation detail into the APIs. As I said
>>>>>>>> above, NaNStrategy has to exist for rank transformations
>>>>>>>> to be welldefined. NaNhandling in floating point
>>>>>>>> computations is defined and as long as we fully document
>>>>>>>> what our algorithms do, I think it is fair enough to "let
>>>>>>>> the NaNs fall where they may"  which basically means
>>>>>>>> users need to do the filtering / transformation
>>>>>>>> themselves.
>>>>>>> So, do I understand correctly that it's OK to add the
>>>>>>> "remove" and "replace" utility methods (cf. MATH1130) in
>>>>>>> "MathArrays"? Or does this thread conclude that we don't
>>>>>>> have a use for this within Commons Math?
>>>>>> You raise a good point here. Personally, I don't see an
>>>>>> internal use and if we are sticking with MathArrays including
>>>>>> only things we have internal use for, I would say no, don't
>>>>>> include them.
>>>>> A third way to look at it would be that since some algorithms
>>>>> require being passed "clean" data (e.g. without NaNs), we could
>>>>> provide some simple means to do so. A user could then write,
>>>>> e.g.  double[] data = { 1, Double.NaN, 3, 6, 5 };
>>>>>
>>>>> Percentile p = new Percentile(); double q =
>>>>> p.evaluate(MathArrays.replace(data, Double.NaN,
>>>>> Double.POSITIVE_INFINITY), 0.1); 
>>>>>
>>>>> Then, if we provide that utility, is it still necessary to
>>>>> complicate the Percentile code with a NaNStrategy parameter?
>>>> I may be missing something, but it seems to me that percentile
>>>> does need a NaNStrategy in the presence if NaNs like the other
>>>> order statistics do (basically to define their position In the
>>>> ordering). It doesn't logically need more than that. I would
>>>> be ok defaulting the strategy or even returning NaN in the
>>>> presence of NaNs (or whenever interpolation involves a NaN). All
>>>> of this can be documented.
>>> I am _surely_ missing something because I don't figure out how NaNs
>>> can be useful when computing those statistics... :}
>> Good point. Basically the different strategies allow you to
>> effectively ignore or work around them so stats can be welldefined.
>> Without NaNStrategy, order statistics including NaNs are undefined.
> The problem with Percentile is that the elements are both reordered
> *and* used when interpolating between two values. Tor the rank methods,
> only reordering was needed so the various NaNStrategies could be
> implemented by replacing at will with +/ infinity and a classical sort
> would do the ordering for us.
>
> Here, we do interpolate on the values, so if we first replace NaN with
> +infinity to put them at the end, we may interpolate between a finite
> value and +inf and have +inf as a result, whereas if the NaN "value" was
> preserved and some magic used to sort the NaN, we would end up
> interpolating with a NaN and get a NaN as a result. Setting up the magic
> for MINIMAL and MAXIMAL seems achievable withing the sort, as we already
> have a specific sort (in fact not a sort but a selection algorithm,
> which is the core of the class). Setting up the magic for FIXED seems
> more difficult.
>
> So I guess implementing NaNStrategies may be done in different ways
> depending on how they will be used by the calling algorithm. I don't yet
> know if we can set up a general method to be used for all statistics.
> Looking only at Percentile, replacement with +/ infinity already seems
> to not be an appropriate solution.
I don't think we should try to redefine algorithms so they can
handle NaNs. When you do floating point computations with NaNs, you
get NaNs. That should be understood to be the contract. For rank
statistics, such as Percentile, we need to extend the ordering on
doubles to include them in ordering for the algorithm to be
welldefined. If we subsequently do floating point computations
with the NaNs, the result should be NaN. I think we should either
just simplify things and say we *always* return NaN whenever NaNs
are present for Percentile, or do the following:
0) use the prescribed NaN strategy to determine the ordering  so
NaNs end up either at the bottom, at the top or in their original
position (FIXED).
1) if interpolation involves a NaN element, the result returned is NaN.
Phil
>
> best regards,
> Luc
>
>>
>> Phil
>>
>>>> I still don't see a withinmath use for recoding methods; though
>>>> I do see the value from a user perspective. The problem is if we
>>>> open this door, we risk MathArrays becoming a mini commons lang.
>>> If Commons Lang already provides a feature, then we indeed don't
>>> need to offer the same within CM (if there is no internal use). Can
>>> someone confirm?
>>>
>>>
>>> Otherwise we could define a new inner interface in MathArrays,
>>> similar to "MathArrays.Function":  public class MathArrays {
>>>
>>> // ...
>>>
>>> public interface Transformer { /** Transform the whole array */
>>> double[] transform(double[] input); /** Transform the array in the
>>> interval [from, to) and return the whole array */ double[]
>>> transform(double[] input, int from, int to); /** Transform the
>>> array in the interval [from, to) and return the transformed
>>> subarray */ double[] transformAndSplice(double[] input, int from,
>>> int to); } } 
>>>
>>> Then we can provide "replace" and "remove" transformers through
>>> factory methods (example/untested code):  public class
>>> MathArrays {
>>>
>>> // ...
>>>
>>> public static Transformer createReplaceTransformer(final double
>>> old, final double replace) { return new Transformer() { public
>>> double[] transform(double[] input) { final int len = input.length;
>>> final double[] output = new double[len]; for (int i = 0; i < len;
>>> i++) { output[i] = Precision.equalsIncludingNaN(input[i], old) ?
>>> replace : input[i]; } return output; }
>>>
>>> // etc. }; } } 
>>>
>>>
>>> Gilles
>>>
>>>
>>> 
>>>
>>>
> To unsubscribe, email: devunsubscribe@commons.apache.org
>>> For additional commands, email: devhelp@commons.apache.org
>>>
>> 
>>
>>
> To unsubscribe, email: devunsubscribe@commons.apache.org
>> For additional commands, email: devhelp@commons.apache.org
>>
>>
>
> 
> To unsubscribe, email: devunsubscribe@commons.apache.org
> For additional commands, email: devhelp@commons.apache.org
>
>

To unsubscribe, email: devunsubscribe@commons.apache.org
For additional commands, email: devhelp@commons.apache.org
