On 6/29/14, 9:48 AM, venkatesha murthy wrote:
> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz <phil.steitz@gmail.com> wrote:
>
>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <luc@spaceroots.org>
>> wrote:
>>>>> Hi Venkat,
>>>>>
>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <luc@spaceroots.org>
>>>>> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> While looking further in Percentile class for MATH1120, I have
found
>>>>>>> another problem in the current implementation. NaNStrategy.FIXED
>> should
>>>>>>> leave the NaNs in place, but at the end of the KthSelector.select
>>>>>>> method, a call to Arrays.sort moves the NaNs to the end of the
small
>>>>>>> subarray. What is really implemented here is therefore closer
to
>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur
in the
>>>>>>> test cases because they use very short arrays, and we directly
>> switch to
>>>>>>> this part of the select method.
>>>>>> Are NaNs considered higher than +Inf ?
>>>>>> If MAXIMAL represent replacing for +inf ; you need something to
>>>>>> indicate beyond this for NaN.
>>>>> Well, we can just keep the NaN themselves and handled them
>>>>> appropriately, hoping not to trash performances too much.
>>>>>
>>>> Agreed.
>>>>
>>>>>> What is the test input you see an issue and what is the actual error
>>>>>> you are seeing. Please share the test case.
>>>>> Just look at PercentileTest.testReplaceNanInRange(). The first check
in
>>>>> the test corresponds to a Percentile configuration at 50% percentile,
>>>>> and NaNStrategy.FIXED. The array has an odd number of entries, so the
>>>>> 50% percentile is exactly one of the entries: the one at index 5 in the
>>>>> final array.
>>>>>
>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So
>>>>> for the NaNStrategy.FIXED setting, it should not be modified at all in
>>>>> the selection algorithm and the result for 50% should be the first NaN
>>>>> of the array, at index 5. In fact, due to the Arrays.sort, we *do*
>>>>> reorder the array into { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
>>>>> the result is 5.
>>>>>
>>>>> Agreed. just verified by putting back the earlier insertionSort
>> function.
>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a
>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>
>>>>> If we agree to leave FIXED as unchanged behaviour with your
>> insertionSort
>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for transformation
>> of
>>>> NaN to +/Inf
>>> I'm fine with it, as long as we clearly state it in the NaNStrategy
>>> documentation, saying somethig like:
>>>
>>> When MAXIMAL/MINIMAL are used, the NaNs are effecively *replaced* by
>>> +/ infinity, hence the results will be computed as if infinities were
>>> there in the first place.
>> 1  this is mixing concerns. NaNStrategy exists for one specific
>> purpose  to turn extend the ordering on doubles a total ordering.
>> Strategies prescribe only how NaNs are to be treated in the
>> ordering. Side effects on the input array don't make sense in this
>> context. The use case for which this class was created was rank
>> transformations. Returning infinities in rank transformations would
>> blow up computations in these classes. If a floating point
>> computations need to transform NaNs, the specific stats / other
>> classes that do this transformation should perform and document the
>> behavior.
>>
>> Phil
>>
> OK. Agreed realized it later. Hence i have not touched NaNStrategy in my
> patch(MATH1132) . Now i have added a separate transformer to transform NaNs
> but it uses NanStrategy. Please let me know on this as this trasnformation
> itself
> can be used in different places.
Honestly, I don't see the value of this. I would be more favorable
to an addition to MathArrays supporting NaN (or infinity) filtering
/ transformation. Several years back we made the decision to just
let NaNs propagate in computations, which basically means that if
you feed NaNs to [math] you are going to get NaNs out in results.
If we do get into the business of filtering NaNs (or infinities for
that matter), I think it is probably best to do that in MathArrays
or somesuch  i.e., don't decorate APIs with NaN or
infinityhandling handling descriptions. That would be a huge task
and would likely end up bleeding implementation detail into the
APIs. As I said above, NaNStrategy has to exist for rank
transformations to be welldefined. NaNhandling in floating point
computations is defined and as long as we fully document what our
algorithms do, I think it is fair enough to "let the NaNs fall where
they may"  which basically means users need to do the filtering /
transformation themselves.
Phil
>
>>>>>>> When I implemented the method, I explicitly avoided calling
>> Arrays.sort
>>>>>>> because it is a general purpose method that is know to be efficient
>> only
>>>>>>> for arrays of at least medium size. In most cases, when the array
is
>>>>>>> small one falls back to a nonrecursive method like a very simple
>>>>>>> insertion sort, which is faster for smaller arrays.
>>>>>> Please help me understand here; even java primitive Arrays.sort does
>>>>>> an insertion sort for less than 7 elements
>>>>>> (Refer sort1(double x[], int off, int len))
>>>>>> So what is it that the custom insertion sort that does differently
or
>>>>>> is advantageous. Is it the value 15 elements?
>>>>> I don't see a reference to 7 elements, neither in the Java6 nor in the
>>>>> Java 7 doc
>>>> Please take a look at the sort1 method where there is a first block in
>> the
>>>> code which clearly mentions len < 7
>>>> /**
>>>> * Sorts the specified subarray of doubles into ascending order.
>>>> */
>>>> private static void sort1(double x[], int off, int len) {
>>>> // Insertion sort on smallest arrays
>>>> if (len < 7) {
>>>> for (int i=off; i<len+off; i++)
>>>> for (int j=i; j>off && x[j1]>x[j]; j)
>>>> swap(x, j, j1);
>>>> return;
>>>> }
>>>> :
>>>> :
>>>> : code continues for the else part
>>>>
>>>> Also the grepcode url
>>>> <
>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29
>>>> indicates the same
>>> OK, you were refering to a specific implementation. I was thinking in
>>> the general case.
>>>
>>>> (and in any case the doc explicitly states the algorithms
>>>>> explained are only implementation notes and are not part of the
>>>>> specification).
>>>>>
>>>> Yes its a part of comments anyways.
>>>>
>>>>> However, the most important part for now is the fact that we control
it
>>>>> and may be able to implement different NaN strategies. What we have
>>>>> currently fails.
>>>>>
>>>>> I agree on this and hence here is my take:
>>>> Leave FIXED asis and use the earlier insertionSort code (just change
>> the
>>>> name to sort rather than hardcoding it as insertionsort) to handle the
>> case
>>>> you were mentioning
>>> If we leave FIXED it as it is done know, even with insertionSort we do
>>> not really control what happens. It is deterministic as the pivoting and
>>> if/else structure is well defined (both in the selection part and in the
>>> final sort for subarrays), but it is untrackable so we can't document
>> it.
>>>> Continue to use MAXIMAL/MINIMAL for +/Inf transformation and that way
>> we
>>>> have covered both Inf and nan cases.
>>> OK.
>>>
>>>> Use REMOVED as default for all Percentile Estimation Types. (mostly
>>>> influenced by R here perhaps)
>>> This would be fine with me. I have concerns with FIXED only, the other
>>> strategies all seem quite realistic.
>>>
>>> Does anybody else has an advice for FIXED? What are the use case?
>>>
>>> best regards,
>>> Luc
>>>
>>>> best regards,
>>>>> Luc
>>>>>
>>>>>>> In the select
>>>>>>> operation, we know we have small subarrays at the call point.
Going
>>>>>>> back to the former insertionSort would recover the good behavior
for
>>>>>>> small arrays, but would in fact not be sufficient to really
>> implement a
>>>>>>> NaNStrategy.FIXED. I guess it would be simple to make it behave
like
>>>>>>> NaNStrategy.MAXIMAL but I did not try yet.
>>>>>>>
>>>>>>> My point here is rather: can we really and should we really implement
>>>>>>> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs
to
>>>>>>> store the original position of the NaNs. It is quite cumbersome.
>>>>>>>
>>>>>>> I wonder what is the use case for NaNStrategy.FIXED at all.
>>>>>>>
>>>>>>> Going further and looking at the use cases for other NaNStrategy
>> values,
>>>>>>> the NaNs are replaced by +/ infinity before sorting, which is
OK for
>>>>>>> ranking as we only rely on the indices, but we use the values
>> themselves
>>>>>>> in Percentile. So sometimes, we end up with computing interpolations
>>>>>>> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number
and
>>>>>>> x[k+1] has been changed to +infinity, we get +infinity, instead
of
>> the
>>>>>>> NaN we should have retrieved without replacement. So here again,
I'm
>> not
>>>>>>> sure we are doing the right thing.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> best regards,
>>>>>>> Luc
>>>>>>>
>>>>>>> 
>>>>>>> To unsubscribe, email: devunsubscribe@commons.apache.org
>>>>>>> For additional commands, email: devhelp@commons.apache.org
>>>>>>>
>>>>>> 
>>>>>> To unsubscribe, email: devunsubscribe@commons.apache.org
>>>>>> For additional commands, email: devhelp@commons.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>> 
>>>>> To unsubscribe, email: devunsubscribe@commons.apache.org
>>>>> For additional commands, email: devhelp@commons.apache.org
>>>>>
>>>>>
>>> 
>>> To unsubscribe, email: devunsubscribe@commons.apache.org
>>> For additional commands, email: devhelp@commons.apache.org
>>>
>>>
>>
>> 
>> To unsubscribe, email: devunsubscribe@commons.apache.org
>> For additional commands, email: devhelp@commons.apache.org
>>
>>

To unsubscribe, email: devunsubscribe@commons.apache.org
For additional commands, email: devhelp@commons.apache.org
