From "Venkatesha Murthy TS (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula
Date Sun, 01 Jun 2014 20:22:01 GMT
https://issues.apache.org/jira/browse/MATH-1120
Venkatesha Murthy TS updated MATH-1120:
Attachment: percentile-with-estimation-patch

As per the earlier discussion, I was advised to look at the references on the different
types of computation and come up with a draft.

Here is what I have been thinking:

There are at least 9-10 documented approaches (from http://en.wikipedia.org/wiki/Quantile )
to computing the percentile, and the R statistical tool has a reference implementation
of these. Each of these strategies provides a formula for choosing the index into the array
and an estimation technique for computing the value at that position.

These estimation techniques map naturally onto an enum EstimationTechnique (R1, R2,
etc., where R1, R2 are the estimation types described on Wikipedia) with the functions below:
int index(double pthQuantile, int N);
double estimate(double[] values, int[] pivotsHeap, double pos, int length);
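
To make the enum idea concrete, here is a minimal sketch, not the attached patch. It assumes only two techniques: LEGACY (the p*(N+1)/100 position that the issue says Percentile hard-codes) and R_7 (the p*(N-1)/100 + 1 position suggested for matching MS Excel). The method names position and estimate, and the simplified estimate signature that sorts a copy instead of using a pivots heap, are assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical sketch of the proposed enum; names and signatures are illustrative only. */
enum EstimationTechnique {
    /** Position p*(N+1)/100, the rule the issue describes as currently hard-coded. */
    LEGACY {
        @Override
        double position(double p, int n) {
            return p * (n + 1) / 100.0;
        }
    },
    /** Position p*(N-1)/100 + 1, the rule the issue suggests for matching MS Excel. */
    R_7 {
        @Override
        double position(double p, int n) {
            return p * (n - 1) / 100.0 + 1;
        }
    };

    /** 1-based, possibly fractional position of the p-th percentile among n values. */
    abstract double position(double p, int n);

    /**
     * Estimates the p-th percentile (0 < p <= 100) by linear interpolation
     * between the two order statistics surrounding the position.
     * Assumption: a full sort of a copy is used here for brevity instead of kth selection.
     */
    double estimate(double[] values, double p) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double pos = position(p, n);
        if (pos < 1) {
            return sorted[0];          // position falls before the first value
        }
        if (pos >= n) {
            return sorted[n - 1];      // position falls at or after the last value
        }
        int lower = (int) Math.floor(pos);
        double fraction = pos - lower;
        // sorted is 0-based, pos is 1-based: order statistic "lower" lives at index lower - 1
        return sorted[lower - 1] + fraction * (sorted[lower] - sorted[lower - 1]);
    }
}
```

With this shape, adding another of the documented types would mean adding one more constant with its own position rule, and possibly its own interpolation, without touching callers.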

In addition, the Percentile class already does median-of-3 based pivoting for its kth selection.
Since pivoting is itself a strategy, we could add a pivoting strategy enum that defaults
to median of 3. Further, the kth selection logic can now be subsumed inside EstimationTechnique
as the estimate method.
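
A rough sketch of how a pivoting strategy enum and a kth selector could fit together follows. Only median of 3 is mentioned above; the CENTRAL constant, the method names pivotIndex and select, and the use of a plain Lomuto-partition quickselect are assumptions for illustration and do not reproduce the existing Percentile code.

```java
/** Hypothetical pivot-choice strategies; only MEDIAN_OF_3 comes from the message. */
enum PivotingStrategy {
    MEDIAN_OF_3 {
        @Override
        int pivotIndex(double[] work, int begin, int end) {
            int last = end - 1;
            int mid = begin + (last - begin) / 2;
            double a = work[begin], b = work[mid], c = work[last];
            if (a < b) {
                if (b < c) { return mid; }       // a < b < c
                return a < c ? last : begin;     // c is the median, else a
            }
            if (a < c) { return begin; }         // b <= a < c
            return b < c ? last : mid;           // c is the median, else b
        }
    },
    /** Illustrative alternative: always pick the middle element. */
    CENTRAL {
        @Override
        int pivotIndex(double[] work, int begin, int end) {
            return begin + (end - begin) / 2;
        }
    };

    /** Returns the index of the pivot to use within work[begin, end). */
    abstract int pivotIndex(double[] work, int begin, int end);
}

/** Hypothetical selector that finds the k-th smallest element (0-based) in place. */
final class KthSelector {
    private final PivotingStrategy strategy;

    KthSelector() { this(PivotingStrategy.MEDIAN_OF_3); }   // default as suggested above

    KthSelector(PivotingStrategy strategy) { this.strategy = strategy; }

    double select(double[] work, int k) {
        if (k < 0 || k >= work.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        int begin = 0;
        int end = work.length;
        while (end - begin > 1) {
            int p = partition(work, begin, end, strategy.pivotIndex(work, begin, end));
            if (k == p) { return work[k]; }
            if (k < p) { end = p; } else { begin = p + 1; }
        }
        return work[k];
    }

    /** Lomuto partition of work[begin, end); returns the pivot's final index. */
    private static int partition(double[] w, int begin, int end, int pivot) {
        double value = w[pivot];
        swap(w, pivot, end - 1);
        int store = begin;
        for (int i = begin; i < end - 1; i++) {
            if (w[i] < value) { swap(w, i, store++); }
        }
        swap(w, store, end - 1);
        return store;
    }

    private static void swap(double[] w, int i, int j) {
        double t = w[i]; w[i] = w[j]; w[j] = t;
    }
}
```

Keeping the pivot choice behind an enum means the selector itself never changes when a different pivot rule is tried.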

Changes to Percentile:
Percentile gets one or two more constructors to allow specifying the EstimationTechnique at
construction. The default estimation technique is the existing Percentile computation
logic, which need not be specified, so the existing constructors will work the same way
as before.
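
The backward-compatible constructor arrangement described here could look roughly like the sketch below. PercentileSketch, its nested Technique enum, and the accessor names are hypothetical stand-ins; the default quantile of 50.0 is assumed to mirror the existing no-argument constructor.

```java
import java.util.Objects;

/** Hypothetical stand-in for Percentile; illustrates constructor chaining only. */
final class PercentileSketch {
    /** Placeholder for the proposed EstimationTechnique enum. */
    enum Technique { LEGACY, R_7 }

    private final double quantile;
    private final Technique technique;

    /** Existing-style constructor: unchanged behaviour (assumed default quantile 50.0). */
    PercentileSketch() {
        this(50.0);
    }

    /** Existing-style constructor: falls back to the legacy computation. */
    PercentileSketch(double quantile) {
        this(quantile, Technique.LEGACY);
    }

    /** New constructor: the caller chooses the estimation technique explicitly. */
    PercentileSketch(double quantile, Technique technique) {
        if (!(quantile > 0 && quantile <= 100)) {
            throw new IllegalArgumentException("quantile must be in (0, 100]: " + quantile);
        }
        this.quantile = quantile;
        this.technique = Objects.requireNonNull(technique, "technique");
    }

    double getQuantile() { return quantile; }

    Technique getTechnique() { return technique; }
}
```

Because the older constructors delegate to the new one with the legacy technique, callers that never mention a technique keep the behaviour they have today.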

Remove the kth selection private methods and move them into a KthSelector class (a separate
nested class). However, medianOf3 is exposed with package-level access and hence needs to be
refactored to use the KthSelector class. It could also be deprecated, as the method is not strictly
part of the percentile logic (it belongs to kth selection).
Add 2 small methods to get the work array (getWorkArray) and the cached pivots, which will need
to be passed along to the estimation technique.

I agree with dropping my earlier suggestion of ExcelPercentile{Test} and look
forward to opinions on the new approach.

Please let me know your thoughts on the attached percentile-with-estimation-patch.

> Need Percentile computations that can be matched with standard spreadsheet formula
>                 Key: MATH-1120
>                 URL: https://issues.apache.org/jira/browse/MATH-1120
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.2
>            Reporter: Venkatesha Murthy TS
>              Labels: Percentile
>             Fix For: 4.0
>         Attachments: excel-percentile-patch, percentile-with-estimation-patch
>   Original Estimate: 504h
>  Remaining Estimate: 504h
> The current Percentile implementation assumes and hard-codes the position of the pth quantile as
> p * (N+1)/100 and provides a kth selected value.
> However, if we need to verify, compare, or contrast with standard statistical tools such as
> MS Excel, it would be good to provide an extensible way of varying this choice of
> position rather than hard-coding it.
> For example, to generate a percentile closely matching MS Excel, the position
> required may be [p*(N-1)/100]+1.
> I have a patch ready with a small change needed in the Percentile class and a new ExcelPercentile
> class, written with tests closely matching those of the PercentileTest class.
> Please let me know if I could submit this as a patch.
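
As a worked illustration of the two position rules quoted above, take N = 4 values and p = 25 (both numbers are arbitrary choices made here, not taken from the issue):

```latex
\begin{align*}
\text{current rule:} \quad & \frac{p\,(N+1)}{100} = \frac{25 \cdot 5}{100} = 1.25 \\
\text{Excel-style rule:} \quad & \frac{p\,(N-1)}{100} + 1 = \frac{25 \cdot 3}{100} + 1 = 1.75
\end{align*}
```

Both positions fall between the first and second sorted values, but at different points, so the two rules generally return different percentiles for the same data.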

This message was sent by Atlassian JIRA
(v6.2#6252)

