 "Mark R. Diggory" <mdiggory@latte.harvard.edu> wrote:
> Thanks for entertaining my sometimes naive questioning,
>
> J.Pietschmann wrote:
>
> > Mark R. Diggory wrote:
> >
> >> (1) Does it seem logical that when working with "n" (or
> >> values.length) to use Math.pow(n, x), as positive integers, the risk
> >> is actually 'integer overflow' when the array representing the number
> >> of cases gets very large, for which the log implementation of
> >> Math.pow would help retain greater numerical accuracy?
> >
> >
> > No. If you cast the base into a double there is not much risk of
> > overflow: double x = n; y = x*x; or y = ((double) n) * ((double) n);
> > or even y = n * (double) n; (but avoid y = (double) (n*n)).
> > The double mantissa has IIRC 52 bits; this should be good for squaring
> > integers up to 2^26 = 67108864 without loss of precision.
>
> Wow, that's a great clarification; I understand your defense now. It
> would be best to cast n to double as soon as possible and always have *
> operating on doubles, so consolidating the casting of n like this
>
> ((double) (n * (n - 1)))
>
> is a poor choice. If I understand correctly, it's wiser to do at least
>
> ((double) n) * (n - 1)
For avoiding overflow, maybe, but not for precision: if n is a long,
n*(n-1) will be computed exactly in the first case and only then cast to a double.
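A quick throwaway sketch of that trade-off (class and variable names are mine, just for illustration): with a large long n, the all-long product wraps around before the cast, while promoting n to double first avoids the overflow at the cost of rounding the product.

```java
public class CastOrderDemo {
    public static void main(String[] args) {
        long n = 4000000000L; // ~2^32, so n*(n-1) exceeds Long.MAX_VALUE

        // Product computed in long arithmetic: silently wraps to a negative
        // value, and the cast to double faithfully preserves the garbage.
        double viaLong = (double) (n * (n - 1));

        // n promoted to double first: the multiply happens in double
        // arithmetic, so no overflow -- only a small rounding error.
        double viaDouble = ((double) n) * (n - 1);

        System.out.println("long product then cast: " + viaLong);
        System.out.println("cast then multiply:     " + viaDouble);
    }
}
```

The first form prints a negative number; the second is close to 1.6e19.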
> >
> > If you are dealing with floating point numbers, your concern is
> > loss of precision, not overflow. Apart from this, I don't understand
> > in what sense the log based Math.pow(values[i], 2.0) should be
> > favorable. If there's precision loss for x*x, there will be at least
> > the same precision loss for Math.pow(values[i], 2.0), because at least
> > the same number of bits will be missing from the mantissa.
>
> Again, it's becoming obvious that I had some bad assumptions about
> floating point arithmetic and any benefits from "e*log(2)+log(m)"-style
> calculations in Math.pow. Your discussion has convinced me that its
> usage isn't much of a benefit for numerical stability when working with
> doubles.
>
> But now I'm a little more confused: on a different subject, I (we) took
> on this exp*log strategy for calculating the geometric mean as:
>
> /**
> * Returns the sum of the natural logs for this collection of values
> * @param values Is a double[] containing the values
> * @return the sumLog value or Double.NaN if the array is empty
> */
> public static double sumLog(double[] values) {
>     double sumLog = Double.NaN;
>     if (values.length > 0) {
>         sumLog = 0.0;
>         for (int i = 0; i < values.length; i++) {
>             sumLog += Math.log(values[i]);
>         }
>     }
>     return sumLog;
> }
>
> /**
> * Returns the geometric mean for this collection of values
> * @param values Is a double[] containing the values
> * @return the geometric mean or Double.NaN if the array is empty or
> * any of the values are <= 0.
> */
> public static double geometricMean(double[] values) {
>     return Math.exp(sumLog(values) / (double) values.length);
> }
>
> I'm not sure, but isn't this applying a similar exp*log approach to the
> one found in Math.pow? How would you interpret this solution versus what
> we had before? We approached the above because of concerns that (in the
> example below), as the product in "product *= values[i]" gets very
> large, Math.pow(product(values), (1.0/values.length)) would then
> introduce a loss of precision as "1.0/values.length" gets smaller and
> smaller. But doesn't the above algorithm introduce a loss of precision
> when the values[i] are very small as well (similar to what you are
> describing in Math.pow(...))?
>
> /**
> * Returns the product for this collection of values
> * @param values Is a double[] containing the values
> * @return the product values or Double.NaN if the array is empty
> */
> public static double product(double[] values) {
>     double product = Double.NaN;
>     if (values.length > 0) {
>         product = 1.0;
>         for (int i = 0; i < values.length; i++) {
>             product *= values[i];
>         }
>     }
>     return product;
> }
>
> /**
> * Returns the geometric mean for this collection of values
> * @param values Is a double[] containing the values
> * @return the geometric mean or Double.NaN if the array is empty or
> * any of the values are <= 0.
> */
> public static double geometricMean(double[] values) {
>     return Math.pow(product(values), (1.0 / values.length));
> }
>
I would like to hear J's opinion on this, but from my perspective, the
difference is in the bound on the number of terms in the product. In the case
of the geometric mean, there is no bound, and there is a real possibility of
premature overflow. Also, exponentiation is applied in any case to compute
the statistic, and this computation "might" be more stable working with logs
(this needs definitive confirmation). In any case, I have seen this
computational strategy applied elsewhere for geometric means.
Here again, some research should be done, or we should drop the statistic.
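For what it's worth, here is a small throwaway sketch (my own, not proposed API) of the premature overflow Mark describes: the running product blows past Double.MAX_VALUE long before the root is taken, while the sum of logs stays comfortably in range.

```java
import java.util.Arrays;

public class GeoMeanOverflow {
    public static void main(String[] args) {
        double[] values = new double[400];
        Arrays.fill(values, 1000.0); // product is 1000^400 = 1e1200 >> Double.MAX_VALUE (~1.8e308)

        // Direct product: overflows to Infinity partway through the loop,
        // and Math.pow(Infinity, positive) is still Infinity.
        double product = 1.0;
        for (double v : values) {
            product *= v;
        }
        double direct = Math.pow(product, 1.0 / values.length);

        // Sum of logs: the accumulator is only about 400 * ln(1000) ~ 2763,
        // nowhere near overflow, and exp recovers the mean.
        double sumLog = 0.0;
        for (double v : values) {
            sumLog += Math.log(v);
        }
        double viaLogs = Math.exp(sumLog / values.length);

        System.out.println(direct + " vs " + viaLogs); // Infinity vs ~1000.0
    }
}
```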
I am also a little confused by J's analysis. Do we know definitively that,
using J2SE, there is precision loss in Math.pow(x, n) vs x*x*...*x (n terms)
for small integer n? If the answer is yes, we should establish the guideline
that for integer n < 4, we use explicit products instead of Math.pow(x, n).
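One way to get an empirical answer is a quick throwaway harness along these lines (class name is mine; I make no claim about what it prints, since the mismatch count, if any, depends on the particular JDK's Math.pow implementation, which the spec only requires to be within 1 ulp of the exact result):

```java
import java.util.Random;

public class PowVsProduct {
    public static void main(String[] args) {
        Random r = new Random(42); // fixed seed so runs are repeatable
        int trials = 1000000;
        int mismatches = 0;
        for (int i = 0; i < trials; i++) {
            double x = r.nextDouble() * 1e6;
            // Count cases where the library call and the explicit product
            // disagree bit-for-bit.
            if (Math.pow(x, 2.0) != x * x) {
                mismatches++;
            }
        }
        System.out.println("Math.pow(x, 2.0) != x*x in "
                + mismatches + " of " + trials + " trials");
    }
}
```

Running the same idea with n = 3 (Math.pow(x, 3.0) vs x*x*x) would cover the proposed n < 4 guideline.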
Phil
p.s. Let's all try to remember the [math] in the subject lines.
>
> Mark
>
>
>

To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org
