spark-dev mailing list archives

From bo zhaobo <bzhaojyathousa...@gmail.com>
Subject Re: Ask for ARM CI for spark
Date Sat, 27 Jul 2019 01:34:04 GMT
Hi all,

Thanks for your concern. Yes, it is worth testing against the backend
database as well. Note, however, that this issue was hit in Spark SQL: we
only tested with Spark itself, without integrating any other database.

Best Regards,

ZhaoBo




On Fri, Jul 26, 2019 at 5:46 PM, Sean Owen <srowen@gmail.com> wrote:

> Interesting. I don't think log(3) is special; it's just that differences
> in how log is implemented and how floating-point values are handled on
> aarch64 vs x86, or in the JVM, show up at some values like this. It's
> still a little surprising! BTW Wolfram Alpha suggests that the correct
> value is more like ...810969..., right between the two. java.lang.Math
> doesn't guarantee strict IEEE floating-point behavior, but
> java.lang.StrictMath is supposed to, at the potential cost of speed,
> and it gives ...81096, in agreement with aarch64.
>
> @Yuming Wang are the results in float8.sql taken from PostgreSQL directly?
> Interesting if it also returns the same less accurate result, which
> might suggest it's more to do with underlying OS math libraries. You
> noted that these tests sometimes gave platform-dependent differences
> in the last digit, so wondering if the test value directly reflects
> PostgreSQL or just what we happen to return now.
>
> One option is to use StrictMath in special cases like computing atanh.
> That gives a value that agrees with aarch64.
> I also note that 0.5 * (math.log(1 + x) - math.log(1 - x)) gives the
> more accurate answer too, and makes the result agree with, say,
> Wolfram Alpha for atanh(0.5).
> (Actually if we do that, better still is 0.5 * (math.log1p(x) -
> math.log1p(-x)) for best accuracy near 0)
> Commons Math also has implementations of sinh, cosh, atanh that we
> could call. It claims it's possibly more accurate and faster. I
> haven't tested its result here.
>
> FWIW the "log1p" version appears, from some informal testing, to be
> most accurate (in agreement with Wolfram) and using StrictMath doesn't
> matter. If we change something, I'd use that version above.
> The only issue is if this causes the result to disagree with
> PostgreSQL, but then again it's more correct and maybe the DB is
> wrong.
>
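
For anyone who wants to reproduce the comparison, the formulations quoted above could be put side by side roughly as follows. This is only a sketch: the object and method names are made up for illustration and are not Spark's implementation, and the last printed digit may well differ between x86_64 and aarch64 or between JVM versions, which is the whole point of this thread. Math here is java.lang.Math.

```scala
// Illustrative comparison of the atanh formulations discussed in this thread.
// All names are hypothetical; this is not Spark's implementation.
object AtanhVariants {
  // Form used in the failing REPL sessions: 0.5 * log((1 + x) / (1 - x))
  def ratioForm(x: Double): Double = 0.5 * Math.log((1.0 + x) / (1.0 - x))

  // Difference of two logs
  def diffForm(x: Double): Double = 0.5 * (Math.log(1.0 + x) - Math.log(1.0 - x))

  // log1p avoids forming 1 + x explicitly, which matters when x is close to 0
  def log1pForm(x: Double): Double = 0.5 * (Math.log1p(x) - Math.log1p(-x))

  // Same as ratioForm, but routed through java.lang.StrictMath
  def strictForm(x: Double): Double = 0.5 * StrictMath.log((1.0 + x) / (1.0 - x))

  def main(args: Array[String]): Unit = {
    val x = 0.5
    println(s"ratio:  ${ratioForm(x)}")
    println(s"diff:   ${diffForm(x)}")
    println(s"log1p:  ${log1pForm(x)}")
    println(s"strict: ${strictForm(x)}")
  }
}
```

Running the same file on both architectures and diffing the output should show which forms are sensitive to the platform.
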
>
> The rest may be a test vs PostgreSQL issue; see
> https://issues.apache.org/jira/browse/SPARK-28316
>
>
> On Fri, Jul 26, 2019 at 2:32 AM Tianhua huang <huangtianhua223@gmail.com> wrote:
> >
> > Hi, all
> >
> >
> > Sorry to bother you again. Several SQL tests failed on an arm64 instance:
> >
> > pgSQL/float8.sql *** FAILED ***
> > Expected "0.549306144334054[9]", but got "0.549306144334054[8]" Result did not match for query #56
> > SELECT atanh(double('0.5')) (SQLQueryTestSuite.scala:362)
> > pgSQL/numeric.sql *** FAILED ***
> > Expected "2 2247902679199174[72 224790267919917955.1326161858
> > 4 7405685069595001 7405685069594999.0773399947
> > 5 5068226527.321263 5068226527.3212726541
> > 6 281839893606.99365 281839893606.9937234336
> > 7 1716699575118595840 1716699575118597095.4233081991
> > 8 167361463828.0749 167361463828.0749132007
> > 9 107511333880051856] 107511333880052007....", but got "2 2247902679199174[40 224790267919917955.1326161858
> > 4 7405685069595001 7405685069594999.0773399947
> > 5 5068226527.321263 5068226527.3212726541
> > 6 281839893606.99365 281839893606.9937234336
> > 7 1716699575118595580 1716699575118597095.4233081991
> > 8 167361463828.0749 167361463828.0749132007
> > 9 107511333880051872] 107511333880052007...." Result did not match for query #496
> > SELECT t1.id1, t1.result, t2.expected
> > FROM num_result t1, num_exp_power_10_ln t2
> > WHERE t1.id1 = t2.id
> > AND t1.result != t2.expected (SQLQueryTestSuite.scala:362)
> >
> > The first test failed because math.log(3.0) returns a different value on aarch64:
> >
> > # on x86_64:
> >
> > scala> val a = 0.5
> > a: Double = 0.5
> >
> > scala> a * math.log((1.0 + a) / (1.0 - a))
> > res1: Double = 0.5493061443340549
> >
> > scala> math.log((1.0 + a) / (1.0 - a))
> > res2: Double = 1.0986122886681098
> >
> > # on aarch64:
> >
> > scala> val a = 0.5
> >
> > a: Double = 0.5
> >
> > scala> a * math.log((1.0 + a) / (1.0 - a))
> >
> > res20: Double = 0.5493061443340548
> >
> > scala> math.log((1.0 + a) / (1.0 - a))
> >
> > res21: Double = 1.0986122886681096
> >
> > I also tried several other inputs, such as math.log(4.0) and math.log(5.0),
> > and they give the same result on both platforms. I don't know why
> > math.log(3.0) is special, but the result really is different on aarch64.
> > If you are interested, please try it.
> >
> > The second test failed because some values of pow(10, x) differ on
> > aarch64. Following Spark's SQL tests, I ran similar computations on
> > aarch64 and x86_64, taking '-83028485' as an example:
> >
> > # on x86_64:
> > scala> import java.lang.Math._
> > import java.lang.Math._
> > scala> var a = -83028485
> > a: Int = -83028485
> > scala> abs(a)
> > res4: Int = 83028485
> > scala> math.log(abs(a))
> > res5: Double = 18.234694299654787
> > scala> pow(10, math.log(abs(a)))
> > res6: Double = 1.71669957511859584E18
> >
> > # on aarch64:
> >
> > scala> var a = -83028485
> > a: Int = -83028485
> > scala> abs(a)
> > res38: Int = 83028485
> >
> > scala> math.log(abs(a))
> >
> > res39: Double = 18.234694299654787
> > scala> pow(10, math.log(abs(a)))
> > res40: Double = 1.71669957511859558E18
> >
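
For scale: at this magnitude adjacent doubles are 256 apart, so the two pow results quoted above appear to be neighbouring doubles, and the two atanh results also look to be one unit in the last place (ULP) apart. The helper below is a hypothetical sketch of how such values could be compared with an ULP tolerance instead of by exact string match; it is not something that exists in Spark's test suite, and it assumes both inputs are finite and have the same sign.

```scala
// Hypothetical helper for comparing doubles by units in the last place (ULPs).
// Not part of Spark; names are illustrative only.
object UlpCompare {
  // Number of representable steps between a and b.
  // Assumes both values are finite and have the same sign.
  def ulpDistance(a: Double, b: Double): Long =
    math.abs(java.lang.Double.doubleToLongBits(a) - java.lang.Double.doubleToLongBits(b))

  // True when a and b are at most maxUlps representable steps apart
  def withinUlps(a: Double, b: Double, maxUlps: Long): Boolean =
    ulpDistance(a, b) <= maxUlps

  def main(args: Array[String]): Unit = {
    // Values copied from the x86_64 and aarch64 sessions quoted in this thread
    val powX86 = 1.71669957511859584E18
    val powArm = 1.71669957511859558E18
    println(s"ulp at this magnitude:   ${Math.ulp(powX86)}")
    println(s"pow results differ by:   ${ulpDistance(powX86, powArm)} ULP")
    println(s"atanh results differ by: ${ulpDistance(0.5493061443340549, 0.5493061443340548)} ULP")
  }
}
```

Whether a tolerance like this is acceptable for tests that are meant to mirror PostgreSQL output is a separate question.
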
> > I sent an email to jdk-dev in the hope that someone can help, and I also
> > filed a JIRA issue: https://issues.apache.org/jira/browse/SPARK-28519
> > If you are interested, you are welcome to join the discussion. Thank you
> > very much.
> >
> >
> > On Thu, Jul 18, 2019 at 11:12 AM Tianhua huang <huangtianhua223@gmail.com> wrote:
> >>
> >> Thanks for your reply.
> >>
> >> About the first problem: we didn't find any other cause in the logs,
> >> only the timeout while waiting for the executor to come up. After
> >> increasing the timeout from 10000 ms to 30000 ms (even 20000 ms works) at
> >> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L764
> >> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L792
> >> the tests passed, and more than one executor came up. We are not sure
> >> whether this is related to the flavor of our aarch64 instance, which is
> >> currently 8C8G. We may try a bigger flavor later. If anyone has other
> >> suggestions, please contact me. Thank you.
> >>
> >> About the second problem, I opened a pull request against apache/spark:
> >> https://github.com/apache/spark/pull/25186
> >> If you have time, would you please help review it? Thank you very much.
> >>
> >> On Wed, Jul 17, 2019 at 8:37 PM Sean Owen <srowen@gmail.com> wrote:
> >>>
> >>> On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang <huangtianhua223@gmail.com> wrote:
> >>> > Two failed with the reason 'Can't find 1 executors before 10000
> >>> > milliseconds elapsed' (see below). When we increased the timeout, the
> >>> > tests passed, so I wonder whether we can increase the timeout? I also
> >>> > have another question about
> >>> > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285
> >>> > Why is the comparison not >=? According to the comment on the
> >>> > function, it should be >=.
> >>> >
> >>>
> >>> I think it's ">" because the driver is also an executor, but not 100%
> >>> sure. In any event it passes in general.
> >>> These errors typically mean "I didn't start successfully" for some
> >>> other reason that may be in the logs.
> >>>
> >>> > The other two failed with the reason '2143289344 equaled 2143289344'.
> >>> > This is because floatToRawIntBits(0.0f/0.0f) on the aarch64 platform
> >>> > is 2143289344, which equals floatToRawIntBits(Float.NaN). I sent an
> >>> > email to jdk-dev about this and opened a topic in the Scala community,
> >>> > https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
> >>> > and https://github.com/scala/bug/issues/11632. I thought it was a JDK
> >>> > or Scala issue, but after discussion it appears to be related to the
> >>> > platform, so the following asserts seem inappropriate:
> >>> > https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705
> >>> > and
> >>> > https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733
> >>>
> >>> These tests could special-case execution on ARM, like you'll see some
> >>> tests handle big-endian architectures.
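
A rough sketch of what such special-casing might look like is below. It relies on the documented difference between java.lang.Float.floatToRawIntBits, which preserves whatever NaN bit pattern the hardware produced, and java.lang.Float.floatToIntBits, which collapses every NaN to the canonical 0x7fc00000 (2143289344, the number in the failure message). The object name, the helper names, and the use of the os.arch system property as the architecture check are assumptions for illustration, not what the Spark suites actually do.

```scala
// Hypothetical sketch: platform-aware handling of NaN bit patterns in a test.
// Names are illustrative; this is not code from Spark's test suites.
object NaNBits {
  // Canonical float NaN as returned by floatToIntBits: 0x7fc00000 == 2143289344
  val CanonicalNaN: Int = 0x7fc00000

  // Assumption: the JVM reports "aarch64" in os.arch on 64-bit ARM
  def isAarch64: Boolean = System.getProperty("os.arch", "") == "aarch64"

  // Collapses every NaN to the canonical pattern, so it is platform independent
  def canonicalBits(f: Float): Int = java.lang.Float.floatToIntBits(f)

  // Preserves the raw pattern, which may differ between architectures
  def rawBits(f: Float): Int = java.lang.Float.floatToRawIntBits(f)

  def main(args: Array[String]): Unit = {
    val zero = 0.0f
    val computedNaN = zero / zero
    println(s"os.arch = ${System.getProperty("os.arch")}, aarch64 = $isAarch64")
    println(s"raw bits of 0.0f/0.0f:       ${rawBits(computedNaN)}")
    println(s"raw bits of Float.NaN:       ${rawBits(Float.NaN)}")
    println(s"canonical bits of 0.0f/0.0f: ${canonicalBits(computedNaN)}")
  }
}
```

A test that needs two distinct NaN bit patterns could skip or adjust its raw-bits assertion when isAarch64 is true, while assertions based on canonicalBits should hold on every platform.
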
>
