spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tianhua huang <huangtianhua...@gmail.com>
Subject Re: Ask for ARM CI for spark
Date Fri, 26 Jul 2019 07:32:38 GMT
Hi, all


Sorry to disturb again, there are several sql tests failed on arm64
instance:

   - pgSQL/float8.sql *** FAILED ***
   Expected "0.549306144334054[9]", but got "0.549306144334054[8]" Result
   did not match for query #56
   SELECT atanh(double('0.5')) (SQLQueryTestSuite.scala:362)
   - pgSQL/numeric.sql *** FAILED ***
   Expected "2 2247902679199174[72 224790267919917955.1326161858
   4 7405685069595001 7405685069594999.0773399947
   5 5068226527.321263 5068226527.3212726541
   6 281839893606.99365 281839893606.9937234336
   7 1716699575118595840 1716699575118597095.4233081991
   8 167361463828.0749 167361463828.0749132007
   9 107511333880051856] 107511333880052007....", but got "2
   2247902679199174[40224790267919917955.1326161858
   4 7405685069595001 7405685069594999.0773399947
   5 5068226527.321263 5068226527.3212726541
   6 281839893606.99365 281839893606.9937234336
   7 1716699575118595580 1716699575118597095.4233081991
   8 167361463828.0749 167361463828.0749132007
   9 107511333880051872] 107511333880052007...." Result did not match for
   query #496
   SELECT t1.id1, t1.result, t2.expected
   FROM num_result t1, num_exp_power_10_ln t2
   WHERE t1.id1 = t2.id
   AND t1.result != t2.expected (SQLQueryTestSuite.scala:362)

The first test failed, because the value of math.log(3.0) is different on
aarch64:

# on x86_64:
scala> val a = 0.5
a: Double = 0.5

scala> a * math.log((1.0 + a) / (1.0 - a))
res1: Double = 0.5493061443340549

scala> math.log((1.0 + a) / (1.0 - a))
res2: Double = 1.0986122886681098

# on aarch64:

scala> val a = 0.5

a: Double = 0.5

scala> a * math.log((1.0 + a) / (1.0 - a))
res20: Double = 0.5493061443340548

scala> math.log((1.0 + a) / (1.0 - a))

res21: Double = 1.0986122886681096

And I tried other several numbers like math.log(4.0) and math.log(5.0) and
they are same, I don't know why math.log(3.0) is so special? But the result
is different indeed on aarch64. If you are interesting, please try it.

The second test failed, because some values of pow(10, x) is different on
aarch64, according to sql tests of spark, I took similar tests on aarch64
and x86_64, take '-83028485' as example:

# on x86_64:
scala> import java.lang.Math._
import java.lang.Math._
scala> var a = -83028485
a: Int = -83028485
scala> abs(a)
res4: Int = 83028485
scala> math.log(abs(a))
res5: Double = 18.234694299654787
scala> pow(10, math.log(abs(a)))
res6: Double = 1.71669957511859584E18

# on aarch64:

scala> var a = -83028485
a: Int = -83028485
scala> abs(a)
res38: Int = 83028485

scala> math.log(abs(a))

res39: Double = 18.234694299654787
scala> pow(10, math.log(abs(a)))
res40: Double = 1.71669957511859558E18

I send an email to jdk-dev, hope someone can help, and also I proposed this
to JIRA  https://issues.apache.org/jira/browse/SPARK-28519, , if you are
interesting, welcome to join and discuss, thank you very much.

On Thu, Jul 18, 2019 at 11:12 AM Tianhua huang <huangtianhua223@gmail.com>
wrote:

> Thanks for your reply.
>
> About the first problem we didn't find any other reason in log, just found
> timeout to wait the executor up, and after increase the timeout from 10000
> ms to 30000(even 20000)ms,
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L764
>
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L792
> the test passed, and there are more than one executor up, not sure whether
> it's related with the flavor of our aarch64 instance? Now the flavor of the
> instance is 8C8G. Maybe we will try the bigger flavor later. Or any one has
> other suggestion, please contact me, thank you.
>
> About the second problem, I proposed a pull request to apache/spark,
> https://github.com/apache/spark/pull/25186  if you have time, would you
> please to help to review it, thank you very much.
>
> On Wed, Jul 17, 2019 at 8:37 PM Sean Owen <srowen@gmail.com> wrote:
>
>> On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang <huangtianhua223@gmail.com>
>> wrote:
>> > Two failed and the reason is 'Can't find 1 executors before 10000
>> milliseconds elapsed', see below, then we try increase timeout the tests
>> passed, so wonder if we can increase the timeout? and here I have another
>> question about
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TestUtils.scala#L285,
>> why is not >=? see the comment of the function, it should be >=?
>> >
>>
>> I think it's ">" because the driver is also an executor, but not 100%
>> sure. In any event it passes in general.
>> These errors typically mean "I didn't start successfully" for some
>> other reason that may be in the logs.
>>
>> > The other two failed and the reason is '2143289344 equaled 2143289344',
>> this because the value of floatToRawIntBits(0.0f/0.0f) on aarch64 platform
>> is 2143289344 and equals to floatToRawIntBits(Float.NaN). About this I send
>> email to jdk-dev and proposed a topic on scala community
>> https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
>> and https://github.com/scala/bug/issues/11632, I thought it's something
>> about jdk or scala, but after discuss, it should related with platform, so
>> seems the following asserts is not appropriate?
>> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala#L704-L705
>> and
>> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala#L732-L733
>>
>> These tests could special-case execution on ARM, like you'll see some
>> tests handle big-endian architectures.
>>
>

Mime
View raw message