spark-dev mailing list archives

From bo zhaobo <bzhaojyathousa...@gmail.com>
Subject Re: Ask for ARM CI for spark
Date Fri, 16 Aug 2019 02:01:16 GMT
Hi Sean,

Thanks very much for pointing out the roadmap. ;-). Then we will continue to
focus on our test environment.

About the networking problems: we can access Maven Central, and our jobs can
download the required jar packages at a high network speed. What we want to
know is why the logs of the Spark QA test jobs [1] show that the job
script/Maven build does not seem to download any jar packages. Could you tell
us the reason for that? Thank you. We raised the "networking problems"
because of a phenomenon we observed during testing: if we execute "mvn
clean package" in a new test environment (in our setup, the test VMs are
destroyed after a job finishes), Maven downloads the dependency jars from
Maven Central, but in the job "spark-master-test-maven-hadoop" [2] the log
shows no jar downloads at all. What is the reason for that?
Also, when we build the Spark jar and download the dependencies from Maven
Central, it takes about 1 hour, while [2] takes only about 10 minutes. But if
we run "mvn package" in a VM where "mvn package" has already been executed
before, it takes just 14 minutes, very close to [2]. So we suspect that
downloading the jar packages is what costs so much time. Our goal for ARM CI
is that the performance of the new ARM CI should be close to the existing
x86 CI, so that users can accept it more easily.

[1] https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/
[2]
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.6-ubuntu-testing/lastBuild/consoleFull
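
As a side note, the timing difference we see matches how Maven's local
repository cache works: once `~/.m2/repository` is populated, later builds
resolve dependencies from disk and print no download lines. A minimal
`settings.xml` sketch of what we could try on our side (the `localRepository`
path and the mirror id/URL below are placeholders, not real endpoints):

```xml
<!-- ~/.m2/settings.xml (sketch): keep the cache on a persistent volume and,
     optionally, point "central" at a mirror close to the CI machines. -->
<settings>
  <!-- Placeholder path: a volume that survives VM rebuilds. -->
  <localRepository>/var/cache/maven-repo</localRepository>
  <mirrors>
    <mirror>
      <!-- Placeholder mirror; replace with a real nearby Maven mirror. -->
      <id>nearby-central-mirror</id>
      <mirrorOf>central</mirrorOf>
      <url>https://mirror.example.org/maven2</url>
    </mirror>
  </mirrors>
</settings>
```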

Best regards

ZhaoBo





Sean Owen <srowen@gmail.com> wrote on Thu, Aug 15, 2019 at 9:58 PM:

> I think the right goal is to fix the remaining issues first. If we set up
> CI/CD it will only tell us there are still some test failures. If it's
> stable, and not hard to add to the existing CI/CD, yes it could be done
> automatically later. You can continue to test on ARM independently for now.
>
> It sounds indeed like there are some networking problems in the test
> system if you're not able to download from Maven Central. That rarely takes
> significant time, and there aren't project-specific mirrors here. You might
> be able to point at a closer public mirror, depending on where you are.
>
> On Thu, Aug 15, 2019 at 5:43 AM Tianhua huang <huangtianhua223@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I want to discuss Spark ARM CI again. We ran some tests on an ARM instance
>> based on master; the jobs include
>> https://github.com/theopenlab/spark/pull/13  and the k8s integration
>> https://github.com/theopenlab/spark/pull/17/ . There are several things
>> I want to talk about:
>>
>> First, about the failed tests:
>>     1. We have fixed some problems, e.g.
>> https://github.com/apache/spark/pull/25186 and
>> https://github.com/apache/spark/pull/25279; thanks to Sean Owen and others
>> for helping us.
>>     2. We tried the k8s integration test on ARM and hit an error: apk fetch
>> hangs. The tests passed after adding the '--network host' option to the
>> `docker build` command, see:
>>
>> https://github.com/theopenlab/spark/pull/17/files#diff-5b731b14068240d63a93c393f6f9b1e8R176
>> ; the solution follows
>> https://github.com/gliderlabs/docker-alpine/issues/307 . We don't know
>> whether this has ever happened in the community CI, or whether we should
>> submit a PR to pass '--network host' to `docker build`?
>>     3. We found two tests failing after the commit
>> https://github.com/apache/spark/pull/23767 :
>>        ReplayListenerSuite:
>>        - ...
>>        - End-to-end replay *** FAILED ***
>>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>>        - End-to-end replay with compression *** FAILED ***
>>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>>
>>        We tried reverting the commit and the tests passed. The patch is
>> large and, sorry, we haven't been able to find the root cause yet. If you
>> are interested, please try it; we would greatly appreciate it if someone
>> could help us figure it out.
>>
>> Second, about the test time: we increased the flavor of the ARM instance to
>> 16U16G, but there was no significant improvement. The k8s integration test
>> took about one and a half hours, and the QA test (like the
>> spark-master-test-maven-hadoop-2.7 community Jenkins job) took about
>> seventeen hours (it is too long :( ). We suspect the reasons are
>> performance and network.
>> When we split the jobs by project (sql, core, and so on), the time can be
>> decreased to about seven hours, see
>> https://github.com/theopenlab/spark/pull/19 . We also found that the Spark
>> QA tests like  https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/
>> never seem to download any jar packages from the Maven Central repo (such as
>> https://repo.maven.apache.org/maven2/org/opencypher/okapi-api/0.4.2/okapi-api-0.4.2.jar).
>> So we want to know how the Jenkins jobs do that; is there an internal
>> Maven repo running? Maybe we can do the same thing to avoid the network
>> cost of downloading the dependency jars.
>>
>> Third, and most important, ARM CI for Spark: we believe it is necessary,
>> right? As you can see, we have really made a lot of effort, and the basic
>> ARM build/test jobs are now OK, so we suggest adding ARM jobs to the
>> community CI. We can set them to non-voting first and improve/enrich the
>> jobs step by step. Generally, there are two ways in our mind to integrate
>> ARM CI for Spark:
>>      1) We introduce the OpenLab ARM CI into Spark as a custom CI system.
>> We provide human resources and ARM test VMs, and we will focus on
>> ARM-related issues in Spark. We will push the PRs into the community.
>>      2) We donate ARM VM resources to the existing AMPLab Jenkins. We still
>> provide human resources, focus on ARM-related issues in Spark, and push
>> the PRs into the community.
>> With either option we will provide human resources for maintenance; of
>> course it would be great if we can work together. So please tell us which
>> option you would prefer, and let's move forward. Waiting for your reply,
>> thank you very much.
>>
>
