spark-dev mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Raise Jenkins test timeout? with alternatives
Date Thu, 11 Apr 2019 18:15:40 GMT
If the machines are bottlenecked on I/O or are swapping, doing less work
concurrently would improve throughput, and parallelizing wouldn't. I don't
know that it's the case, but am wondering out loud as the runtimes seem to
vary by 20-30% sometimes. Naturally, having the option to parallelize is
good as well, if those bottlenecks aren't actually a problem or are
resolved otherwise.
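
One possible way to check the swapping question, offered as a minimal sketch and not as anything the workers are known to run: on Linux, /proc/vmstat exposes cumulative pswpin and pswpout counters (pages swapped in and out since boot). Sampling them twice and taking the difference shows whether swapping happened in between. This assumes the Jenkins workers are Linux hosts with /proc mounted; the function names are hypothetical, chosen only for illustration.

```python
import time


def parse_swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout page counts from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before, after):
    """Return pages swapped in and out between two counter samples."""
    return {
        key: after.get(key, 0) - before.get(key, 0)
        for key in ("pswpin", "pswpout")
    }


def sample_swap_activity(interval_seconds=10, path="/proc/vmstat"):
    """Read the counters twice, interval_seconds apart, and return the delta.

    Assumption: `path` is readable, which is only true on a Linux host.
    """
    with open(path) as f:
        before = parse_swap_counters(f.read())
    time.sleep(interval_seconds)
    with open(path) as f:
        after = parse_swap_counters(f.read())
    return swap_activity(before, after)
```

Calling sample_swap_activity() on a worker while several builds are running would show whether the counters move. Deltas that stay at zero would suggest swapping is not the bottleneck, though that says nothing either way about I/O contention.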

On Thu, Apr 11, 2019 at 1:10 PM Xin Lu <xlu@salesforce.com> wrote:

> I worked on parallelizing the tests two years ago. It does require an
> update to the amplab Jenkins, which is very old (1.651.3, released
> 2016-7-1). The current version of CloudBees Jenkins has stages, and it is
> not difficult to put tests in parallel stages and aggregate the test
> results. Reducing concurrent builds per machine would not, by itself,
> resolve the sheer length of tests running serially and the number of PRs.
>
> Xin
>
> On Thu, Apr 11, 2019 at 10:53 AM Sean Owen <srowen@gmail.com> wrote:
>
>> Agree, and I can make a few of the ML regression tests faster pretty
>> easily. Here the issue is more about what happens when you run every single
>> test, and man that does take a long time. Maybe rare enough to not justify
>> upping the build timeout. (The PR passed just barely this time anyway)
>>
>> Q for Shane: we have a ton of build slots, but it seems like worker
>> performance does slow down when there are multiple builds in progress. Is
>> there any value in reducing the number of concurrent builds per machine,
>> especially if we're not really using all of the slots? It might help
>> balance the load better. I was also trying to figure out whether the
>> workers were swapping, but couldn't find an easy way to tell.
>>
>> On Thu, Apr 11, 2019 at 11:00 AM Xiao Li <lixiao@databricks.com> wrote:
>>
>>> Hi, Sean
>>>
>>> Your issue actually shows our existing test framework needs a change
>>> ASAP. We need to go over the tests listed in
>>> https://spark-tests.appspot.com/slow-tests and see whether we can
>>> reduce the time or run these test suites in parallel.
>>>
>>> Thanks,
>>>
>>> Xiao
>>>
>>>
>>>
>>> On Thu, Apr 11, 2019 at 4:26 AM Sean Owen <srowen@gmail.com> wrote:
>>>
>>>> I have a big PR that keeps failing because it hits the 300-minute build
>>>> timeout:
>>>>
>>>> https://github.com/apache/spark/pull/24314
>>>>
>>>> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4703/console
>>>>
>>>> It's because it touches so much code that all tests run including
>>>> things like Kinesis. It looks like 300 mins isn't enough. We can raise
>>>> it to an eye-watering 360 minutes if that's just how long all tests
>>>> take.
>>>>
>>>> I can also try splitting up the change to move out changes to a few
>>>> optional modules into separate PRs.
>>>>
>>>> (Because this one makes it all the way through Python and Java tests
>>>> and almost all R tests several times, and doesn't touch Python or R
>>>> and shouldn't have any functional changes, I'm tempted to just merge
>>>> it, too, as a solution)
>>>>
>>>> Thoughts?
>>>>
>>>
>>
