spark-user mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject Re: spark-submit on YARN is slow
Date Fri, 05 Dec 2014 19:22:16 GMT
Hi Tobias,

What version are you using?  In some recent versions, we had a couple of
large hardcoded sleeps on the Spark side.

-Sandy

On Fri, Dec 5, 2014 at 11:15 AM, Andrew Or <andrew@databricks.com> wrote:

> Hey Tobias,
>
> As you suspect, it's slow because the resource manager in YARN takes a
> while to grant resources. YARN first needs to set up the application
> master (AM) container, and the AM then needs to request more containers
> for the Spark executors. I think this accounts for most of the overhead.
> The rest probably comes from how our own YARN integration code polls the
> application state (every second) and the cluster resource state (every
> 5 seconds IIRC). I haven't explored in detail whether there are
> optimizations there that could speed this up, but I believe most of the
> overhead comes from YARN itself.
>
> In other words, no, I don't know of any quick fix you can apply on your
> end to speed this up.
>
> -Andrew
>
>
> 2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer <tgp@preferred.jp>:
>
>> Hi,
>>
>> I am using spark-submit to submit my application to YARN in
>> "yarn-cluster" mode. I have put both the Spark assembly jar file and my
>> application jar file in HDFS and can see from the logging output that
>> both files are used from there. However, it still takes about 10 seconds
>> for my application's yarnAppState to switch from ACCEPTED to RUNNING.
>>
>> I am aware that this is probably not a Spark issue but rather some YARN
>> configuration setting (or YARN-inherent slowness); I was just wondering
>> whether anyone has advice on how to speed this up.
>>
>> Thanks
>> Tobias
>>
>
>
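
For anyone who wants to see how much of the ACCEPTED-to-RUNNING delay could come from the client-side polling Andrew mentions, here is a minimal Python sketch. It is not Spark's or YARN's actual code: wait_until_running, simulate, and the get_state callback are hypothetical names, and the fake clock only models an application that becomes RUNNING after a fixed delay. ACCEPTED and RUNNING are the states from the thread; FAILED, KILLED, and FINISHED are assumed to be the terminal YARN application states.

```python
import time
from typing import Callable

TERMINAL_STATES = ("FAILED", "KILLED", "FINISHED")


def wait_until_running(
    get_state: Callable[[], str],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Poll get_state() until it reports RUNNING; return the seconds waited.

    get_state is a hypothetical callback standing in for whatever reports
    the application state (for example, a call to the resource manager).
    """
    start = clock()
    while True:
        state = get_state()
        elapsed = clock() - start
        if state == "RUNNING":
            return elapsed
        if state in TERMINAL_STATES:
            raise RuntimeError(f"application ended in state {state}")
        if elapsed >= timeout:
            raise TimeoutError(f"still {state} after {elapsed:.1f}s")
        # The client only notices a state change at the next poll, so the
        # observed delay is rounded up to a multiple of poll_interval.
        sleep(poll_interval)


def simulate(ready_after: float, poll_interval: float) -> float:
    """Run the poll loop against a fake clock instead of a real cluster."""
    now = [0.0]  # simulated time in seconds

    def fake_sleep(seconds: float) -> None:
        now[0] += seconds

    def fake_state() -> str:
        return "RUNNING" if now[0] >= ready_after else "ACCEPTED"

    return wait_until_running(
        fake_state, poll_interval, sleep=fake_sleep, clock=lambda: now[0]
    )


if __name__ == "__main__":
    for interval in (0.5, 1.0, 5.0):
        observed = simulate(ready_after=9.5, poll_interval=interval)
        print(f"poll every {interval}s -> observed RUNNING after {observed}s")
```

Under these assumptions, polling adds at most one poll interval to the observed delay, which is consistent with the view that most of a roughly 10-second wait comes from YARN itself rather than from a 1-second poll.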
