spark-user mailing list archives

From Andrew Or <and...@databricks.com>
Subject Re: spark-submit on YARN is slow
Date Fri, 05 Dec 2014 19:15:37 GMT
Hey Tobias,

As you suspect, it's slow because the YARN resource manager takes a while
to grant resources. YARN first has to set up the application master (AM)
container, and the AM then has to request more containers for the Spark
executors. I think this accounts for most of the overhead. The rest
probably comes from how our own YARN integration code polls the
application state (every second) and the cluster resource state (every 5
seconds IIRC). I haven't explored in detail whether there are
optimizations there that could speed this up, but I believe most of the
overhead comes from YARN itself.

In other words, no, I don't know of any quick fix on your end that would
speed this up.
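
To give a rough feel for how much of the delay polling alone could add,
here is a minimal sketch in Python. It is not Spark's actual code: the
function names are hypothetical, and it assumes a poller that checks at
fixed ticks (0, interval, 2 * interval, ...), so a state change is only
noticed at the next tick. The 1 s and 5 s intervals are the ones mentioned
above (the 5 s figure being from memory).

```python
def observed_time_ms(true_time_ms: int, poll_interval_ms: int) -> int:
    """Return the first poll tick at or after the true state change.

    Assumes polls happen at 0, interval, 2 * interval, ... (hypothetical
    model, not taken from Spark's YARN integration code).
    """
    # Integer ceiling division, then scale back up to a tick time.
    ticks = -(-true_time_ms // poll_interval_ms)
    return ticks * poll_interval_ms


def polling_delay_ms(true_time_ms: int, poll_interval_ms: int) -> int:
    """Extra latency introduced purely by waiting for the next poll."""
    return observed_time_ms(true_time_ms, poll_interval_ms) - true_time_ms


if __name__ == "__main__":
    # A state change that really happens 9.3 s after submission.
    print(polling_delay_ms(9300, 1000))   # polled every second
    # A resource change at 10.1 s, polled every 5 seconds.
    print(polling_delay_ms(10100, 5000))
```

Under this model the added delay is always less than one poll interval,
so polling can explain at most a few seconds of a 10 second wait, which is
consistent with most of the time being spent in YARN itself.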

-Andrew


2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer <tgp@preferred.jp>:

> Hi,
>
> I am using spark-submit to submit my application to YARN in "yarn-cluster"
> mode. I have both the Spark assembly jar file as well as my application jar
> file put in HDFS and can see from the logging output that both files are
> used from there. However, it still takes about 10 seconds for my
> application's yarnAppState to switch from ACCEPTED to RUNNING.
>
> I am aware that this is probably not a Spark issue but rather some YARN
> configuration setting (or YARN-inherent slowness); I was just wondering if
> anyone has any advice on how to speed this up.
>
> Thanks
> Tobias
>
