spark-user mailing list archives

From Sameer Farooqui <same...@databricks.com>
Subject Re: spark-submit on YARN is slow
Date Fri, 05 Dec 2014 20:15:37 GMT
Just an FYI - I can submit the SparkPi app to YARN in cluster mode on a
1-node m3.xlarge EC2 instance and the app finishes running
successfully in about 40 seconds. I just figured the 30 - 40 sec run time
was normal b/c of the submitting overhead that Andrew mentioned.

Denny, you can maybe also try to run SparkPi against YARN as a speed check.

spark-submit --class org.apache.spark.examples.SparkPi \
  --deploy-mode cluster --master yarn \
  /opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/jars/spark-examples-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar \
  10
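
To get a comparable number out of that speed check, the submission could be wrapped in a small timer. This is only a rough sketch, assuming bash; time_cmd is a hypothetical helper name, not something that ships with Spark or YARN. It measures wall-clock time at one-second resolution, which should be enough for runs in the 30 - 40 sec range.

```shell
#!/usr/bin/env bash
# time_cmd is a hypothetical helper (not part of Spark or YARN).
# It runs the command given as its arguments, then prints the elapsed
# wall-clock seconds and the command's exit status to stderr.
time_cmd() {
  local start end status=0
  start=$(date +%s)          # seconds since the epoch, before the run
  "$@" || status=$?          # run the command, remember a non-zero status
  end=$(date +%s)            # seconds since the epoch, after the run
  echo "elapsed: $((end - start))s (exit status ${status})" >&2
  return "${status}"         # pass the command's status back to the caller
}
```

Prefixing the spark-submit line above with time_cmd should then report the total submit-to-finish time, which can be compared between YARN and standalone mode.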

On Fri, Dec 5, 2014 at 2:32 PM, Denny Lee <denny.g.lee@gmail.com> wrote:

> My submissions of Spark on YARN (CDH 5.2) resulted in a few thousand
> steps. When I ran this in standalone cluster mode the query finished
> in 55s, but on YARN the query was still running 30 min later. Could the
> hardcoded sleeps be in play here?
> On Fri, Dec 5, 2014 at 11:23 Sandy Ryza <sandy.ryza@cloudera.com> wrote:
>
>> Hi Tobias,
>>
>> What version are you using?  In some recent versions, we had a couple of
>> large hardcoded sleeps on the Spark side.
>>
>> -Sandy
>>
>> On Fri, Dec 5, 2014 at 11:15 AM, Andrew Or <andrew@databricks.com> wrote:
>>
>>> Hey Tobias,
>>>
>>> As you suspect, the reason why it's slow is because the resource manager
>>> in YARN takes a while to grant resources. This is because YARN needs to
>>> first set up the application master container, and then this AM needs to
>>> request more containers for Spark executors. I think this accounts for most
>>> of the overhead. The rest probably comes from how our own YARN
>>> integration code polls application (every second) and cluster resource
>>> states (every 5 seconds IIRC). I haven't explored in detail whether there
>>> are optimizations there that can speed this up, but I believe most of the
>>> overhead comes from YARN itself.
>>>
>>> In other words, no I don't know of any quick fix on your end that you
>>> can do to speed this up.
>>>
>>> -Andrew
>>>
>>>
>>> 2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer <tgp@preferred.jp>:
>>>
>>>> Hi,
>>>>
>>>> I am using spark-submit to submit my application to YARN in
>>>> "yarn-cluster" mode. I have both the Spark assembly jar file as well as my
>>>> application jar file put in HDFS and can see from the logging output that
>>>> both files are used from there. However, it still takes about 10 seconds
>>>> for my application's yarnAppState to switch from ACCEPTED to RUNNING.
>>>>
>>>> I am aware that this is probably not a Spark issue, but some YARN
>>>> configuration setting (or YARN-inherent slowness). I was just wondering if
>>>> anyone has advice on how to speed this up.
>>>>
>>>> Thanks
>>>> Tobias
>>>>
>>>
>>>
>>
