spark-user mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject Re: Spark on Yarn vs Standalone
Date Tue, 08 Sep 2015 22:02:07 GMT
Those settings seem reasonable to me.

Are you observing performance that's worse than you would expect?

-Sandy

On Mon, Sep 7, 2015 at 11:22 AM, Alexander Pivovarov <apivovarov@gmail.com>
wrote:

> Hi Sandy
>
> Thank you for your reply
> Currently we use r3.2xlarge boxes (vCPU: 8, Mem: 61 GiB)
> with the EMR setting for Spark "maximizeResourceAllocation": "true"
>
> It is automatically converted to the Spark settings
> spark.executor.memory              47924M
> spark.yarn.executor.memoryOverhead 5324
>
> We also set spark.default.parallelism = slave_count * 16
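>
> For reference, here is how I understand these numbers fit together, as a
> minimal Scala sketch (the app name and slave count are placeholders, and
> the 53248 MB total is my own arithmetic, not a documented EMR value):
>
>   import org.apache.spark.{SparkConf, SparkContext}
>
>   // 47924 MB heap + 5324 MB overhead = 53248 MB = 52 GiB per YARN
>   // container, i.e. one executor per node using almost all of the
>   // r3.2xlarge's 61 GiB
>   val slaveCount = 100  // hypothetical cluster size
>   val conf = new SparkConf()
>     .setAppName("weekly-heavy-job")  // placeholder name
>     .set("spark.executor.memory", "47924m")
>     .set("spark.yarn.executor.memoryOverhead", "5324")
>     // 16 tasks per slave = 2 tasks per vCPU on an 8-vCPU box
>     .set("spark.default.parallelism", (slaveCount * 16).toString)
>   val sc = new SparkContext(conf)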
>
> Does this look good to you? (We run a single heavy job on the cluster.)
>
> Alex
>
> On Mon, Sep 7, 2015 at 11:03 AM, Sandy Ryza <sandy.ryza@cloudera.com>
> wrote:
>
>> Hi Alex,
>>
>> If they're both configured correctly, there's no reason that Spark
>> Standalone should provide a performance or memory improvement over Spark
>> on YARN.
>>
>> -Sandy
>>
>> On Fri, Sep 4, 2015 at 1:24 PM, Alexander Pivovarov
>> <apivovarov@gmail.com> wrote:
>>
>>> Hi Everyone
>>>
>>> We are trying the latest AWS emr-4.0.0 with Spark, and my question is
>>> about YARN vs standalone mode.
>>> Our use case is:
>>> - start a 100-150 node cluster every week,
>>> - run one heavy Spark job (5-6 hours),
>>> - save data to S3,
>>> - stop the cluster.
>>>
>>> Officially, AWS emr-4.0.0 comes with Spark on YARN.
>>> It's probably possible to hack EMR with a bootstrap script that stops
>>> YARN and starts a master and slaves on each machine (to run Spark in
>>> standalone mode).
>>>
>>> My questions are:
>>> - Does Spark standalone provide a significant performance / memory
>>> improvement compared to YARN mode?
>>> - Is it worth hacking the official EMR Spark on YARN setup to switch
>>> Spark to standalone mode?
>>>
>>>
>>> I already created a comparison table and want you to check whether my
>>> understanding is correct.
>>>
>>> Let's say an r3.2xlarge machine has 52GB of RAM available for Spark
>>> executor JVMs.
>>>
>>>                     standalone vs yarn comparison
>>>
>>>                                                       STDLN  | YARN
>>> can executor allocate up to 52GB ram                - yes    | yes
>>> will executor be unresponsive after using all
>>>   52GB ram because of GC                            - yes    | yes
>>> additional JVMs on slave besides the spark executor - worker | node manager
>>> are these additional JVMs lightweight               - yes    | yes
>>>
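>>> If the table is right, then from the application's point of view the
>>> only thing that changes between the two modes is the master we submit
>>> to. A minimal sketch (Scala; the standalone host and app name are
>>> placeholders):
>>>
>>>   import org.apache.spark.{SparkConf, SparkContext}
>>>
>>>   val conf = new SparkConf().setAppName("mode-test")  // placeholder
>>>   // Spark on YARN (Spark 1.x master syntax):
>>>   conf.setMaster("yarn-client")
>>>   // hypothetical hand-rolled standalone cluster:
>>>   // conf.setMaster("spark://master-host:7077")
>>>   val sc = new SparkContext(conf)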
>>>
>>> Thank you
>>>
>>> Alex
>>>
>>
>>
>
