spark-user mailing list archives

From Michael Segel <msegel_had...@hotmail.com>
Subject Re: HW imbalance
Date Fri, 30 Jan 2015 07:03:15 GMT
@Sandy, 

There are two issues:
the Spark side (executor sizing), and then the cluster under YARN.

If you have a box where each YARN container (executor) needs 3GB, and your machine has 36GB dedicated as a YARN resource, you can run 12 executors on that single node.
If you have a box that has 72GB dedicated to YARN, you can run up to 24 executors in parallel.

This assumes that you’re not running any other jobs.
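The arithmetic above is just integer division of the memory a node offers to YARN by the size of one executor's container. A minimal sketch, assuming the 3GB figure is the whole container (Spark adds an overhead, spark.yarn.executor.memoryOverhead, on top of spark.executor.memory, so the executor heap itself must be somewhat smaller) and that memory is the only resource being scheduled; `executors_per_node` is a hypothetical helper:

```python
# Sketch only: assumes 3GB is the full YARN container per executor
# (heap plus overhead) and that scheduling is based on memory alone.

def executors_per_node(yarn_node_mem_gb: int, container_mem_gb: int) -> int:
    """How many fixed-size executor containers fit in one NodeManager."""
    return yarn_node_mem_gb // container_mem_gb

for node_mem in (36, 72):
    # 36GB -> 12 executors, 72GB -> 24 executors
    print(f"{node_mem}GB for YARN -> {executors_per_node(node_mem, 3)} executors")
```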

The larger issue is whether your Hadoop distribution will easily let you run with multiple configuration profiles. Ambari 1.6 and earlier does not. It's supposed to be fixed in 1.7, but I haven’t evaluated it yet.
Cloudera? YMMV.

If I understood the question raised by the OP, it's more about running a heterogeneous cluster than about Spark.

-Mike

On Jan 26, 2015, at 5:02 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:

> Hi Antony,
> 
> Unfortunately, all executors for any single Spark application must have the same amount of memory. It's possible to configure YARN with different amounts of memory for each host (using yarn.nodemanager.resource.memory-mb), so other apps might be able to take advantage of the extra memory.
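To make the trade-off concrete, here is a toy model of Antony's cluster (10 hosts with 36GB/10 CPUs, 3 hosts with 128GB/10 CPUs) running one application whose executors all share a single size. Everything numeric beyond the host specs is hypothetical: the 9GB/2-core executor container, and the simplification that all host RAM is handed to YARN (in practice the OS and daemons need some). Whether vcores constrain placement depends on the scheduler configuration (for example, the CapacityScheduler's default resource calculator looks only at memory), so the sketch shows both cases. `Node` and `executors_that_fit` are illustrative names, not YARN APIs.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    yarn_mem_gb: int  # assumed yarn.nodemanager.resource.memory-mb, in GB
    vcores: int       # assumed yarn.nodemanager.resource.cpu-vcores

def executors_that_fit(node: Node, exec_mem_gb: int, exec_cores: int,
                       count_vcores: bool) -> int:
    """Number of uniformly sized executors one node can host."""
    by_mem = node.yarn_mem_gb // exec_mem_gb
    if not count_vcores:
        return by_mem  # memory-only scheduling
    return min(by_mem, node.vcores // exec_cores)

# Hypothetical: all RAM given to YARN, one executor size for the whole app.
cluster = [Node("small", 36, 10)] * 10 + [Node("big", 128, 10)] * 3
EXEC_MEM_GB, EXEC_CORES = 9, 2

for count_vcores in (False, True):
    total, idle_gb = 0, 0
    for node in cluster:
        n = executors_that_fit(node, EXEC_MEM_GB, EXEC_CORES, count_vcores)
        total += n
        idle_gb += node.yarn_mem_gb - n * EXEC_MEM_GB
    print(f"count_vcores={count_vcores}: {total} executors, {idle_gb}GB unused")
```

With these made-up numbers, memory-only scheduling puts 14 executors on each 128GB host versus 4 on each 36GB host (82 in total, 6GB unused), so the extra memory is used by running more executors rather than bigger ones. If vcores are enforced as well, the 10 CPUs cap each big host at 5 executors (55 in total) and 249GB sits idle, available only to other applications.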
> 
> -Sandy
> 
> On Mon, Jan 26, 2015 at 8:34 AM, Michael Segel <msegel_hadoop@hotmail.com> wrote:
> If you’re running YARN, then you should be able to mix and match, since YARN is managing the resources available on each node.
> 
> Having said that… it depends on which version of Hadoop/YARN you're running.
> 
> If you’re running Hortonworks and Ambari, then setting up multiple profiles may not be straightforward. (I haven’t seen the latest version of Ambari.)
> 
> So in theory, you'd have one profile for your smaller 36GB machines and another for your 128GB machines.
> Then, as you request resources for your Spark job, YARN should schedule the executors based on the cluster’s available resources.
> (At least in theory. I haven’t tried this, so YMMV.)
> 
> HTH
> 
> -Mike
> 
> On Jan 26, 2015, at 4:25 PM, Antony Mayi <antonymayi@yahoo.com.INVALID> wrote:
> 
>> I should have said I am running in yarn-client mode. All I can see is a way to specify a single generic executor memory setting, which is then used for all containers.
>> 
>> 
>> On Monday, 26 January 2015, 16:48, Charles Feduke <charles.feduke@gmail.com> wrote:
>> 
>> 
>> You should look at using Mesos. This should abstract away the individual hosts into a pool of resources and make the different physical specifications manageable.
>> 
>> I haven't tried configuring Spark Standalone mode to have different specs on different machines, but based on spark-env.sh.template:
>> 
>> # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
>> # - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
>> # - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
>>
>> it looks like you should be able to mix. (It's not clear to me whether SPARK_WORKER_MEMORY is uniform across the cluster or applies only to the machine where the config file resides.)
>> 
>> On Mon Jan 26 2015 at 8:07:51 AM Antony Mayi <antonymayi@yahoo.com.invalid> wrote:
>> Hi,
>> 
>> Is it possible to mix hosts with (significantly) different specs within a cluster (without wasting the extra resources)? For example, I have 10 nodes with 36GB RAM/10 CPUs and am now trying to add 3 hosts with 128GB/10 CPUs. Is there a way for Spark executors to utilize the extra memory (as my understanding is that all Spark executors must have the same memory)?
>> 
>> thanks,
>> Antony.
>> 
>> 
> 
> 

