spark-user mailing list archives

From Chen Jin <karen...@gmail.com>
Subject Re: how to set SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES optimally
Date Mon, 27 Jan 2014 03:27:05 GMT
Hi Archit,

Thanks for the detailed explanation. My cluster has 5 machines, each
with 8 cores and 48g of memory, so I meant to give the totals for the
entire cluster:

(a) gives us 40 workers with one core per worker; (b) gives 5 workers
with eight cores each.
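
As a quick check of that arithmetic, here is a minimal Python sketch. It assumes every machine runs SPARK_WORKER_INSTANCES worker processes and that each worker offers SPARK_WORKER_CORES cores; the function name cluster_totals is made up for illustration and is not part of Spark.

```python
def cluster_totals(machines, worker_instances, worker_cores):
    """Return (total workers, cores per worker, total cores) for the cluster.

    Hypothetical helper: assumes identical machines and that the two
    settings are applied per machine.
    """
    total_workers = machines * worker_instances
    total_cores = total_workers * worker_cores
    return total_workers, worker_cores, total_cores


# 5 machines with 8 cores each
print(cluster_totals(5, 8, 1))  # setting (a): (40, 1, 40)
print(cluster_totals(5, 1, 8))  # setting (b): (5, 8, 40)
```

Both settings expose the same 40 cores in total; they differ only in how many worker JVMs those cores are split across.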

A follow-up question. Since each machine has 48g of memory, consider:

(a)
   SPARK_WORKER_INSTANCES = 8
   SPARK_WORKER_CORES = 1
   SPARK_WORKER_MEMORY = 6g

(b)
   SPARK_WORKER_INSTANCES = 1
   SPARK_WORKER_CORES = 8
   SPARK_WORKER_MEMORY = 48g

Will setting (a) help when consuming a large dataset, given that, as you
said, each machine now runs 8 JVMs?
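
To make the memory side of the comparison concrete, here is a small Python sketch. It assumes SPARK_WORKER_MEMORY applies to each worker instance separately, as the two settings above imply; the helper name memory_per_machine is hypothetical.

```python
def memory_per_machine(worker_instances, worker_cores, worker_memory_gb):
    """Return (total GB handed to workers on one machine, GB per core).

    Hypothetical helper: assumes SPARK_WORKER_MEMORY is a per-worker limit.
    """
    total_gb = worker_instances * worker_memory_gb
    gb_per_core = worker_memory_gb / worker_cores
    return total_gb, gb_per_core


print(memory_per_machine(8, 1, 6))   # setting (a): (48, 6.0)
print(memory_per_machine(1, 8, 48))  # setting (b): (48, 6.0)
```

Under that assumption both settings hand out 48g per machine and 6g per core; what changes is whether the memory is split into eight separate 6g JVMs or kept in one 48g JVM.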

Thanks a lot,

-chen

On Sun, Jan 26, 2014 at 1:53 AM, Archit Thakur
<archit279thakur@gmail.com> wrote:
> Chen, the first one will launch 8 single-threaded JVMs and the second one
> will launch one 8-threaded JVM.
> Performance depends on your data: if your data is small, the second one is
> better because of the time needed to launch 8 JVMs in the first case. Also,
> if you have broadcast anything, it will have to do that for 8 JVMs.
> However, if you have a lot of data to process, the first one is better
> because (i) in this case you can ignore the JVM launch time, and (ii)
> you'll now have 8 times the memory available for processing.
> Assumption made: all machines are equipped with the same memory/computing power.
>
>
> """(a) gives us 40 workers with each core per worker (b) gives 8 workers
> while each worker has eight cores. Any advice on which better would
> lead to better performance?"""
>
> No, (a) gives you 8 workers with one core per worker; (b) gives 1 worker
> with eight cores.
>
> Let me know if you have any doubts.
>
> Thanks and Regards,
> Archit Thakur.
>
>
>
> On Sun, Jan 26, 2014 at 5:58 AM, Chen Jin <karen.cj@gmail.com> wrote:
>>
>> Hi all,
>>
>> From the Spark documentation, we can set the number of workers with
>> SPARK_WORKER_INSTANCES and the max number of cores that a worker can
>> use with SPARK_WORKER_CORES. If I have five 8-core machines, which
>> one would perform better:
>> (a)
>>    SPARK_WORKER_INSTANCES = 8
>>    SPARK_WORKER_CORES = 1
>>
>> and
>> (b)
>>    SPARK_WORKER_INSTANCES = 1
>>    SPARK_WORKER_CORES = 8
>>
>> (a) gives us 40 workers with each core per worker (b) gives 8 workers
>> while each worker has eight cores. Any advice on which better would
>> lead to better performance?
>>
>> Thanks a lot,
>>
>> -chen
>
>
