spark-user mailing list archives

From Victor Tso-Guillen <v...@paxata.com>
Subject Re: heterogeneous cluster setup
Date Thu, 04 Dec 2014 06:48:44 GMT
You'll have to decide which resource is more expensive in your
heterogeneous environment and optimize for its utilization. For example,
you may decide that memory is the only cost factor and you can discount
the number of cores. Then you could give each worker 8GB and four cores.
Note that cores in Spark don't necessarily map to physical cores on the
machine; it's just a configuration setting for how many simultaneous tasks
that worker can work on.
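
In standalone mode, that maps onto the worker settings in
conf/spark-env.sh on each machine (a sketch; the values are illustrative):

    SPARK_WORKER_MEMORY=8g   # memory this worker offers to executors
    SPARK_WORKER_CORES=4     # task slots; need not match physical cores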

You are right that each executor gets the same amount of resources, and I
would add the same level of parallelism. Your heterogeneity is in the
physical layout of your cluster, not in how Spark treats the workers as
resources. It's very important for Spark's workers to have the same
resources available, because Spark needs to be able to divide and conquer
your data generically across all of those workers.

Hope that helps,
Victor

On Wed, Dec 3, 2014 at 10:04 PM, rapelly kartheek <kartheek.mbms@gmail.com>
wrote:

> Thank you so much for your valuable reply, Victor. That's a very clear
> solution, and I understood it.
>
> Right now I have nodes with:
> 16GB RAM, 4 cores; 8GB RAM, 4 cores; and 8GB RAM, 2 cores. From my
> understanding, the division could be that each executor gets 2 cores and
> 6GB RAM. So the nodes with 16GB RAM and 4 cores can host two executors.
> Please let me know if my understanding is correct. A sketch of that
> division in conf/spark-env.sh follows.
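>
> Assuming the standalone-mode worker variables (values illustrative, not
> tested):
>
>     # 16GB / 4-core nodes: two workers of 6GB and 2 cores each
>     SPARK_WORKER_INSTANCES=2
>     SPARK_WORKER_MEMORY=6g
>     SPARK_WORKER_CORES=2
>     # 8GB nodes: one such worker (SPARK_WORKER_INSTANCES=1)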
>
> But I am not able to see any heterogeneity in this setting, since each
> executor ends up with the same amount of resources. Can you please
> clarify this doubt?
>
> Regards
> Karthik
>
> On Wed, Dec 3, 2014 at 11:11 PM, Victor Tso-Guillen <vtso@paxata.com>
> wrote:
>
>> I don't have a great answer for you. For us, we found a common divisor
>> (not necessarily a whole gigabyte) of the available memory across the
>> different hardware and used it as the amount of memory per worker, then
>> scaled the number of cores so that every core in the system has the
>> same amount of memory. The quotient of the available memory and the
>> common divisor, hopefully a whole number to reduce waste, was the
>> number of workers we spun up. For example, if your machines have 64G,
>> 30G, and 15G of available memory, the divisor could be 15G and you'd
>> have 4, 2, and 1 workers per machine, respectively. Every worker on all
>> the machines would have the same number of cores, set to what you think
>> is a good value.
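>>
>> As a sketch with the standalone-mode spark-env.sh variables (the core
>> count here is just a placeholder):
>>
>>     SPARK_WORKER_MEMORY=15g   # the common divisor, on every machine
>>     SPARK_WORKER_CORES=4      # same per-worker core count everywhere
>>     # per machine: SPARK_WORKER_INSTANCES=4 (64G), 2 (30G), 1 (15G)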
>>
>> Hope that helps.
>>
>> On Wed, Dec 3, 2014 at 7:44 AM, <kartheek.mbms@gmail.com> wrote:
>>
>>> Hi Victor,
>>>
>>> I want to set up a heterogeneous stand-alone Spark cluster. I have
>>> hardware with different memory sizes and a varied number of cores per
>>> node. I could get all the nodes active in the cluster only when the
>>> memory per executor was set to the smallest memory available on any
>>> node, and likewise for the number of cores per executor. As of now, I
>>> configure one executor per node.
>>>
>>> Can you please suggest a way to set up a stand-alone heterogeneous
>>> cluster so that I can use the available hardware efficiently?
>>>
>>> Thank you
>>>
>>> _____________________________________
>>> Sent from http://apache-spark-user-list.1001560.n3.nabble.com
>>>
>>>
>>
>
