spark-user mailing list archives

From Martin Goodson <mar...@skimlinks.com>
Subject Re: Configuring Spark Memory
Date Thu, 24 Jul 2014 14:25:24 GMT
Thank you Nishkam,
I have read your code. So, for the sake of my understanding, it seems that
for each SparkContext there is one executor per node? Can anyone confirm
this?


-- 
Martin Goodson  |  VP Data Science
(0)20 3397 1240


On Thu, Jul 24, 2014 at 6:12 AM, Nishkam Ravi <nravi@cloudera.com> wrote:

> See if this helps:
>
> https://github.com/nishkamravi2/SparkAutoConfig/
>
> It's a very simple tool for auto-configuring default parameters in Spark.
> Takes as input high-level parameters (like number of nodes, cores per node,
> memory per node, etc) and spits out default configuration, user advice and
> command line. Compile (javac SparkConfigure.java) and run (java
> SparkConfigure).
>
> Also cc'ing dev in case others are interested in helping evolve this over
> time (by refining the heuristics and adding more parameters).
>
>
>  On Wed, Jul 23, 2014 at 8:31 AM, Martin Goodson <martin@skimlinks.com>
> wrote:
>
>> Thanks Andrew,
>>
>> So if there is only one SparkContext there is only one executor per
>> machine? This seems to contradict Aaron's message from the link above:
>>
>> "If each machine has 16 GB of RAM and 4 cores, for example, you might set
>> spark.executor.memory between 2 and 3 GB, totaling 8-12 GB used by Spark."
>>
>> Am I reading this incorrectly?
>>
>> Anyway, our configuration is 21 machines (one master and 20 slaves), each
>> with 60 GB of RAM. We would like to use 4 cores per machine. This is PySpark,
>> so we want to leave, say, 16 GB on each machine for Python processes.
>>
>> Thanks again for the advice!
>>
>>
>>
>> --
>> Martin Goodson  |  VP Data Science
>> (0)20 3397 1240
>>
>>
>> On Wed, Jul 23, 2014 at 4:19 PM, Andrew Ash <andrew@andrewash.com> wrote:
>>
>>> Hi Martin,
>>>
>>> In standalone mode, each SparkContext you initialize gets its own set of
>>> executors across the cluster.  So for example, if you have two shells open,
>>> each gets its own executor JVM on every worker machine, for two JVMs per
>>> machine in total.
>>>
>>> As far as the other docs, you can configure the total number of cores
>>> requested for the SparkContext, the amount of memory for the executor JVM
>>> on each machine, the amount of memory for the Master/Worker daemons (little
>>> needed since work is done in executors), and several other settings.
>>>
>>> Which of those are you interested in?  What spec hardware do you have
>>> and how do you want to configure it?
>>>
>>> Andrew
>>>
>>>
>>> On Wed, Jul 23, 2014 at 6:10 AM, Martin Goodson <martin@skimlinks.com>
>>> wrote:
>>>
>>>> We are having difficulties configuring Spark, partly because we still
>>>> don't understand some key concepts. For instance, how many executors are
>>>> there per machine in standalone mode? This is after having closely
>>>> read the documentation several times:
>>>>
>>>> http://spark.apache.org/docs/latest/configuration.html
>>>> http://spark.apache.org/docs/latest/spark-standalone.html
>>>> http://spark.apache.org/docs/latest/tuning.html
>>>> http://spark.apache.org/docs/latest/cluster-overview.html
>>>>
>>>> The cluster overview has some information here about executors but is
>>>> ambiguous about whether there are single executors or multiple executors
>>>> on each machine.
>>>>
>>>> This message from Aaron Davidson implies that the executor memory
>>>> should be set to total available memory on the machine divided by the
>>>> number of cores:
>>>> http://mail-archives.apache.org/mod_mbox/spark-user/201312.mbox/%3CCANGvG8o5K1SxgnFMT_9DK=vJ_pLBVe6zH_DN5sjwPznPbcpATA@mail.gmail.com%3E
>>>>
>>>> But other messages imply that the executor memory should be set to the
>>>> *total* available memory of each machine.
>>>>
>>>> We would very much appreciate some clarity on this and the myriad of
>>>> other memory settings available (daemon memory, worker memory etc). Perhaps
>>>> a worked example could be added to the docs? I would be happy to provide
>>>> some text as soon as someone can enlighten me on the technicalities!
>>>>
>>>> Thank you
>>>>
>>>> --
>>>> Martin Goodson  |  VP Data Science
>>>> (0)20 3397 1240
>>>>
>>>
>>>
>>
>
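
The sizing question raised in this thread (20 worker machines with 60 GB
each, 4 cores per machine, 16 GB kept free for Python processes) can be
worked through as a rough sketch. It assumes, following Andrew's reply, that
standalone mode runs one executor JVM per worker machine for each
SparkContext. The function name size_executors and the 4 GB allowance for the
OS and Spark daemons are hypothetical choices made for illustration, not
recommendations from the thread. spark.executor.memory and spark.cores.max
are the Spark properties for per-executor memory and for the total cores
requested by an application.

```python
# Rough sizing sketch for the cluster described in this thread.
# Assumption: one executor JVM per worker machine per SparkContext
# (standalone mode, as described in Andrew's reply).


def size_executors(workers, ram_gb, cores_per_machine, reserved_gb,
                   os_headroom_gb=4):
    """Return suggested Spark properties as a dict of strings.

    reserved_gb is memory kept outside the JVM (here: Python processes).
    os_headroom_gb is a hypothetical allowance for the OS and Spark daemons.
    """
    executor_gb = ram_gb - reserved_gb - os_headroom_gb
    if executor_gb <= 0:
        raise ValueError("no memory left for the executor JVM")
    return {
        # Memory for the single executor JVM on each worker machine.
        "spark.executor.memory": f"{executor_gb}g",
        # Total cores requested across the whole cluster.
        "spark.cores.max": str(workers * cores_per_machine),
    }


conf = size_executors(workers=20, ram_gb=60, cores_per_machine=4,
                      reserved_gb=16)
for key, value in sorted(conf.items()):
    print(key, value)
```

Under these assumptions the executor memory is the machine's RAM minus
everything reserved outside the JVM, rather than RAM divided by the number of
cores. The division in Aaron's quoted example would apply only if several
executors shared one machine.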
