spark-user mailing list archives

From John Omernik <j...@omernik.com>
Subject Re: Configuring Spark Memory
Date Thu, 24 Jul 2014 20:27:06 GMT
So this is good information for standalone, but how is memory distributed
within Mesos?  There's coarse-grained mode, where the executor stays active,
and there's fine-grained mode, where it appears each task is its own process
in Mesos. How do memory allocations work in these cases? Thanks!



On Thu, Jul 24, 2014 at 12:14 PM, Martin Goodson <martin@skimlinks.com>
wrote:

> Great - thanks for the clarification Aaron. The offer stands for me to
> write some documentation and an example that covers this without leaving
> *any* room for ambiguity.
>
>
>
>
> --
> Martin Goodson  |  VP Data Science
> (0)20 3397 1240
>
>
> On Thu, Jul 24, 2014 at 6:09 PM, Aaron Davidson <ilikerps@gmail.com>
> wrote:
>
>> Whoops, I was mistaken in my original post last year. By default, there
>> is one executor per node per Spark Context, as you said.
>> "spark.executor.memory" is the amount of memory that the application
>> requests for each of its executors. SPARK_WORKER_MEMORY is the amount of
>> memory a Spark Worker is willing to allocate to executors.
>>
>> So if you were to set SPARK_WORKER_MEMORY to 8g everywhere on your
>> cluster, and spark.executor.memory to 4g, you would be able to run 2
>> simultaneous Spark Contexts that each get 4g per node. Similarly, if
>> spark.executor.memory were 8g, you could only run 1 Spark Context at a time
>> on the cluster, but it would get all the cluster's memory.
>>
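To make the arithmetic in the quoted explanation concrete, here is a minimal Python sketch. The function name max_concurrent_contexts is hypothetical (it is not part of Spark), and the sketch assumes standalone mode with the default of one executor per node per Spark Context, treating memory as the only constraint and ignoring cores.

```python
def max_concurrent_contexts(worker_memory_gb: int, executor_memory_gb: int) -> int:
    """How many Spark Contexts can hold an executor on one worker at once.

    Assumes standalone mode with one executor per node per Spark Context,
    and that memory is the only limiting resource (cores are ignored).
    """
    if executor_memory_gb <= 0:
        raise ValueError("executor memory must be positive")
    # A worker hands out at most SPARK_WORKER_MEMORY in total, and every
    # executor takes spark.executor.memory out of that budget.
    return worker_memory_gb // executor_memory_gb


print(max_concurrent_contexts(8, 4))  # 2
print(max_concurrent_contexts(8, 8))  # 1
```

With an 8g worker, a 4g executor leaves room for a second application, while an 8g executor uses the whole allocation.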
>>
>> On Thu, Jul 24, 2014 at 7:25 AM, Martin Goodson <martin@skimlinks.com>
>> wrote:
>>
>>> Thank you Nishkam,
>>> I have read your code. So, for the sake of my understanding, it seems
>>> that for each spark context there is one executor per node? Can anyone
>>> confirm this?
>>>
>>>
>>> --
>>> Martin Goodson  |  VP Data Science
>>> (0)20 3397 1240
>>>
>>>
>>> On Thu, Jul 24, 2014 at 6:12 AM, Nishkam Ravi <nravi@cloudera.com>
>>> wrote:
>>>
>>>> See if this helps:
>>>>
>>>> https://github.com/nishkamravi2/SparkAutoConfig/
>>>>
>>>> It's a very simple tool for auto-configuring default parameters in
>>>> Spark. Takes as input high-level parameters (like number of nodes, cores
>>>> per node, memory per node, etc) and spits out default configuration, user
>>>> advice and command line. Compile (javac SparkConfigure.java) and run (java
>>>> SparkConfigure).
>>>>
>>>> Also cc'ing dev in case others are interested in helping evolve this
>>>> over time (by refining the heuristics and adding more parameters).
>>>>
>>>>
>>>>  On Wed, Jul 23, 2014 at 8:31 AM, Martin Goodson <martin@skimlinks.com>
>>>> wrote:
>>>>
>>>>> Thanks Andrew,
>>>>>
>>>>> So if there is only one SparkContext there is only one executor per
>>>>> machine? This seems to contradict Aaron's message from the link above:
>>>>>
>>>>> "If each machine has 16 GB of RAM and 4 cores, for example, you might
>>>>> set spark.executor.memory between 2 and 3 GB, totaling 8-12 GB used by
>>>>> Spark.)"
>>>>>
>>>>> Am I reading this incorrectly?
>>>>>
>>>>> Anyway, our configuration is 21 machines (one master and 20 slaves),
>>>>> each with 60 GB. We would like to use 4 cores per machine. This is PySpark,
>>>>> so we want to leave, say, 16 GB on each machine for Python processes.
>>>>>
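As a rough illustration of what that hardware description implies, the Python sketch below derives upper bounds for the standalone settings discussed in this thread. ClusterSpec and standalone_budget are hypothetical names, not Spark APIs. It assumes a single application uses the whole cluster with one executor per worker, and it ignores operating-system and daemon overhead, so treat the numbers as ceilings rather than recommendations.

```python
from dataclasses import dataclass


@dataclass
class ClusterSpec:
    workers: int             # machines running a Spark Worker
    ram_gb_per_machine: int  # physical RAM on each machine
    cores_per_machine: int   # cores to give Spark on each machine
    reserved_gb: int         # RAM held back per machine (e.g. Python processes)


def standalone_budget(spec: ClusterSpec) -> dict:
    """Derive upper bounds for one application's memory and core settings."""
    usable_gb = spec.ram_gb_per_machine - spec.reserved_gb
    if usable_gb <= 0:
        raise ValueError("reserved memory leaves nothing for executors")
    return {
        # Environment variables read by each worker (conf/spark-env.sh).
        "SPARK_WORKER_MEMORY": f"{usable_gb}g",
        "SPARK_WORKER_CORES": str(spec.cores_per_machine),
        # Application properties; the executor cannot exceed the worker budget.
        "spark.executor.memory": f"{usable_gb}g",
        "spark.cores.max": str(spec.workers * spec.cores_per_machine),
    }


budget = standalone_budget(
    ClusterSpec(workers=20, ram_gb_per_machine=60, cores_per_machine=4, reserved_gb=16)
)
for name, value in budget.items():
    print(f"{name} = {value}")
```

For the 20 workers described here this gives 44g per worker and 80 cores in total. Setting spark.executor.memory lower than the worker budget would leave room for additional applications on the same cluster.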
>>>>> Thanks again for the advice!
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Martin Goodson  |  VP Data Science
>>>>> (0)20 3397 1240
>>>>>
>>>>>
>>>>> On Wed, Jul 23, 2014 at 4:19 PM, Andrew Ash <andrew@andrewash.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Martin,
>>>>>>
>>>>>> In standalone mode, each SparkContext you initialize gets its own set
>>>>>> of executors across the cluster.  So for example if you have two shells
>>>>>> open, they'll each get two JVMs on each worker machine in the cluster.
>>>>>>
>>>>>> As far as the other docs, you can configure the total number of cores
>>>>>> requested for the SparkContext, the amount of memory for the executor JVM
>>>>>> on each machine, the amount of memory for the Master/Worker daemons (little
>>>>>> needed since work is done in executors), and several other settings.
>>>>>>
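The knobs listed in the quoted paragraph map onto named settings: spark.cores.max for the total cores an application requests, spark.executor.memory for the executor JVM, and the SPARK_DAEMON_MEMORY environment variable (set in conf/spark-env.sh, not in the properties file) for the Master/Worker daemons. A minimal Python sketch of writing the two application properties in the "key value" layout used by conf/spark-defaults.conf follows. The helper name render_spark_defaults and the example values are purely illustrative.

```python
def render_spark_defaults(settings: dict) -> str:
    """Format settings as spark-defaults.conf lines: key, whitespace, value."""
    if not settings:
        return ""
    # Pad keys to a common width so the values line up in one column.
    width = max(len(key) for key in settings)
    return "\n".join(
        f"{key.ljust(width)}  {value}" for key, value in settings.items()
    )


# Illustrative values only; pick numbers that fit your own cluster.
example = {
    "spark.cores.max": "8",
    "spark.executor.memory": "4g",
}
print(render_spark_defaults(example))
```

The same two properties can also be passed on the command line via spark-submit's --total-executor-cores and --executor-memory options.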
>>>>>> Which of those are you interested in?  What spec hardware do you have
>>>>>> and how do you want to configure it?
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 23, 2014 at 6:10 AM, Martin Goodson <martin@skimlinks.com>
>>>>>> wrote:
>>>>>>
>>>>>>> We are having difficulties configuring Spark, partly because we
>>>>>>> still don't understand some key concepts. For instance, how many executors
>>>>>>> are there per machine in standalone mode? This is after having
>>>>>>> closely read the documentation several times:
>>>>>>>
>>>>>>> http://spark.apache.org/docs/latest/configuration.html
>>>>>>> http://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>> http://spark.apache.org/docs/latest/tuning.html
>>>>>>> http://spark.apache.org/docs/latest/cluster-overview.html
>>>>>>>
>>>>>>> The cluster overview has some information here about executors but
>>>>>>> is ambiguous about whether there are single executors or multiple executors
>>>>>>> on each machine.
>>>>>>>
>>>>>>> This message from Aaron Davidson implies that the executor memory
>>>>>>> should be set to total available memory on the machine divided by the
>>>>>>> number of cores:
>>>>>>> http://mail-archives.apache.org/mod_mbox/spark-user/201312.mbox/%3CCANGvG8o5K1SxgnFMT_9DK=vJ_pLBVe6zH_DN5sjwPznPbcpATA@mail.gmail.com%3E
>>>>>>>
>>>>>>> But other messages imply that the executor memory should be set to
>>>>>>> the *total* available memory of each machine.
>>>>>>>
>>>>>>> We would very much appreciate some clarity on this and the myriad of
>>>>>>> other memory settings available (daemon memory, worker memory etc). Perhaps
>>>>>>> a worked example could be added to the docs? I would be happy to provide
>>>>>>> some text as soon as someone can enlighten me on the technicalities!
>>>>>>>
>>>>>>> Thank you
>>>>>>>
>>>>>>> --
>>>>>>> Martin Goodson  |  VP Data Science
>>>>>>> (0)20 3397 1240
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
