spark-dev mailing list archives

From Dale Richardson <dale...@hotmail.com>
Subject RE: Spark config option 'expression language' feedback request
Date Sat, 14 Mar 2015 02:57:33 GMT
Mridul,

I may have added some confusion by giving examples from completely different
areas. For example, the number of cores available for tasking on each worker
machine is a resource-controller-level configuration variable. In standalone
mode (i.e. using Spark's home-grown resource manager), the configuration
variable SPARK_WORKER_CORES is an item that Spark admins can set (and one we
can use expressions for). The equivalent variable for YARN
(yarn.nodemanager.resource.cpu-vcores) is only used by YARN's node manager
setup; it is set by YARN administrators and is outside the control of Spark
(and of most users). If you are not a cluster administrator, both variables
are irrelevant to you. The same goes for SPARK_WORKER_MEMORY.
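To make the idea concrete, here is a minimal sketch of how an expression such
as `numCores - 1` might be resolved against machine attributes. This is
purely illustrative Python (the PR itself is Scala), the `ATTRIBUTES` table
and `evaluate` helper are hypothetical, and the `physicalMemoryBytes` value
is a stand-in rather than a real OS query:

```python
import os
import re

# Hypothetical system attributes the expression language could expose.
# Names mirror the functions listed in the PR; values here are illustrative.
ATTRIBUTES = {
    "numcores": os.cpu_count() or 1,     # cores visible to this process
    "physicalmemorybytes": 8 * 1024**3,  # stand-in; real code would query the OS
}

def evaluate(expr: str) -> float:
    """Substitute known attribute names, then evaluate simple arithmetic."""
    substituted = re.sub(
        r"[A-Za-z][A-Za-z0-9]*",
        lambda m: str(ATTRIBUTES[m.group(0).lower()]),
        expr,
    )
    # Restrict eval to plain arithmetic: no names, no builtins remain.
    return eval(substituted, {"__builtins__": {}}, {})

# e.g. SPARK_WORKER_CORES = numCores - 1
worker_cores = evaluate("numCores - 1")
```

The point of the substitution-then-arithmetic split is that the set of
resolvable names stays an explicit whitelist, which is what makes a variable
like SPARK_WORKER_CORES safe to open up to expressions.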

As for spark.executor.memory: since there is no way to know the attributes of
a machine before a task is allocated to it, we cannot use any of the JVM-info
functions there. For options like that, the expression parser can easily be
limited to supporting only the different byte units of scale (kb/mb/gb etc.)
and references to other configuration variables.
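A restricted evaluator along those lines might look like the following
sketch. Again this is hypothetical Python, not the PR's actual grammar: only
byte-unit literals and references to other configuration variables are
accepted, so no machine attribute is ever consulted:

```python
import re

# Byte units of scale the restricted grammar would accept.
UNITS = {"kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

def resolve(expr: str, config: dict) -> int:
    """Resolve byte-unit literals ('2gb') and config-variable references
    to a byte count; machine attributes are deliberately not supported."""
    def substitute(token: str) -> str:
        m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(kb|mb|gb|tb)", token, re.IGNORECASE)
        if m:
            return str(float(m.group(1)) * UNITS[m.group(2).lower()])
        if token in config:
            # A reference to another option: resolve it recursively.
            return str(resolve(config[token], config))
        return token  # bare number or operator; handled by eval below
    tokens = [substitute(t) for t in re.split(r"\s+", expr.strip())]
    return int(eval(" ".join(tokens), {"__builtins__": {}}, {}))

config = {
    "spark.driver.memory": "2gb",
    "spark.driver.maxResultSize": "spark.driver.memory * 0.8",
}
result_size = resolve(config["spark.driver.maxResultSize"], config)
```

Because the substitution table is just the config map plus the unit suffixes,
an option like spark.executor.memory can still be scaled off other options
without ever needing the unknowable attributes of the target machine.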
Regards,
Dale.




> Date: Fri, 13 Mar 2015 17:30:51 -0700
> Subject: Re: Spark config option 'expression language' feedback request
> From: mridul@gmail.com
> To: dale__r@hotmail.com
> CC: dev@spark.apache.org
> 
> Let me try to rephrase my query.
> How can a user specify, for example, what the executor memory or the
> number of cores should be?
> 
> I don't want a situation where some variables can be specified using
> one set of idioms (from this PR, for example) and another set cannot
> be.
> 
> 
> Regards,
> Mridul
> 
> 
> 
> 
> On Fri, Mar 13, 2015 at 4:06 PM, Dale Richardson <dale__r@hotmail.com> wrote:
> >
> >
> >
> > Thanks for your questions, Mridul.
> > I assume you are referring to how the functionality to query system state
> > works in YARN and Mesos?
> > The APIs used are the standard JVM APIs, so the functionality will work
> > without change. There is no real use case for using 'physicalMemoryBytes'
> > in these cases though, as the JVM size has already been limited by the
> > resource manager.
> > Regards,
> > Dale.
> >> Date: Fri, 13 Mar 2015 08:20:33 -0700
> >> Subject: Re: Spark config option 'expression language' feedback request
> >> From: mridul@gmail.com
> >> To: dale__r@hotmail.com
> >> CC: dev@spark.apache.org
> >>
> >> I am curious how you are going to support these over Mesos and YARN.
> >> Any configuration change like this should be applicable to all of them,
> >> not just local and standalone modes.
> >>
> >> Regards
> >> Mridul
> >>
> >> On Friday, March 13, 2015, Dale Richardson <dale__r@hotmail.com> wrote:
> >>
> >> > PR#4937 ( https://github.com/apache/spark/pull/4937) is a feature to
> >> > allow for Spark configuration options (whether on command line, environment
> >> > variable or a configuration file) to be specified via a simple expression
> >> > language.
> >> >
> >> >
> >> > Such a feature has the following end-user benefits:
> >> > - Allows for flexibility in specifying time intervals or byte
> >> > quantities in appropriate, easy-to-follow units, e.g. 1 week rather
> >> > than 604800 seconds
> >> >
> >> > - Allows for the scaling of a configuration option in relation to a
> >> > system attribute, e.g.:
> >> >
> >> > SPARK_WORKER_CORES = numCores - 1
> >> >
> >> > SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB
> >> >
> >> > - Gives the ability to scale multiple configuration options together, e.g.:
> >> >
> >> > spark.driver.memory = 0.75 * physicalMemoryBytes
> >> >
> >> > spark.driver.maxResultSize = spark.driver.memory * 0.8
> >> >
> >> >
> >> > The following functions are currently supported by this PR:
> >> > NumCores:             Number of cores assigned to the JVM (usually ==
> >> > physical machine cores)
> >> > PhysicalMemoryBytes:  Memory size of the hosting machine
> >> > JVMTotalMemoryBytes:  Current bytes of memory allocated to the JVM
> >> > JVMMaxMemoryBytes:    Maximum bytes of memory available to the JVM
> >> > JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes
> >> >
> >> > I was wondering if anybody on the mailing list has any further ideas
> >> > on other functions that could be useful to have when specifying Spark
> >> > configuration options?
> >> > Regards,
> >> > Dale.
> >> >
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
> 