spark-dev mailing list archives

From Dale Richardson <dale...@hotmail.com>
Subject RE: Spark config option 'expression language' feedback request
Date Fri, 13 Mar 2015 22:52:45 GMT



Hi Reynold,

Those are some very good questions.
Re: Known libraries
There are a number of well-known libraries we could use to implement this feature, including MVEL, OGNL, JBoss EL, and even Spring's EL. I looked at using them to prototype this feature in the beginning, but they all ended up bringing in a lot of code to service a pretty small functional requirement. The prime requirements I was trying to meet were:
1. Be able to specify quantities in KB, MB, GB, etc. transparently.
2. Be able to specify some options as fractions of system attributes, e.g. cpuCores * 0.8.
By implementing just this functionality and nothing else, I figured I was constraining things enough that end users got useful functionality, but not enough to shoot themselves in the foot in new and interesting ways. I couldn't see a nice way of limiting the expressiveness of third-party libraries to this extent.
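To make the intended scope concrete, here is a minimal sketch of an evaluator limited to exactly those two forms. This is not the PR's actual code; the object name, attribute table, and regexes are assumptions for illustration only:

```scala
// Hypothetical sketch: accept only "<number> <unit>" and "<attribute> * <factor>".
// Anything else is rejected, which is the point of the restricted design.
object ConfExpr {
  // Byte-unit multipliers (illustrative subset).
  private val units = Map(
    "kb" -> 1024L,
    "mb" -> 1024L * 1024,
    "gb" -> 1024L * 1024 * 1024)

  // System attributes an expression may reference (illustrative subset).
  private val attrs = Map(
    "cpuCores" -> Runtime.getRuntime.availableProcessors.toDouble)

  private val Quantity = """(?i)\s*([0-9.]+)\s*(kb|mb|gb)\s*""".r
  private val Scaled   = """\s*(\w+)\s*\*\s*([0-9.]+)\s*""".r

  // Return Some(value) only if the entire string parses, else None.
  def parse(s: String): Option[Double] = s match {
    case Quantity(n, u) => Some(n.toDouble * units(u.toLowerCase))
    case Scaled(a, f)   => attrs.get(a).map(_ * f.toDouble)
    case _              => None
  }
}
```

Because the grammar is closed over two productions, there is simply no syntax available for an end user to do anything more exotic.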
I'd be happy to re-examine the feasibility of pulling in one of the third-party libraries if you think that approach has more merit, but I do caution that we may be opening a Pandora's box of potential functionality. Those third-party libraries have a lot of (potentially excess) functionality in them.
Re: Code complexity
I wrote the bare minimum code I could come up with to service the above-mentioned functionality, and then refactored it to use a stacked-traits pattern, which increased the code size by about a further 30%. The expression code as it stands is pretty minimal, and has more than 120 unit tests proving its functionality. More than half the code is taken up by utility classes that allow easy reference to byte quantities and time units. The design was deliberately limited to meeting the above requirements and not much more, to reduce the chance of other subtleties raising their heads.
Re: Workarounds
It would be pretty simple to implement fallback functionality to disable expression parsing by:
1. Globally: a configuration option that disables all expression parsing and falls back to plain Java property parsing.
2. Locally: a known prefix that disables expression parsing for that one option.
This should give enough workarounds to keep things running in the unlikely event that something crops up.
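The wiring for both escape hatches is small. A hedged sketch follows; the flag name and prefix are made up for illustration and are not what the PR uses:

```scala
// Hypothetical fallback wiring: a global switch and a per-value literal
// prefix both bypass the expression evaluator entirely.
object ConfFallback {
  val LiteralPrefix = "literal:" // assumed prefix, not the PR's actual one

  // `eval` is whatever expression evaluator is in use; on any failure or
  // opt-out, the raw string is passed through like a plain Java property.
  def resolve(raw: String,
              expressionsEnabled: Boolean,
              eval: String => Option[Double]): String = {
    if (!expressionsEnabled) raw                  // global opt-out
    else if (raw.startsWith(LiteralPrefix))       // local opt-out
      raw.stripPrefix(LiteralPrefix)
    else eval(raw).map(_.toString).getOrElse(raw) // fall back on parse failure
  }
}
```

Note the third branch also degrades gracefully: an unparseable value is handed to the caller unchanged rather than raising an error.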
Re: Error messages
Regarding your comment about nice error messages, I have to agree with you: it would have been nice. In the end I just return an Option[Double] to the calling code if the entire string parses correctly. Given the additional complexity that adding error messages involved, I retrospectively justify this by asking: how much info do you need to debug an expression like 'cpuCores * 0.8'? :)
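For reference, the system attributes listed in the original mail (NumCores, JVMTotalMemoryBytes, JVMMaxMemoryBytes, JVMFreeMemoryBytes) can all be read from standard JVM APIs. A hedged sketch, assuming a HotSpot JVM for the physical-memory case, since java.lang.Runtime does not expose machine memory:

```scala
import java.lang.management.ManagementFactory

// Sketch of JVM-backed values for the attributes named in the proposal.
object SysAttrs {
  // Cores assigned to the JVM (usually == physical machine cores).
  def numCores: Int = Runtime.getRuntime.availableProcessors

  def jvmTotalMemoryBytes: Long = Runtime.getRuntime.totalMemory
  def jvmMaxMemoryBytes: Long   = Runtime.getRuntime.maxMemory
  def jvmFreeMemoryBytes: Long  = jvmMaxMemoryBytes - jvmTotalMemoryBytes

  // Physical machine memory requires the HotSpot-specific
  // com.sun.management OperatingSystemMXBean; not portable to all JVMs.
  def physicalMemoryBytes: Long =
    ManagementFactory.getOperatingSystemMXBean
      .asInstanceOf[com.sun.management.OperatingSystemMXBean]
      .getTotalPhysicalMemorySize
}
```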
Thanks for the feedback.
Regards,
Dale.
> From: rxin@databricks.com
> Date: Fri, 13 Mar 2015 11:26:44 -0700
> Subject: Re: Spark config option 'expression language' feedback request
> To: dale__r@hotmail.com
> CC: dev@spark.apache.org
> 
> This is an interesting idea.
> 
> Are there well known libraries for doing this? Config is the one place
> where it would be great to have something ridiculously simple, so it is
> more or less bug free. I'm concerned about the complexity in this patch and
> subtle bugs that it might introduce to config options that users will have
> no workarounds. Also I believe it is fairly hard for nice error messages to
> propagate when using Scala's parser combinator.
> 
> 
> On Fri, Mar 13, 2015 at 3:07 AM, Dale Richardson <dale__r@hotmail.com>
> wrote:
> 
> >
> > PR#4937 ( https://github.com/apache/spark/pull/4937) is a feature to
> > allow for Spark configuration options (whether on command line, environment
> > variable or a configuration file) to be specified via a simple expression
> > language.
> >
> >
> > Such a feature has the following end-user benefits:
> > - Allows for flexibility in specifying time intervals or byte
> > quantities in appropriate and easy-to-follow units, e.g. 1 week rather
> > than 604800 seconds
> >
> > - Allows for the scaling of a configuration option in relation to system
> > attributes, e.g.
> >
> > SPARK_WORKER_CORES = numCores - 1
> >
> > SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB
> >
> > - Gives the ability to scale multiple configuration options together, e.g.:
> >
> > spark.driver.memory = 0.75 * physicalMemoryBytes
> >
> > spark.driver.maxResultSize = spark.driver.memory * 0.8
> >
> >
> > The following functions are currently supported by this PR:
> > NumCores:             Number of cores assigned to the JVM (usually ==
> > Physical machine cores)
> > PhysicalMemoryBytes:  Memory size of hosting machine
> >
> > JVMTotalMemoryBytes:  Current bytes of memory allocated to the JVM
> >
> > JVMMaxMemoryBytes:    Maximum number of bytes of memory available to the
> > JVM
> >
> > JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes
> >
> >
> > I was wondering if anybody on the mailing list has any further ideas on
> > other functions that could be useful to have when specifying spark
> > configuration options?
> > Regards,
> > Dale.
> >
