spark-dev mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: Propose to Re-organize the scripts and configurations
Date Wed, 09 Oct 2013 17:36:05 GMT
Hey Shane, I don't know if you saw my message on GitHub, but I did review this a few days ago:
https://github.com/apache/incubator-spark/pull/21. Make sure you're allowing emails from GitHub
so that you receive the comments. It looks good overall, but I had some suggestions in there.

Matei

On Sep 26, 2013, at 7:24 PM, Shane Huang <shannie.huang@gmail.com> wrote:

> I have created a pull request to address our customers' basic need to
> separate the admin and user scripts. Link here:
> https://github.com/apache/incubator-spark/pull/21. Please kindly review,
> and we can also discuss whether more functionality is needed.
> 
> 
> On Sun, Sep 22, 2013 at 12:07 PM, Shane Huang <shannie.huang@gmail.com> wrote:
> 
>> And I created a new issue, SPARK-915, to track the re-org of scripts, as
>> SPARK-544 only talks about configuration.
>> https://spark-project.atlassian.net/browse/SPARK-915
>> 
>> 
>> On Wed, Sep 18, 2013 at 1:42 AM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
>> 
>>> Hi Shane,
>>> 
>>> I agree with all these points. Improving the configuration system is one
>>> of the main things I'd like to have in the next release.
>>> 
>>>> 1) Usually the application developers/users and the platform
>>>> administrators belong to two different teams, so it's better to separate
>>>> the scripts used by administrators from those used by application users,
>>>> e.g. by putting them in sbin and bin folders respectively.
>>> 
>>> Yup, right now we make no attempt to install onto standard system
>>> paths.
>>> 
>>>> 3) If there are multiple ways to specify an option, an overriding rule
>>>> should be present and should not be error-prone.
>>> 
>>> Yes, I think this should always be Configuration class in code > system
>>> properties > env vars. Over time we will deprecate the env vars and maybe
>>> even system properties.
>>> 
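As a minimal sketch of that precedence, assuming a hypothetical
Configuration class backed by a key/value map (none of these names exist
in Spark today):

    // Resolve an option: value set in code > JVM system property >
    // environment variable > built-in default.
    class Configuration(explicit: Map[String, String] = Map.empty) {
      def get(key: String, default: String): String =
        explicit.get(key)
          .orElse(sys.props.get(key))
          .orElse(sys.env.get(key.toUpperCase.replace('.', '_')))
          .getOrElse(default)
    }

For example, get("spark.local.dir", "/tmp") would fall back to the system
property spark.local.dir, then the env var SPARK_LOCAL_DIR, then "/tmp".
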
>>>> 4) Currently the options are set and read using system properties.
>>>> That's hard to manage and inconvenient for users. It would be better to
>>>> gather the options into one file using a format like XML or JSON.
>>> 
>>> I think this is the main thing to do first -- pick one configuration
>>> class and change the code to use this.
>>> 
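As a sketch of what reading such a file could look like (the Configuration
object and the flat JSON layout here are assumptions, not an existing
Spark API):

    import scala.util.parsing.json.JSON

    object Configuration {
      // Read a flat {"key": "value"} JSON file into an option map.
      def fromJsonFile(path: String): Map[String, String] = {
        val text = scala.io.Source.fromFile(path).mkString
        JSON.parseFull(text) match {
          case Some(m: Map[String, Any] @unchecked) =>
            m.map { case (k, v) => k -> v.toString }
          case _ => Map.empty
        }
      }
    }
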
>>>> Our rough proposal:
>>>> 
>>>>  - Scripts
>>>> 
>>>>  1. Make an "sbin" folder containing all the scripts for
>>>>  administrators, specifically:
>>>>     - all service administration scripts, i.e. start-*, stop-*,
>>>>     slaves.sh, *-daemons, *-daemon scripts
>>>>     - low-level or internally used utility scripts, i.e.
>>>>     compute-classpath, spark-config, spark-class, spark-executor
>>>>  2. Make a "bin" folder containing all the scripts for application
>>>>  developers/users, specifically:
>>>>     - user-level app-running scripts, i.e. pyspark, spark-shell, and
>>>>     we propose to add a script "spark" for users to run applications
>>>>     (very much like spark-class, but possibly adding more control or
>>>>     convenience utilities)
>>>>     - scripts for status checking, e.g. spark and hadoop version
>>>>     checking, checking on running applications, etc. We can make this
>>>>     a separate script or add the functionality to the "spark" script.
>>>>  3. No wandering scripts outside the sbin and bin folders.
>>> 
>>> Makes sense.
>>> 
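For illustration, the layout that list proposes would look roughly like
this, using today's script names (the exact contents of each folder are an
assumption inferred from the list above):

    sbin/  start-all.sh  stop-all.sh  start-master.sh  stop-master.sh
           start-slaves.sh  stop-slaves.sh  slaves.sh  spark-daemon.sh
           spark-daemons.sh  compute-classpath.sh  spark-config.sh
           spark-class  spark-executor
    bin/   pyspark  spark-shell  spark  (the proposed launcher)
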
>>>>  - Configurations/Options and overriding rule
>>>> 
>>>>  1. Define a Configuration class which contains all the options
>>>>  available for a Spark application. A Configuration instance can be
>>>>  de-/serialized from/to a JSON-formatted file.
>>>>  2. Each application (SparkContext) has one Configuration instance,
>>>>  and it is initialized by the application which creates it (either
>>>>  read from a file, or passed via command-line options or the
>>>>  SPARK_JAVA_OPTS environment variable).
>>>>  3. When launching an Executor on a node, the Configuration is first
>>>>  initialized using the node-local configuration file as the default.
>>>>  The Configuration passed from the application driver context then
>>>>  overrides any options specified in the default.
>>> 
>>> This sounds great to me! The one thing I'll add is that we might want
>>> to prevent applications from overriding certain settings on each node,
>>> such as work directories. The best way is probably to just ignore the
>>> app's version of those settings in the Executor.
>>> 
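A minimal sketch of that executor-side merge, assuming the Configuration
is backed by a plain key/value map as above (the function shape and the
protected-key set are illustrative, not existing Spark code):

    // Node-local defaults are overridden by the driver's Configuration,
    // except for node-protected settings, which keep their local values.
    val protectedKeys = Set("spark.local.dir")  // e.g. work directories

    def executorConf(nodeDefaults: Map[String, String],
                     fromDriver: Map[String, String]): Map[String, String] =
      nodeDefaults ++ fromDriver.filter { case (k, _) => !protectedKeys.contains(k) }
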
>>> If you guys would like, feel free to write up this design on SPARK-544
>>> and start working on it. I think it looks good.
>>> 
>>> Matei
>> 
>> 
>> 
>> 
>> --
>> Shane Huang
>> Intel Asia-Pacific R&D Ltd.
>> Email: shengsheng.huang@intel.com
>> 
>> 
> 
> 
> --
> Shane Huang
> Intel Asia-Pacific R&D Ltd.
> Email: shengsheng.huang@intel.com

