spark-dev mailing list archives

From "Xia, Junluan" <junluan....@intel.com>
Subject RE: Propose to Re-organize the scripts and configurations
Date Fri, 11 Oct 2013 04:47:58 GMT
Hi Matei,

Shane is on vacation now. I will take charge of this pull request.

-----Original Message-----
From: Matei Zaharia [mailto:matei.zaharia@gmail.com] 
Sent: Thursday, October 10, 2013 1:36 AM
To: dev@spark.incubator.apache.org
Cc: Shane Huang
Subject: Re: Propose to Re-organize the scripts and configurations

Hey Shane, I don't know if you saw my message on GitHub, but I did review this a few days
ago: https://github.com/apache/incubator-spark/pull/21. Make sure you're allowing emails from
GitHub to get comments. It looks good overall but I had some suggestions in there.

Matei

On Sep 26, 2013, at 7:24 PM, Shane Huang <shannie.huang@gmail.com> wrote:

> I have created a pull request to address the basic needs of our 
> customer for separating the admin and user scripts. Link here 
> https://github.com/apache/incubator-spark/pull/21. Please kindly review.
> And we can also discuss if there's more functionality needed.
> 
> 
> On Sun, Sep 22, 2013 at 12:07 PM, Shane Huang <shannie.huang@gmail.com> wrote:
> 
>> And I created a new issue SPARK-915 to track the re-org of scripts as
>> SPARK-544 only talks about Config.
>> https://spark-project.atlassian.net/browse/SPARK-915
>> 
>> 
>> On Wed, Sep 18, 2013 at 1:42 AM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
>> 
>>> Hi Shane,
>>> 
>>> I agree with all these points. Improving the configuration system is 
>>> one of the main things I'd like to have in the next release.
>>> 
>>>> 1) Usually the application developers/users and platform
>>>> administrators belong to two teams. So it's better to separate the
>>>> scripts used by administrators and application users, e.g. put them
>>>> in sbin and bin folders respectively.
>>> 
>>> Yup, right now we don't have any attempt to install on standard 
>>> system paths.
>>> 
>>>> 3) If there are multiple ways to specify an option, an overriding 
>>>> rule should be present and should not be error-prone.
>>> 
>>> Yes, I think this should always be Configuration class in code > 
>>> system properties > env vars. Over time we will deprecate the env 
>>> vars and maybe even system properties.
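
A minimal sketch of that precedence, assuming a simple key-value Configuration; the helper below is purely illustrative, not existing Spark code:

    // Resolve one setting: an explicit Configuration entry wins, then the JVM
    // system property of the same name, then a legacy environment variable.
    def resolve(conf: Map[String, String], key: String, envVar: String): Option[String] =
      conf.get(key)
        .orElse(Option(System.getProperty(key)))
        .orElse(Option(System.getenv(envVar)))

For example, resolve(conf, "spark.executor.memory", "SPARK_MEM") would prefer the value set in code, then the -Dspark.executor.memory system property, and only fall back to the SPARK_MEM environment variable.
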
>>> 
>>>> 4) Currently the options are set and read using system properties.
>>>> It's hard to manage and inconvenient for users. It's good to gather
>>>> the options into one file using a format like XML or JSON.
>>> 
>>> I think this is the main thing to do first -- pick one configuration 
>>> class and change the code to use this.
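
As a rough illustration of what such a JSON file and its loading could look like (the keys and values below are hypothetical examples, not from the pull request), using the JSON parser that ships with Scala 2.9/2.10:

    import scala.util.parsing.json.JSON

    // Hypothetical contents of a spark.json file; the keys are only examples.
    val text = """{ "spark.executor.memory": "2g", "spark.default.parallelism": "8" }"""

    // JSON.parseFull returns Option[Any]; a JSON object parses to a Map.
    val options: Map[String, String] = JSON.parseFull(text) match {
      case Some(m: Map[_, _]) => m.map { case (k, v) => (k.toString, v.toString) }
      case _                  => Map.empty[String, String]
    }
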
>>> 
>>>> Our rough proposal:
>>>> 
>>>>  - Scripts
>>>> 
>>>>  1. make an "sbin" folder containing all the scripts for
>>>>     administrators, specifically,
>>>>     - all service administration scripts, i.e. start-*, stop-*,
>>>>       slaves.sh, *-daemons, *-daemon scripts
>>>>     - low-level or internally used utility scripts, i.e.
>>>>       compute-classpath, spark-config, spark-class, spark-executor
>>>>  2. make a "bin" folder containing all the scripts for application
>>>>     developers/users, specifically,
>>>>     - user level app running scripts, i.e. pyspark, spark-shell, and
>>>>       we propose to add a script "spark" for users to run applications
>>>>       (very much like spark-class but may add some more control or
>>>>       convenient utilities)
>>>>     - scripts for status checking, e.g. spark and hadoop version
>>>>       checking, running applications checking, etc. We can make this
>>>>       a separate script or add functionality to the "spark" script.
>>>>  3. No wandering scripts outside the sbin and bin folders
>>> 
>>> Makes sense.
>>> 
>>>>  - Configurations/Options and overriding rule
>>>> 
>>>>  1. Define a Configuration class which contains all the options
>>>>     available for a Spark application. A Configuration instance can be
>>>>     de-/serialized from/to a JSON-formatted file.
>>>>  2. Each application (SparkContext) has one Configuration instance and
>>>>     it is initialized by the application which creates it (either read
>>>>     from a file or passed from command line options or env
>>>>     SPARK_JAVA_OPTS).
>>>>  3. When launching an Executor on a node, the Configuration is first
>>>>     initialized using the node-local configuration file as the default.
>>>>     The Configuration passed from the application driver context will
>>>>     override any options specified in the default.
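
A minimal sketch of how steps 1-3 might fit together, assuming a Configuration backed by a plain key-value map; the class and method names below are illustrative, not taken from the pull request:

    // Illustrative sketch only: a Configuration is a serializable key-value map
    // that can be merged, with the right-hand side taking precedence.
    class Configuration(val settings: Map[String, String]) extends Serializable {
      def get(key: String): Option[String] = settings.get(key)

      // Entries in `other` override entries already present here.
      def merge(other: Configuration): Configuration =
        new Configuration(settings ++ other.settings)
    }

    // When an Executor starts, it would first load the node-local defaults
    // (e.g. parsed from a JSON file on the worker) and then apply the
    // Configuration shipped from the application's driver, which wins on conflict.
    def executorConfiguration(nodeDefaults: Configuration,
                              fromDriver: Configuration): Configuration =
      nodeDefaults.merge(fromDriver)
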
>>> 
>>> This sounds great to me! The one thing I'll add is that we might 
>>> want to prevent applications from overriding certain settings on 
>>> each node, such as work directories. The best way is probably to
>>> just ignore the app's version of those settings in the Executor.
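
Continuing the illustrative sketch above, one way the Executor could enforce that would be to drop a fixed set of node-level keys from the driver's Configuration before merging; spark.local.dir (the work/scratch directory setting) is used here only as an example:

    // Keys that should always come from the node-local defaults, never the app.
    val nodeOnlyKeys = Set("spark.local.dir")

    def executorConfiguration(nodeDefaults: Configuration,
                              fromDriver: Configuration): Configuration = {
      // Strip protected keys from the app's settings, then let the rest override.
      val filtered = new Configuration(fromDriver.settings -- nodeOnlyKeys)
      nodeDefaults.merge(filtered)
    }
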
>>> 
>>> If you guys would like, feel free to write up this design on 
>>> SPARK-544 and start working on it. I think it looks good.
>>> 
>>> Matei
>> 
>> 
>> 
>> 
>> --
>> *Shane Huang *
>> *Intel Asia-Pacific R&D Ltd.*
>> *Email: shengsheng.huang@intel.com*
>> 
>> 
> 
> 
> --
> *Shane Huang *
> *Intel Asia-Pacific R&D Ltd.*
> *Email: shengsheng.huang@intel.com*

