spark-issues mailing list archives

From "Prashant Sharma (Jira)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-30985) Propagate SPARK_CONF_DIR files to driver and exec pods.
Date Wed, 01 Jul 2020 11:19:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-30985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Sharma updated SPARK-30985:
------------------------------------
    Description: 
SPARK_CONF_DIR hosts configuration files such as:
 1) spark-defaults.conf - all the Spark properties.
 2) log4j.properties - logger configuration.
 3) spark-env.sh - environment variables to be set up on the driver and executors.
 4) core-site.xml - Hadoop-related configuration.
 5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
 6) metrics.properties - Spark metrics configuration.
 7) Any other user-specific, library-specific, or framework-specific configuration file.

Traditionally, SPARK_CONF_DIR has been the home to all user-specific configuration files.

This feature will let these user-specific configuration files be mounted on the driver and
executor pods' SPARK_CONF_DIR.



Please review the attached design doc for more details.
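The rough mechanism implied above (bundling the files under SPARK_CONF_DIR into something Kubernetes can mount into the driver and executor pods) can be sketched as follows. This is a minimal illustration, not Spark's actual implementation; the function name `build_conf_configmap` and the ConfigMap name `spark-conf-map` are hypothetical.

```python
# Hypothetical sketch: bundle every regular file in a local SPARK_CONF_DIR
# into a Kubernetes ConfigMap manifest. The driver/executor pod spec would
# then mount this ConfigMap at the in-pod SPARK_CONF_DIR (e.g. /opt/spark/conf).
import json
from pathlib import Path

def build_conf_configmap(conf_dir: str, name: str = "spark-conf-map") -> dict:
    """Return a ConfigMap manifest whose data keys are the file names
    under conf_dir and whose values are the file contents."""
    data = {}
    for f in sorted(Path(conf_dir).iterdir()):
        if f.is_file():
            data[f.name] = f.read_text()
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": data,
    }

if __name__ == "__main__":
    print(json.dumps(build_conf_configmap("/opt/spark/conf"), indent=2))
```

Mounting the resulting ConfigMap as a volume at SPARK_CONF_DIR would make every file in the submitter's conf directory visible to the driver and executors without baking them into the image.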

  was:
SPARK_CONF_DIR hosts configuration files such as:
1) spark-defaults.conf - all the Spark properties.
2) log4j.properties - logger configuration.
3) spark-env.sh - environment variables to be set up on the driver and executors.
4) core-site.xml - Hadoop-related configuration.
5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
6) metrics.properties - Spark metrics configuration.
7) Any other user-specific, library-specific, or framework-specific configuration file.

Traditionally, SPARK_CONF_DIR has been the home to all user-specific configuration files, and
the default behaviour in YARN or standalone mode is that users copy these configuration files
to the worker nodes themselves. In other words, they are not auto-copied.

But in the case of Spark on Kubernetes, we use Spark images, and generally these images are
approved or undergo some kind of standardisation. The user cannot simply copy these files into
the SPARK_CONF_DIR of the running executor and driver pods.

So, at the moment we have special casing for providing each configuration, and for any other
user-specific configuration file the process is more complex; e.g. one can start with a custom
Spark image with the configuration files pre-installed.
Examples of special casing are:
1. Hadoop configuration, via spark.kubernetes.hadoop.configMapName
2. spark-env.sh, via spark.kubernetes.driverEnv.[EnvironmentVariableName]
3. log4j.properties, as in https://github.com/apache/spark/pull/26193
... And where such special casing does not exist, users are simply out of luck.
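Concretely, the first two special cases above look roughly like the following spark-defaults.conf fragment; the ConfigMap name and the environment variable value are placeholders.

```
# Reuse a pre-created ConfigMap holding Hadoop config (core-site.xml etc.)
spark.kubernetes.hadoop.configMapName=my-hadoop-conf

# Set a single environment variable on the driver, standing in for spark-env.sh
spark.kubernetes.driverEnv.SPARK_LOG_DIR=/tmp/spark-logs
```

Each file type needs its own dedicated property, which is exactly the per-configuration special casing this issue aims to replace with a single mount of SPARK_CONF_DIR.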

This feature will let these user-specific configuration files be mounted on the driver and
executor pods' SPARK_CONF_DIR.
At the moment it is not clear whether there is a need to let the user specify which config files
to propagate to the driver and/or executors. If that feature turns out to be helpful, we can
increase the scope of this work or create another JIRA issue to track it.


> Propagate SPARK_CONF_DIR files to driver and exec pods.
> -------------------------------------------------------
>
>                 Key: SPARK-30985
>                 URL: https://issues.apache.org/jira/browse/SPARK-30985
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Prashant Sharma
>            Priority: Major
>
> SPARK_CONF_DIR hosts configuration files such as:
>  1) spark-defaults.conf - all the Spark properties.
>  2) log4j.properties - logger configuration.
>  3) spark-env.sh - environment variables to be set up on the driver and executors.
>  4) core-site.xml - Hadoop-related configuration.
>  5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
>  6) metrics.properties - Spark metrics configuration.
>  7) Any other user-specific, library-specific, or framework-specific configuration file.
> Traditionally, SPARK_CONF_DIR has been the home to all user-specific configuration files.
> This feature will let these user-specific configuration files be mounted on the driver and executor pods' SPARK_CONF_DIR.
> Please review the attached design doc for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

