From "Stavros Kontopoulos (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
Date Sat, 02 Jun 2018 18:21:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499140#comment-16499140 ]

Stavros Kontopoulos edited comment on SPARK-24434 at 6/2/18 6:20 PM:
---------------------------------------------------------------------

I agree with [~felixcheung] and [~liyinan926]. From the design doc for the affinity work it
quickly became obvious that things will only get more complex if we try to map YAML to Spark
conf, and you also lose some expressive power with that mapping.
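To give a sense of what gets lost, here is a standard K8s node affinity stanza (just an
illustrative snippet, not something taken from the design doc); flattening this kind of nested,
list-valued structure into individual spark.kubernetes.* keys gets awkward fast:

{code:yaml}
# Plain Kubernetes pod spec fragment: require nodes labeled disktype=ssd.
# Nested lists of match expressions like this have no natural flat key/value encoding.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd
{code}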

In the past I have used JSON in a production system for passing config options to Spark jobs.
It proved to work well for one good reason: JSON schema validation, which also checked business
properties early enough (for example, is a number within a valid range?). Failing fast like
that was pretty useful.
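As a rough sketch of what I mean (the property name here is made up for the example), a schema
fragment like this rejects an out-of-range value before the job is ever submitted:

{code:json}
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "executorCores": {
      "type": "integer",
      "minimum": 1,
      "maximum": 16
    }
  },
  "required": ["executorCores"]
}
{code}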

The tedious part was working with the JSON libraries (I ended up using Jackson, by the way).

Here, though, I would start with the simplest solution, at least from an architectural
perspective: just point to the YAML spec.

YAML is how config is specified for K8s pods, so I would start with that. Regarding precedence,
semantically the YAML options are just Spark options, and precedence is defined by the order in
which Spark config sees options in general, whether they come from the properties file, from Java
properties, etc. The format shouldn't violate that precedence. YAML and Java properties are not
exactly equivalent though; YAML is more expressive when it comes to complex structures, or at
least it makes your life easier. For example, on Mesos, in order to specify multiple secrets you
need to list them as [comma-separated values|https://spark.apache.org/docs/latest/running-on-mesos.html],
order matters, and commas can't be part of a name.
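Roughly, per the linked docs (treat the exact keys as approximate), the Mesos style looks like
this: parallel comma-separated lists that have to line up by position.

{code}
# Parallel lists matched by position; a secret name itself cannot contain a comma.
spark.mesos.driver.secret.names=db-password,api-token
spark.mesos.driver.secret.envkeys=DB_PASSWORD,API_TOKEN
{code}

In a pod spec the same thing is just a list of structured entries, so the ordering and escaping
issues go away.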

Of course YAML is not Spark-like, but K8s is a sophisticated deployment environment anyway.

So one question that comes up here is whether all these infrastructure configuration properties
belong in Spark at all (another way to view the whole problem). There was a long discussion about
moving the resource managers out of the upstream project, but it was blocked by the changes
required for a common API. From that angle the problem would be a bit easier to solve: the
properties wouldn't need to be semantically the same as Spark config options and could be
resource-manager specific. But since that separation never happened, I guess we stick with YAML
as another way of passing options.

 


> Support user-specified driver and executor pod templates
> --------------------------------------------------------
>
>                 Key: SPARK-24434
>                 URL: https://issues.apache.org/jira/browse/SPARK-24434
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes
>    Affects Versions: 2.4.0
>            Reporter: Yinan Li
>            Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the current approach
> of adding new Spark configuration options has some serious drawbacks: 1) it means more Kubernetes
> specific configuration options to maintain, and 2) it widens the gap between the declarative
> model used by Kubernetes and the configuration model used by Spark. We should start designing
> a solution that allows users to specify pod templates as central places for all customization
> needs for the driver and executor pods.
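
For illustration, the kind of template the description points at could be an ordinary pod spec
that Spark merges its own settings into; the option name used to reference the file below is
hypothetical:

{code:yaml}
# Hypothetical usage: --conf spark.kubernetes.driver.podTemplateFile=driver-template.yaml
# Ordinary pod spec; Spark would overlay its own container image, resources, etc.
apiVersion: v1
kind: Pod
metadata:
  labels:
    team: data-platform
spec:
  nodeSelector:
    disktype: ssd
  tolerations:
  - key: dedicated
    operator: Equal
    value: spark
    effect: NoSchedule
{code}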



