spark-dev mailing list archives

From Matt Cheah <>
Subject Re: Kubernetes: why use init containers?
Date Wed, 10 Jan 2018 21:33:37 GMT
If we use spark-submit in client mode from the driver container, how do we handle needing to
switch between a cluster-mode scheduler backend and a client-mode scheduler backend in the

Something else re: client mode accessibility – if we make client mode accessible to users
even if it’s behind a flag, that’s a very different contract from needing to recompile
spark-submit to support client mode. The amount of effort required from the user to get to
client mode is very different between the two cases, and the contract is much clearer when
client mode is forbidden in all circumstances, versus client mode being allowed with a specific
flag. If we’re saying that we don’t support client mode, we should bias towards making
client mode as difficult as possible to access, i.e., impossible with a standard Spark distribution.

-Matt Cheah

On 1/10/18, 1:24 PM, "Marcelo Vanzin" <> wrote:

    On Wed, Jan 10, 2018 at 1:10 PM, Matt Cheah <> wrote:
    > I’d imagine this is a reason why YARN hasn’t gone with using spark-submit from the application master...
    I wouldn't use YARN as a template to follow when writing a new
    backend. A lot of the reason why the YARN backend works the way it
    does is because of backwards compatibility. IMO it would be much
    better to change the YARN backend to use spark-submit, because it
    would immensely simplify the code there. It was a nightmare to get
    YARN to reach feature parity with other backends because it has to
    pretty much reimplement everything.
    But doing that would break pretty much every Spark-on-YARN deployment,
    so it's not something we can do right now.
    For the other backends the situation is sort of similar; it probably
    wouldn't be hard to change standalone's DriverWrapper to also use
    spark-submit. But that brings potential side effects for existing
    users that don't exist with spark-on-k8s, because spark-on-k8s is new
    (the current fork aside).
    > But using init-containers makes it such that we don’t need to use spark-submit at all
    Those are actually separate concerns. There are a whole bunch of
    things that spark-submit provides you that you'd have to replicate in
    the k8s backend if not using it. Things like properly handling special
    characters in arguments, native library paths, "userClassPathFirst",
    etc. You get them almost for free with spark-submit, and using an init
    container does not solve any of those for you.
    I'd say that using spark-submit is really not up for discussion here;
    it saves you from re-implementing a whole bunch of code that you
    shouldn't even be trying to re-implement.
    Separately, if there is a legitimate need for an init container, then
    it can be added. But I don't see that legitimate need right now, so I
    don't see what it's bringing other than complexity.
    (And no, "the k8s documentation mentions that init containers are
    sometimes used to download dependencies" is not a legitimate need.)
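To make the "properly handling special characters in arguments" point concrete, here is a minimal, generic Python sketch of what goes wrong when arguments are flattened into a single command line without quoting. This is not Spark or spark-submit code; the argument values are hypothetical, chosen only for illustration, and `shlex.join` assumes Python 3.8 or later.

```python
import shlex

# Hypothetical user arguments containing spaces, quotes, and a "$" that a
# real shell would try to expand if it were left unquoted.
user_args = ["--name", "my app", "--query", "SELECT * FROM t WHERE s = 'a b'", "$HOME"]

# Naive approach: join with spaces, then split again with shell-style rules.
naive_cmd = " ".join(user_args)
naive_roundtrip = shlex.split(naive_cmd)

# Quoted approach: quote each argument so it survives the round trip intact.
quoted_cmd = shlex.join(user_args)
quoted_roundtrip = shlex.split(quoted_cmd)

print(naive_roundtrip == user_args)   # False: "my app" was split into two arguments
print(quoted_roundtrip == user_args)  # True
```

Any backend that builds the driver's command line itself has to get this kind of detail right for every argument, which is part of what reusing spark-submit avoids.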
