spark-dev mailing list archives

From Anirudh Ramanathan
Subject Re: Kubernetes: why use init containers?
Date Wed, 10 Jan 2018 03:39:38 GMT
Marcelo, to address the points you raised:

> k8s uses docker images. Users can create docker images with all the
> dependencies their app needs, and submit the app using that image.

The entire reason we support methods of localizing dependencies other
than baking everything into docker images is that baking is not a very
good workflow fit for all use cases. There are definitely some users who
will do that (and I've spoken to some); they build a versioned image in
their registry with a CD pipeline every time they change their code. But a
lot of people are looking for something lighter: versioning application
code, not entire images. Telling users that they must rebuild images and
pay the cost of localizing new images from the docker registry every time
(a cost which is also not very well understood or measured in terms of
performance) seems less than convincing to me.

> - The original spark-on-k8s spec mentioned a "dependency server"
> approach which sounded like a more generic version of the YARN
> distributed cache, which I hope can be a different way of mitigating
> that issue. With that work, we could build this functionality into
> spark-submit itself and have other backends also benefit.

The resource staging server, as written, was a non-HA file server for
staging dependencies within the cluster. It's not distributed and has no
notion of locality, etc. I don't think we had plans (yet) to invest in
making it more like the distributed cache you mentioned, at least not
until we heard back from the community, so that's unplanned work at this
point. It's also hard to imagine how we could extend it beyond just K8s,
tbh. We should definitely have a JIRA tracking this if that's a direction
we want to explore in the future.

I understand the change you're proposing would simplify the code, but a
decision here seems hard to make until we get some real
benchmarks/measurements or user feedback.
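
Until real measurements exist, the trade-off can at least be framed with a
back-of-envelope model. The sketch below is only an illustration, not a
benchmark: the function names, the bandwidth figures, and the assumption
that a single link is the only bottleneck are all hypothetical, and it
ignores image pulls, retries, and the driver's own download of remote
dependencies.

```python
def driver_served_seconds(num_executors, payload_mb, driver_egress_mbps):
    """Time for every executor to fetch the payload from a single driver.

    Assumes (hypothetically) that the driver's egress link is the only
    bottleneck and is shared by all executors, so the total data sent
    grows with the number of executors.
    """
    total_megabits = num_executors * payload_mb * 8
    return total_megabits / driver_egress_mbps


def parallel_fetch_seconds(payload_mb, per_executor_mbps):
    """Time for executors to fetch the payload independently, in parallel.

    Assumes (hypothetically) an external store that scales out, so each
    executor is limited only by its own link.
    """
    return payload_mb * 8 / per_executor_mbps


if __name__ == "__main__":
    # All figures are made-up inputs, not measurements.
    for n in (1, 10, 100, 1000):
        print(n,
              driver_served_seconds(n, 200, 10_000),
              parallel_fetch_seconds(200, 1_000))
```

Under these assumptions the driver-served time grows linearly with the
executor count while the parallel time stays flat. With few executors or a
slow external store the comparison flips, so the model does not settle the
question; it only shows which quantities a benchmark would need to capture.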

On Tue, Jan 9, 2018 at 7:24 PM, Matt Cheah wrote:

> A few reasons to prefer init-containers come to mind:
>
> Firstly, if we used spark-submit from within the driver container, the
> executors wouldn’t receive the jars on their class loader until after the
> executor starts, because the executor has to launch before it can localize
> resources. It is certainly possible to make the class loader work with the
> user’s jars here, as is the case with all the client mode implementations,
> but it seems cleaner to have the classpath include the user’s jars at
> executor launch time instead of needing to reason about the classloading
> order.
>
> We can also consider the idiomatic approach from the perspective of
> Kubernetes. Yinan touched on this already, but init-containers are
> traditionally meant to prepare the environment for the application that is
> to be run, which is exactly what we do here. This also means the
> localization process can be completely decoupled from the execution of
> the application itself. We can then, for example, detect errors that
> happen in the resource localization layer, say when an HDFS cluster is
> down, before the application itself launches. A failure at the
> init-container stage is explicitly noted via the Kubernetes pod status API.
>
> Finally, running spark-submit from the container would make the
> SparkSubmit code inadvertently allow running client mode Kubernetes
> applications as well. We’re not quite ready to support that. Even if we
> were, it’s not entirely intuitive for the cluster mode code path to depend
> on the client mode code path. This isn’t entirely without precedent though,
> as Mesos has a similar dependency.
>
> Essentially the semantics seem neater and the contract is very explicit
> when using an init-container, even though the code does end up being more
> complex.
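
Matt's point that init-container failures surface in the pod status can be
sketched as follows. This is a minimal illustration, assuming the pod is
available as a plain dict (for example, parsed from the JSON printed by
`kubectl get pod <name> -o json`). The helper name and the sample pod,
including the container name, are hypothetical; only the
`status.initContainerStatuses` layout comes from the Kubernetes API.

```python
def failed_init_containers(pod):
    """Return (name, reason) pairs for init containers that exited non-zero.

    Only the current `terminated` state is inspected; containers that are
    being retried (waiting or running again) are not reported.
    """
    failures = []
    statuses = pod.get("status", {}).get("initContainerStatuses") or []
    for status in statuses:
        terminated = status.get("state", {}).get("terminated")
        if terminated and terminated.get("exitCode", 0) != 0:
            failures.append((status["name"], terminated.get("reason", "Unknown")))
    return failures


# Hypothetical pod status; the container name is illustrative only.
sample_pod = {
    "status": {
        "initContainerStatuses": [
            {
                "name": "spark-init",
                "ready": False,
                "state": {"terminated": {"exitCode": 1, "reason": "Error"}},
            }
        ]
    }
}

print(failed_init_containers(sample_pod))
```

A check like this can run before the main container ever starts, which is
the decoupling described above: a localization failure is visible on the
pod itself rather than buried in application logs.
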
> From: Yinan Li
> Date: Tuesday, January 9, 2018 at 7:16 PM
> To: Nicholas Chammas
> Cc: Anirudh Ramanathan, Marcelo Vanzin, Matt Cheah, Kimoon Kim, dev
> Subject: Re: Kubernetes: why use init containers?
> The init-container is required for use with the resource staging server (
> master/src/jekyll/
> staging-server).
> The resource staging server (RSS) is a spark-on-k8s component running in a
> Kubernetes cluster for staging submission client local dependencies to
> Spark pods. The init-container is responsible for downloading the
> dependencies from the RSS. We haven't upstreamed the RSS code yet, but it
> is a value-add component for Spark on K8s: it lets users use
> submission-local dependencies without resorting to other mechanisms that
> are not immediately available on most Kubernetes clusters, e.g., HDFS. We
> do plan to upstream it in the 2.4 timeframe. Additionally, the
> init-container is a Kubernetes-native way of making sure that the
> dependencies are localized before the main driver/executor containers are
> started. IMO, this guarantee is good to have, and it helps achieve
> separation of concerns. So I think the init-container is a valuable
> component and should be kept.
> On Tue, Jan 9, 2018 at 6:25 PM, Nicholas Chammas wrote:
> > I'd like to point out the output of "git show --stat" for that diff:
> >  29 files changed, 130 insertions(+), 1560 deletions(-)
>
> +1 for that and generally for the idea of leveraging spark-submit.
>
> > You can argue that executors downloading from
> > external servers would be faster than downloading from the driver, but
> > I'm not sure I'd agree - it can go both ways.
>
> On a tangentially related note, one of the main reasons spark-ec2
> is so slow to launch clusters is that it distributes files like the Spark
> binaries to all the workers via the master. Because of that, the launch
> time scales with the number of workers requested.
>
> When I wrote Flintrock, I got a large improvement in launch time over
> spark-ec2 simply by having all the workers download the installation files
> in parallel from an external host (typically S3 or an Apache mirror), and
> launch time became largely independent of the cluster size.
>
> That may or may not say anything about the driver distributing application
> files vs. having init containers do it in parallel, but I’d be curious to
> hear more.
>
> Nick
> On Tue, Jan 9, 2018 at 9:08 PM Anirudh Ramanathan wrote:
> We were running a change in our fork which was similar to this at one
> point early on. My biggest concerns off the top of my head with this change
> would be localization performance with large numbers of executors, and what
> we lose in terms of separation of concerns. Init containers are a standard
> construct in k8s for resource localization. It would also be interesting to
> see how this approach affects the HDFS work.
>
> +matt +kimoon
>
> Still thinking about the potential trade-offs here. Adding Matt and Kimoon,
> who would remember more about our reasoning at the time.
> On Jan 9, 2018 5:22 PM, "Marcelo Vanzin" wrote:
> Hello,
> Me again. I was playing some more with the kubernetes backend and the
> whole init container thing seemed unnecessary to me.
> Currently it's used to download remote jars and files, mount the
> volume into the driver / executor, and place those jars in the
> classpath / move the files to the working directory. This is all stuff
> that spark-submit already does without needing extra help.
> So I spent some time hacking stuff and removing the init container
> code, and launching the driver inside kubernetes using spark-submit
> (similar to how standalone and mesos cluster mode work):
> I'd like to point out the output of "git show --stat" for that diff:
>  29 files changed, 130 insertions(+), 1560 deletions(-)
> You get massive code reuse by simply using spark-submit. The remote
> dependencies are downloaded in the driver, and the driver does the job
> of serving them to executors.
> So I guess my question is: is there any advantage in using an init
> container?
> The current init container code can download stuff in parallel, but
> that's an easy improvement to make in spark-submit and that would
> benefit everybody. You can argue that executors downloading from
> external servers would be faster than downloading from the driver, but
> I'm not sure I'd agree - it can go both ways.
> Also the same idea could probably be applied to starting executors;
> Mesos starts executors using "spark-class" already, so doing that
> would both improve code sharing and potentially simplify some code in
> the k8s backend.
> --
> Marcelo

Anirudh Ramanathan
