spark-dev mailing list archives

From Anirudh Ramanathan <ramanath...@google.com.INVALID>
Subject Re: Kubernetes: why use init containers?
Date Wed, 10 Jan 2018 02:07:56 GMT
Early on, we ran a change similar to this in our fork. Off the top of my
head, my biggest concerns with it are localization performance with large
numbers of executors, and what we lose in terms of separation of concerns.
Init containers are a standard construct in k8s for resource localization.
It would also be interesting to see how this approach affects the HDFS
work.

+matt +kimoon
Still thinking about the potential trade-offs here. Adding Matt and Kimoon
who would remember more about our reasoning at the time.


On Jan 9, 2018 5:22 PM, "Marcelo Vanzin" <vanzin@cloudera.com> wrote:

> Hello,
>
> Me again. I was playing some more with the Kubernetes backend and the
> whole init container thing seemed unnecessary to me.
>
> Currently it's used to download remote jars and files, mount the
> volume into the driver / executor, and place those jars in the
> classpath / move the files to the working directory. This is all stuff
> that spark-submit already does without needing extra help.
>
> So I spent some time hacking stuff and removing the init container
> code, and launching the driver inside Kubernetes using spark-submit
> (similar to how standalone and Mesos cluster modes work):
>
> https://github.com/vanzin/spark/commit/k8s-no-init
>
> I'd like to point out the output of "git show --stat" for that diff:
>  29 files changed, 130 insertions(+), 1560 deletions(-)
>
> You get massive code reuse by simply using spark-submit. The remote
> dependencies are downloaded in the driver, and the driver does the job
> of serving them to executors.
>
> So I guess my question is: is there any advantage in using an init
> container?
>
> The current init container code can download stuff in parallel, but
> that's an easy improvement to make in spark-submit and that would
> benefit everybody. You can argue that executors downloading from
> external servers would be faster than downloading from the driver, but
> I'm not sure I'd agree - it can go both ways.
>
> Also the same idea could probably be applied to starting executors;
> Mesos starts executors using "spark-class" already, so doing that
> would both improve code sharing and potentially simplify some code in
> the k8s backend.
>
> --
> Marcelo
