spark-dev mailing list archives

From Matt Cheah <mch...@palantir.com>
Subject Re: Kubernetes backend and docker images
Date Mon, 08 Jan 2018 21:42:29 GMT
// Fixing Anirudh's email address

________________________________
From: Matt Cheah
Sent: Monday, January 8, 2018 1:39:12 PM
To: Anirudh Ramanathan; Felix Cheung
Cc: 蒋星博; Marcelo Vanzin; dev; Timothy Chen
Subject: Re: Kubernetes backend and docker images


We would still want to be able to specify images separately for the driver and the executors.
For example, not all of the libraries required on the driver may be required on the executors,
so the user would want to specify a custom driver image that differs from their custom executor
image.
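
One way to picture this requirement is a lookup that prefers a role-specific image and falls back to a shared one. The sketch below is illustrative only: the configuration keys (`image`, `driver.image`, `executor.image`) and the image names are hypothetical and are not Spark's actual settings.

```python
from typing import Mapping


def resolve_image(conf: Mapping[str, str], role: str) -> str:
    """Return the image for `role`, falling back to the shared image.

    The keys used here are hypothetical, chosen only for illustration.
    """
    if role not in ("driver", "executor"):
        raise ValueError(f"unknown role: {role!r}")
    # A role-specific image wins when the user has set one.
    specific = conf.get(f"{role}.image")
    if specific:
        return specific
    # Otherwise both roles share the single default image.
    shared = conf.get("image")
    if not shared:
        raise KeyError("no image configured")
    return shared


# Hypothetical configuration: the driver needs extra libraries, the executors do not.
conf = {
    "image": "example/spark:latest",
    "driver.image": "example/spark-driver-libs:latest",
}
print(resolve_image(conf, "driver"))
print(resolve_image(conf, "executor"))
```

With this shape, the simple case needs only one image, while a user who wants a heavier driver image overrides a single key.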



But the idea of the entry point script that can switch based on environment variables makes
sense.



I do think we want separate Python and R images, because Python and R come with non-trivial
extra baggage that can make the images a lot bigger and slower to download for Scala-only
users.



From: Anirudh Ramanathan <ramanathana@google.com.INVALID>
Date: Monday, January 8, 2018 at 9:48 AM
To: Felix Cheung <felixcheung_m@hotmail.com>
Cc: 蒋星博 <jiangxb1987@gmail.com>, Marcelo Vanzin <vanzin@cloudera.com>, dev
<dev@spark.apache.org>, Matt Cheah <mcheah@palantir.com>, Timothy Chen <tnachen@gmail.com>
Subject: Re: Kubernetes backend and docker images



+matt +tim

For reference - here's our previous thread on this dockerfile unification problem - https://github.com/apache-spark-on-k8s/spark/pull/60

I think this approach should be acceptable from both the customization and visibility perspectives.





On Mon, Jan 8, 2018 at 9:40 AM, Anirudh Ramanathan <ramanathana@google.com> wrote:

+1



We discussed some alternatives early on, including using a single dockerfile with different
spec.container.command and spec.container.args in the Kubernetes driver/executor specification
(which override the entrypoint in docker). There is no reason that wouldn't work as well, except
that it reduced the transparency of what was being invoked in the driver/executor/init by hiding
it in the actual backend code.
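
A rough sketch of that alternative, for illustration: the backend builds one container spec per role from the same image and sets `command` and `args` itself. In Kubernetes, a container's `command` replaces the image's ENTRYPOINT and `args` replaces its CMD. The script paths, argument values, and container names below are hypothetical placeholders, not what the Spark backend actually emits.

```python
import json

# Hypothetical per-role launch commands that would live in backend code.
ROLE_COMMANDS = {
    "driver": {"command": ["/opt/example/run-driver.sh"], "args": ["--mode", "cluster"]},
    "executor": {"command": ["/opt/example/run-executor.sh"], "args": ["--cores", "1"]},
    "init": {"command": ["/opt/example/run-init.sh"], "args": []},
}


def container_spec(role: str, image: str) -> dict:
    """Build a container entry for a pod spec, overriding the image's entrypoint."""
    launch = ROLE_COMMANDS[role]
    return {
        "name": f"example-{role}",
        "image": image,
        # `command` overrides the image ENTRYPOINT; `args` overrides its CMD.
        "command": list(launch["command"]),
        "args": list(launch["args"]),
    }


# The same image serves every role; only command/args differ.
for role in ("driver", "executor", "init"):
    print(json.dumps(container_spec(role, "example/spark:latest")))
```

The transparency concern is visible here: nothing in the image says what will run, because that decision sits in the `ROLE_COMMANDS` table inside the backend.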



Putting it into a single entrypoint file and branching lets us realize the best of both worlds,
I think. This is an elegant solution, thanks Marcelo.



On Jan 6, 2018 10:01 AM, "Felix Cheung" <felixcheung_m@hotmail.com> wrote:

+1



Thanks for taking on this.

That was my feedback on one of the long comment threads as well. I think we should have one
docker image instead of 3 (python and R variants are also pending in the fork; we should consider
having one that we officially release instead of 9, for example).





________________________________

From: 蒋星博 <jiangxb1987@gmail.com>
Sent: Friday, January 5, 2018 10:57:53 PM
To: Marcelo Vanzin
Cc: dev
Subject: Re: Kubernetes backend and docker images



Agreed, it would be nice to have this simplification, and users can still create their custom
images by copying and modifying the default one.

Thanks for bringing this up, Marcelo!



2018-01-05 17:06 GMT-08:00 Marcelo Vanzin <vanzin@cloudera.com>:

Hey all, especially those working on the k8s stuff.

Currently we have 3 docker images that need to be built and provided
by the user when starting a Spark app: driver, executor, and init
container.

When the initial review went by, I asked why we need 3, and I was
told that's because they have different entry points. That never
really convinced me, but well, everybody wanted to get things in to
get the ball rolling.

But I still think that's not the best way to go. I did some pretty
simple hacking and got things to work with a single image:

https://github.com/vanzin/spark/commit/k8s-img

Is there a reason why that approach would not work? You could still
create separate images for driver and executor if wanted, but there's
no reason I can see why we should need 3 images for the simple case.

Note that the code there can be cleaned up still, and I don't love the
idea of using env variables to propagate arguments to the container,
but that works for now.
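
As a minimal sketch of the single-entrypoint idea (not the code in the linked commit), the image could ship one entry point that reads the role from an environment variable and branches on it. The variable names `EXAMPLE_ROLE` and `EXAMPLE_ARGS` and the launcher paths are hypothetical.

```python
import shlex
from typing import List, Mapping

# Hypothetical launchers baked into the single image.
LAUNCHERS = {
    "driver": ["/opt/example/run-driver.sh"],
    "executor": ["/opt/example/run-executor.sh"],
    "init": ["/opt/example/run-init.sh"],
}


def build_command(env: Mapping[str, str]) -> List[str]:
    """Pick the process to start based on environment variables."""
    role = env.get("EXAMPLE_ROLE", "")
    if role not in LAUNCHERS:
        raise ValueError(f"unsupported role: {role!r}")
    # Arguments arrive as one string in an env variable and must be re-split,
    # which is part of why passing arguments this way is awkward.
    extra = shlex.split(env.get("EXAMPLE_ARGS", ""))
    return LAUNCHERS[role] + extra


# A real entry point would read os.environ and exec the resulting command;
# here the command is only printed so the sketch has no side effects.
demo_env = {"EXAMPLE_ROLE": "executor", "EXAMPLE_ARGS": "--cores 2 --name 'my app'"}
print(build_command(demo_env))
```

Because the branching lives in a file inside the image, anyone inspecting the image can see what each role runs, and a custom image only needs to keep the same entry point contract.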

--
Marcelo

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org







--

Anirudh Ramanathan
