spark-dev mailing list archives

From Marcelo Vanzin <>
Subject Re: Kubernetes: why use init containers?
Date Wed, 10 Jan 2018 16:40:18 GMT
On a side note, while it's great that you guys have meetings to
discuss things related to the project, it's general Apache practice to
discuss these things on the mailing list, or at the very least send
detailed info about what was discussed in these meetings to the mailing
list. Not everybody can attend these meetings, and I'm not just
talking about people being busy, but there are people who live in
different time zones.

Now that this code is moving into Spark, I'd recommend getting people
more involved with the Spark project to move things forward.

On Tue, Jan 9, 2018 at 8:23 PM, Anirudh Ramanathan
<> wrote:
> Marcelo, I can see that we might be misunderstanding what this change
> implies for performance and some of the deeper implementation details here.
> We have a community meeting tomorrow (at 10am PT), and we'll be sure to
> explore this idea in detail, and understand the implications and then get
> back to you.
> Thanks for the detailed responses here, and for spending time with the idea.
> (Also, you're more than welcome to attend the meeting - there's a link here
> if you're around.)
> Cheers,
> Anirudh
> On Jan 9, 2018 8:05 PM, "Marcelo Vanzin" <> wrote:
> One thing I forgot in my previous e-mail is that if a resource is
> remote, I'm pretty sure (but haven't double checked the code) that
> executors will download it directly from the remote server, and not
> from the driver. So there, distributed download without an init
> container.
> On Tue, Jan 9, 2018 at 7:15 PM, Yinan Li <> wrote:
>> The init-container is required for use with the resource staging server
> If the staging server *requires* an init container, you already have a
> design problem right there.
>> Additionally, the init-container is a Kubernetes
>> native way of making sure that the dependencies are localized
> Sorry, but the init container does not do anything by itself. You had
> to add a whole bunch of code to execute the existing Spark code in an
> init container, when not doing it would have achieved the exact same
> goal much more easily, in a way that is consistent with how Spark
> already does things.
> Matt:
>> the executors wouldn’t receive the jars on their class loader until after
>> the executor starts
> I actually consider that a benefit. It means spark-on-k8s application
> will behave more like all the other backends, where that is true also
> (application jars live in a separate class loader).
>> traditionally meant to prepare the environment for the application that is
>> to be run
> You guys are forcing this argument when it all depends on where you
> draw the line. Spark can be launched without downloading any of those
> dependencies, because Spark will download them for you. Forcing the
> "kubernetes way" just means you're writing a lot more code, and
> breaking the Spark app initialization into multiple container
> invocations, to achieve the same thing.
>> would make the SparkSubmit code inadvertently allow running client mode
>> Kubernetes applications as well
> Not necessarily. I have that in my patch; it doesn't allow client mode
> unless a property that only the cluster mode submission code sets is
> present. If some user wants to hack their way around that, more power
> to them; users can also compile their own Spark without the checks if
> they want to try out client mode in some way.
> Anirudh:
>> Telling users that they must rebuild images ... every time seems less
>> than convincing to me.
> Sure, I'm not proposing people use the docker image approach all the
> time. It would be a hassle while developing an app, much as it already
> is today, since the current code doesn't upload local files to the k8s
> cluster.
> But it's perfectly reasonable for people to optimize a production app
> by bundling the app into a pre-built docker image to avoid
> re-downloading resources every time, just as they'd probably place the
> jar + dependencies on HDFS today with YARN, to get the benefits of the
> YARN cache.
> --
> Marcelo
> ---------------------------------------------------------------------
> To unsubscribe e-mail:

