spark-dev mailing list archives

From Kimoon Kim <>
Subject Re: Toward an "API" for spark images used by the Kubernetes back-end
Date Wed, 28 Mar 2018 19:44:20 GMT
Thanks for starting this discussion.

When I was troubleshooting Spark on K8s, I often needed to turn on
debug messages on the driver and executor pods of my jobs, which would be
possible if I could somehow put the right log4j configuration file inside
the pods.
I know I can build custom Docker images, but that seems like too much. (So
being lazy, I usually just gave up.)

If there is an alternative mechanism, like using a ConfigMap, I would
prefer that for this log4j need. Maybe we should document the possible
alternatives to building Docker images for certain use cases and guide
people toward the right mechanisms?
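
To make the ConfigMap idea concrete, here is a rough sketch of what I have
in mind. It only builds a ConfigMap manifest that carries a debug-level
log4j.properties file. The function name log4j_configmap and the ConfigMap
name spark-log4j-debug are made up for illustration, and how the file would
actually get mounted into the driver and executor pods is left open, since
that is exactly the mechanism this thread has not settled.

```python
import json


def log4j_configmap(name, root_level="DEBUG"):
    """Build a Kubernetes ConfigMap manifest (as a dict) holding log4j.properties.

    `name` and `root_level` are illustrative parameters, not part of any
    Spark or Kubernetes API.
    """
    # Standard log4j 1.x console appender settings, with the root level raised.
    props = "\n".join([
        f"log4j.rootCategory={root_level}, console",
        "log4j.appender.console=org.apache.log4j.ConsoleAppender",
        "log4j.appender.console.target=System.err",
        "log4j.appender.console.layout=org.apache.log4j.PatternLayout",
        "log4j.appender.console.layout.ConversionPattern="
        "%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n",
    ])
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        # The key becomes the file name if the ConfigMap is mounted as a volume.
        "data": {"log4j.properties": props},
    }


# Hypothetical ConfigMap name; the JSON output can be saved to a file.
print(json.dumps(log4j_configmap("spark-log4j-debug"), indent=2))
```

The printed JSON can be saved to a file and created with kubectl apply -f,
which accepts JSON manifests as well as YAML. Changing the log level would
then mean editing the ConfigMap rather than rebuilding an image.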



On Wed, Mar 21, 2018 at 10:54 PM, Felix Cheung <> wrote:

> I like being able to customize the docker image itself - but I realize
> this thread is more about “API” for the stock image.
> Environment variables are nice. We probably also need a way to set custom
> Spark config (as a file?).
> ------------------------------
> *From:* Holden Karau <>
> *Sent:* Wednesday, March 21, 2018 10:44:20 PM
> *To:* Erik Erlandson
> *Cc:* dev
> *Subject:* Re: Toward an "API" for spark images used by the Kubernetes
> back-end
> I’m glad this discussion is happening on dev@ :)
> Personally I like customizing with shell env variables when rolling my
> own image, but documenting the expectations/usage of the variables is
> definitely needed before we can really call it an API.
> On the related question, I suspect two of the more “common” likely
> customizations are adding additional jars for bootstrapping fetching from a
> DFS and similarly complicated Python dependencies (although given that
> Python support isn’t merged yet, it’s hard to say what exactly this would
> look like).
> I could also see some vendors wanting to add some bootstrap/setup scripts
> to fetch keys or other things.
> What other ways do folks foresee customizing their Spark docker
> containers?
> On Wed, Mar 21, 2018 at 5:04 PM Erik Erlandson <>
> wrote:
>> During the review of the recent PR to remove use of the init_container
>> from kube pods as created by the Kubernetes back-end, the topic of
>> documenting the "API" for these container images also came up. What
>> information does the back-end provide to these containers? In what form?
>> What assumptions does the back-end make about the structure of these
>> containers?  This information is important in a scenario where a user wants
>> to create custom images, particularly if these are not based on the
>> reference dockerfiles.
>> A related topic is deciding what such an API should look like.  For
>> example, early incarnations were based more purely on environment
>> variables, which could have advantages in terms of an API that is easy to
>> describe in a document.  If we document the current API, should we annotate
>> it as Experimental?  If not, does that effectively freeze the API?
>> We are interested in community input about possible customization use
>> cases and opinions on possible API designs!
>> Cheers,
>> Erik
