spark-dev mailing list archives

From Mark Hamstra <m...@clearstorydata.com>
Subject Re: Publishing official docker images for KubernetesSchedulerBackend
Date Fri, 15 Dec 2017 02:19:29 GMT
What licensing issues come into play?

On Thu, Dec 14, 2017 at 4:00 PM, Erik Erlandson <eerlands@redhat.com> wrote:

> We've been discussing the topic of container images a bit more.  The
> kubernetes back-end operates by executing some specific CMD and
> ENTRYPOINT logic, which is different from mesos, and which is probably
> not practical to unify at this level.
>
> However, these CMD and ENTRYPOINT configurations are essentially a thin
> skin on top of an image that is just an install of a spark distro.  We
> feel that a single "spark-base" image could be published that is
> consumable by kube-spark images, mesos-spark images, and likely any
> other community image whose primary purpose is running spark components.
> The kube-specific dockerfiles would be written "FROM spark-base" and add
> only the small command and entrypoint layers.  Likewise, the mesos
> images could add any specialization layers that are necessary on top of
> the "spark-base" image.
>
> Does this factorization sound reasonable to others?
> Cheers,
> Erik
>
>
> On Wed, Nov 29, 2017 at 10:04 AM, Mridul Muralidharan <mridul@gmail.com>
> wrote:
>
>> We do support running on Apache Mesos via docker images - so this
>> would not be restricted to k8s.
>> But unlike mesos support, which has other modes of running, I believe
>> k8s support depends more heavily on the availability of docker images.
>>
>>
>> Regards,
>> Mridul
>>
>>
>> On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen <sowen@cloudera.com> wrote:
>> > Would it be logical to provide Docker-based distributions of other
>> > pieces of Spark? Or is this specific to K8S?
>> > The problem is that we wouldn't generally also provide such a
>> > distribution of Spark, for the reasons you give: if we did that, then
>> > why not RPMs, and so on.
>> >
>> > On Wed, Nov 29, 2017 at 10:41 AM Anirudh Ramanathan
>> > <ramanathana@google.com> wrote:
>> >>
>> >> In this context, I think the docker images are similar to the binaries
>> >> rather than an extension.
>> >> It's packaging the compiled distribution to save people the effort of
>> >> building one themselves, akin to binaries or the python package.
>> >>
>> >> For reference, this is the base dockerfile for the main image that we
>> >> intend to publish. It's not particularly complicated.
>> >> The driver and executor images are based on said base image and only
>> >> customize the CMD (any file/directory inclusions are extraneous and
>> >> will be removed).
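>> >>
>> >> Purely as an illustration of that layering (the dockerfile referenced
>> >> above is authoritative; the command below is a stand-in):
>> >>
>> >>     # hypothetical driver image: inherit everything from the base,
>> >>     # override only the command layer
>> >>     FROM spark-base
>> >>     CMD ["/opt/spark/bin/spark-class", "org.apache.spark.deploy.SparkSubmit"]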
>> >>
>> >> Is there only one way to build them? That's a bit harder to reason
>> >> about.
>> >> The base image, I'd argue, is likely always going to be built that
>> >> way. For the driver and executor images, there may be cases where
>> >> people want to customize them (like putting all dependencies into
>> >> them, for example).
>> >> In those cases, as long as our images are bare-bones, they can use the
>> >> spark-driver/spark-executor images we publish as the base, and build
>> >> their customization as a layer on top.
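>> >>
>> >> For instance, a user could bake dependencies into a derived image with
>> >> something like (the image name, tag, and jar below are hypothetical):
>> >>
>> >>     FROM spark-driver:2.3.0
>> >>     # add application jars and their dependencies as one extra layer
>> >>     COPY my-app-assembly.jar /opt/spark/jars/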
>> >>
>> >> I think the composability of docker images makes this a bit different
>> >> from, say, debian packages.
>> >> We can publish canonical images that serve both as a complete image
>> >> for most Spark applications and as a stable substrate to build
>> >> customization upon.
>> >>
>> >> On Wed, Nov 29, 2017 at 7:38 AM, Mark Hamstra
>> >> <mark@clearstorydata.com> wrote:
>> >>>
>> >>> It's probably also worth considering whether there is only one,
>> >>> well-defined, correct way to create such an image, or whether this
>> >>> is a reasonable avenue for customization. Part of why we don't do
>> >>> something like maintain and publish canonical Debian packages for
>> >>> Spark is that different organizations doing packaging and
>> >>> distribution of infrastructures or operating systems can reasonably
>> >>> want to do this in a custom (or non-customary) way. If there is
>> >>> really only one reasonable way to do a docker image, then my bias
>> >>> starts to tend more toward the Spark PMC taking on the
>> >>> responsibility to maintain and publish that image. If there is more
>> >>> than one way to do it and publishing a particular image is more of
>> >>> a convenience, then my bias tends more away from maintaining and
>> >>> publishing it.
>> >>>
>> >>> On Wed, Nov 29, 2017 at 5:14 AM, Sean Owen <sowen@cloudera.com> wrote:
>> >>>>
>> >>>> Source code is the primary release; compiled binary releases are
>> >>>> conveniences that are also released. A docker image sounds fairly
>> >>>> different, though. To the extent it's the standard delivery
>> >>>> mechanism for some artifact (think: pyspark on PyPI as well), that
>> >>>> makes sense, but is that the situation? If it's more of an
>> >>>> extension or alternate presentation of Spark components, that
>> >>>> typically wouldn't be part of a Spark release. The ones the PMC
>> >>>> takes responsibility for maintaining ought to be the core, critical
>> >>>> means of distribution alone.
>> >>>>
>> >>>> On Wed, Nov 29, 2017 at 2:52 AM Anirudh Ramanathan
>> >>>> <ramanathana@google.com.invalid> wrote:
>> >>>>>
>> >>>>> Hi all,
>> >>>>>
>> >>>>> We're all working towards the Kubernetes scheduler backend (full
>> >>>>> steam ahead!) that's targeted at Spark 2.3. One of the questions
>> >>>>> that comes up often is docker images.
>> >>>>>
>> >>>>> While we're making dockerfiles available so that people can build
>> >>>>> their own docker images from source, ideally we'd want to publish
>> >>>>> official docker images as part of the release process.
>> >>>>>
>> >>>>> I understand that the ASF has procedures around this, and we would
>> >>>>> want to get that started to help us get these artifacts published
>> >>>>> by 2.3. I'd love to start a discussion around this and hear the
>> >>>>> community's thoughts.
>> >>>>>
>> >>>>> --
>> >>>>> Thanks,
>> >>>>> Anirudh Ramanathan
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Anirudh Ramanathan
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>>
>
