spark-dev mailing list archives

From Erik Erlandson <eerla...@redhat.com>
Subject Re: Publishing official docker images for KubernetesSchedulerBackend
Date Tue, 19 Dec 2017 18:34:21 GMT
I've been looking a bit more into the ASF's legal posture on licensing and
container images. What I have found indicates that the ASF considers container
images to be just another variety of distribution channel.  As such, it is
acceptable to publish official releases; for example an image such as
spark:v2.3.0 built from the v2.3.0 source is fine.  It is not acceptable to
do something like regularly publish spark:latest built from the head of
master.

More detail here:
https://issues.apache.org/jira/browse/LEGAL-270

So as I understand it, making a release-tagged public image as part of each
official release does not pose any problems.
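
A minimal sketch of that rule as a pre-publish check, assuming release tags
follow a "vMAJOR.MINOR.PATCH" scheme. The function name, the tag pattern, and
the set of released versions are hypothetical and only illustrate the
distinction between spark:v2.3.0 and spark:latest:

```python
import re

# Hypothetical pattern for release tags such as "v2.3.0"; the exact tag
# scheme is an assumption made for this sketch.
RELEASE_TAG = re.compile(r"^v\d+\.\d+\.\d+$")


def is_release_image(image_ref, released_versions):
    """Return True if image_ref (e.g. "spark:v2.3.0") names an official release.

    released_versions is a set of tags that correspond to voted releases.
    """
    # Split on the last colon so the tag is whatever follows it.
    _, sep, tag = image_ref.rpartition(":")
    if not sep:
        # An untagged reference resolves to "latest" in Docker, so reject it.
        return False
    # The tag must look like a release and match an actual release.
    return bool(RELEASE_TAG.match(tag)) and tag in released_versions
```

With released_versions = {"v2.3.0"}, "spark:v2.3.0" passes the check, while
"spark:latest" and a bare "spark" do not.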

With respect to considering the licenses of other ancillary dependencies
that are also installed on such container images, I noticed this clause in
the legal boilerplate for the Flink images
<https://hub.docker.com/r/library/flink/>:

> As with all Docker images, these likely also contain other software which
> may be under other licenses (such as Bash, etc from the base distribution,
> along with any direct or indirect dependencies of the primary software
> being contained).
>

So it may be sufficient to resolve this via disclaimer.

-Erik

On Thu, Dec 14, 2017 at 7:55 PM, Erik Erlandson <eerlands@redhat.com> wrote:

> Currently the containers are based on Alpine, which pulls in BSD2 and MIT
> licensing:
> https://github.com/apache/spark/pull/19717#discussion_r154502824
>
> To the best of my understanding, neither of those poses a problem.  If we
> based the image on CentOS, I'd also expect the licensing of any image
> deps to be compatible.
>
> On Thu, Dec 14, 2017 at 7:19 PM, Mark Hamstra <mark@clearstorydata.com>
> wrote:
>
>> What licensing issues come into play?
>>
>> On Thu, Dec 14, 2017 at 4:00 PM, Erik Erlandson <eerlands@redhat.com>
>> wrote:
>>
>>> We've been discussing the topic of container images a bit more.  The
>>> Kubernetes back-end operates by executing some specific CMD and ENTRYPOINT
>>> logic, which is different from Mesos, and which is probably not practical
>>> to unify at this level.
>>>
>>> However: these CMD and ENTRYPOINT configurations are essentially just a
>>> thin skin on top of an image which is just an install of a Spark distro.
>>> We feel that a single "spark-base" image should be publishable, one that is
>>> consumable by kube-spark images, mesos-spark images, and likely any
>>> other community image whose primary purpose is running Spark components.
>>> The kube-specific dockerfiles would be written "FROM spark-base" and just
>>> add the small command and entrypoint layers.  Likewise, the Mesos images
>>> could add any specialization layers that are necessary on top of the
>>> "spark-base" image.
>>>
>>> Does this factorization sound reasonable to others?
>>> Cheers,
>>> Erik
>>>
>>>
>>> On Wed, Nov 29, 2017 at 10:04 AM, Mridul Muralidharan <mridul@gmail.com>
>>> wrote:
>>>
>>>> We do support running on Apache Mesos via docker images - so this
>>>> would not be restricted to k8s.
>>>> But unlike mesos support, which has other modes of running, I believe
>>>> k8s support more heavily depends on availability of docker images.
>>>>
>>>>
>>>> Regards,
>>>> Mridul
>>>>
>>>>
>>>> On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen <sowen@cloudera.com> wrote:
>>>> > Would it be logical to provide Docker-based distributions of other pieces of
>>>> > Spark? Or is this specific to K8S?
>>>> > The problem is we wouldn't generally also provide a distribution of Spark
>>>> > for the reasons you give, because if we did that, then why not RPMs and so on.
>>>> >
>>>> > On Wed, Nov 29, 2017 at 10:41 AM Anirudh Ramanathan <ramanathana@google.com>
>>>> > wrote:
>>>> >>
>>>> >> In this context, I think the docker images are similar to the binaries
>>>> >> rather than an extension.
>>>> >> It's packaging the compiled distribution to save people the effort of
>>>> >> building one themselves, akin to binaries or the python package.
>>>> >>
>>>> >> For reference, this is the base dockerfile for the main image that we
>>>> >> intend to publish. It's not particularly complicated.
>>>> >> The driver and executor images are based on said base image and only
>>>> >> customize the CMD (any file/directory inclusions are extraneous and will be
>>>> >> removed).
>>>> >>
>>>> >> Is there only one way to build it? That's a bit harder to reason about.
>>>> >> The base image I'd argue is likely going to always be built that way. The
>>>> >> driver and executor images, there may be cases where people want to
>>>> >> customize it - (like putting all dependencies into it for example).
>>>> >> In those cases, as long as our images are bare bones, they can use the
>>>> >> spark-driver/spark-executor images we publish as the base, and build their
>>>> >> customization as a layer on top of it.
>>>> >>
>>>> >> I think the composability of docker images, makes this a bit different
>>>> >> from say - debian packages.
>>>> >> We can publish canonical images that serve as both - a complete image for
>>>> >> most Spark applications, as well as a stable substrate to build
>>>> >> customization upon.
>>>> >>
>>>> >> On Wed, Nov 29, 2017 at 7:38 AM, Mark Hamstra <mark@clearstorydata.com>
>>>> >> wrote:
>>>> >>>
>>>> >>> It's probably also worth considering whether there is only one,
>>>> >>> well-defined, correct way to create such an image or whether this is a
>>>> >>> reasonable avenue for customization. Part of why we don't do something like
>>>> >>> maintain and publish canonical Debian packages for Spark is because
>>>> >>> different organizations doing packaging and distribution of infrastructures
>>>> >>> or operating systems can reasonably want to do this in a custom (or
>>>> >>> non-customary) way. If there is really only one reasonable way to do a
>>>> >>> docker image, then my bias starts to tend more toward the Spark PMC taking
>>>> >>> on the responsibility to maintain and publish that image. If there is more
>>>> >>> than one way to do it and publishing a particular image is more just a
>>>> >>> convenience, then my bias tends more away from maintaining and publishing it.
>>>> >>>
>>>> >>> On Wed, Nov 29, 2017 at 5:14 AM, Sean Owen <sowen@cloudera.com>
>>>> >>> wrote:
>>>> >>>>
>>>> >>>> Source code is the primary release; compiled binary releases are
>>>> >>>> conveniences that are also released. A docker image sounds fairly different
>>>> >>>> though. To the extent it's the standard delivery mechanism for some artifact
>>>> >>>> (think: pyspark on PyPI as well) that makes sense, but is that the
>>>> >>>> situation? If it's more of an extension or alternate presentation of Spark
>>>> >>>> components, that typically wouldn't be part of a Spark release. The ones the
>>>> >>>> PMC takes responsibility for maintaining ought to be the core, critical
>>>> >>>> means of distribution alone.
>>>> >>>>
>>>> >>>> On Wed, Nov 29, 2017 at 2:52 AM Anirudh Ramanathan
>>>> >>>> <ramanathana@google.com.invalid> wrote:
>>>> >>>>>
>>>> >>>>> Hi all,
>>>> >>>>>
>>>> >>>>> We're all working towards the Kubernetes scheduler backend (full steam
>>>> >>>>> ahead!) that's targeted towards Spark 2.3. One of the questions that comes
>>>> >>>>> up often is docker images.
>>>> >>>>>
>>>> >>>>> While we're making available dockerfiles to allow people to create
>>>> >>>>> their own docker images from source, ideally, we'd want to publish official
>>>> >>>>> docker images as part of the release process.
>>>> >>>>>
>>>> >>>>> I understand that the ASF has procedure around this, and we would want
>>>> >>>>> to get that started to help us get these artifacts published by 2.3. I'd
>>>> >>>>> love to get a discussion around this started, and the thoughts of the
>>>> >>>>> community regarding this.
>>>> >>>>>
>>>> >>>>> --
>>>> >>>>> Thanks,
>>>> >>>>> Anirudh Ramanathan
>>>> >>>
>>>> >>>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Anirudh Ramanathan
>>>>
>>>>
>>>>
>>>
>>
>
