spark-dev mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Publishing official docker images for KubernetesSchedulerBackend
Date Tue, 19 Dec 2017 18:45:47 GMT
I think that's all correct, though the licensing of third-party dependencies
is actually a difficult and sticky part. The ASF couldn't make a software
release that includes GPL software, for example, and it's not just a matter
of adding a disclaimer. Any actual bits distributed by the PMC would have
to follow all of the license rules.
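
As a rough illustration of the third-party license point, the sketch below screens a container image's package inventory against license categories. It assumes a simplified, hard-coded subset of the ASF third-party license policy: Category A licenses such as Apache-2.0, BSD and MIT may ship, while Category X licenses such as the GPL family may not. The `license_problems` helper, the category sets and the sample inventory are hypothetical. A real review would follow the ASF policy itself rather than this list.

```python
from typing import Dict, List

# Hypothetical, simplified subset of the ASF third-party license categories.
# Category A licenses may ship in ASF releases; Category X licenses may not.
CATEGORY_A = {"Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "MIT"}
CATEGORY_X = {"GPL-2.0", "GPL-3.0", "LGPL-2.1", "AGPL-3.0"}


def license_problems(packages: Dict[str, str]) -> List[str]:
    """Return one message per package whose license is prohibited or unreviewed.

    `packages` maps a package name to its SPDX-style license identifier.
    """
    problems = []
    for name, lic in sorted(packages.items()):
        if lic in CATEGORY_X:
            problems.append(f"{name}: {lic} is prohibited")
        elif lic not in CATEGORY_A:
            # Anything outside the allow-list needs a human (legal) review.
            problems.append(f"{name}: {lic} needs review")
    return problems


if __name__ == "__main__":
    # Hypothetical inventory of an image; not taken from any real Spark image.
    inventory = {"musl": "MIT", "bash": "GPL-3.0", "libfoo": "MPL-2.0"}
    for problem in license_problems(inventory):
        print(problem)
```

The check only flags problems. As noted above, a disclaimer does not make a prohibited dependency acceptable in anything the PMC distributes.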

On Tue, Dec 19, 2017 at 12:34 PM Erik Erlandson <eerlands@redhat.com> wrote:

> I've been looking a bit more into the ASF's legal posture on licensing and
> container images. What I have found indicates that the ASF considers
> container images to be just another variety of distribution channel. As
> such, it is acceptable to publish official releases; for example, an image
> such as spark:v2.3.0 built from the v2.3.0 source is fine. It is not
> acceptable to do something like regularly publish spark:latest built from
> the head of master.
>
> More detail here:
> https://issues.apache.org/jira/browse/LEGAL-270
>
> So as I understand it, making a release-tagged public image as part of
> each official release does not pose any problems.
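
The release-only rule described above could be enforced mechanically in a publishing script. The sketch below is a hypothetical check that accepts release-tagged image references and rejects moving tags such as `latest`. Both the `is_publishable_tag` helper and the `vMAJOR.MINOR.PATCH` tag pattern are assumptions, modeled on the `spark:v2.3.0` example.

```python
import re

# Assumed release tag format, modeled on the "spark:v2.3.0" example.
RELEASE_TAG = re.compile(r"v\d+\.\d+\.\d+")


def is_publishable_tag(image_ref: str) -> bool:
    """Return True only if an image reference carries a release version tag."""
    name, sep, tag = image_ref.rpartition(":")
    if not sep or not name:
        # Untagged references such as "spark" implicitly mean "latest".
        return False
    # fullmatch rejects "latest", snapshots, and suffixed tags like "v2.3.0-rc1".
    return RELEASE_TAG.fullmatch(tag) is not None
```

Release candidates are rejected here, on the assumption that only final, voted releases count as official releases.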
>
> With respect to considering the licenses of other ancillary dependencies
> that are also installed on such container images, I noticed this clause in
> the legal boilerplate for the Flink images
> <https://hub.docker.com/r/library/flink/>:
>
> As with all Docker images, these likely also contain other software which
>> may be under other licenses (such as Bash, etc from the base distribution,
>> along with any direct or indirect dependencies of the primary software
>> being contained).
>>
>
> So it may be sufficient to resolve this via a disclaimer.
>
> -Erik
>
> On Thu, Dec 14, 2017 at 7:55 PM, Erik Erlandson <eerlands@redhat.com>
> wrote:
>
>> Currently the containers are based on Alpine, which pulls in BSD2 and
>> MIT licensing:
>> https://github.com/apache/spark/pull/19717#discussion_r154502824
>>
>> To the best of my understanding, neither of those poses a problem. If we
>> based the image on CentOS, I'd also expect the licensing of any image
>> deps to be compatible.
>>
>> On Thu, Dec 14, 2017 at 7:19 PM, Mark Hamstra <mark@clearstorydata.com>
>> wrote:
>>
>>> What licensing issues come into play?
>>>
>>> On Thu, Dec 14, 2017 at 4:00 PM, Erik Erlandson <eerlands@redhat.com>
>>> wrote:
>>>
>>>> We've been discussing the topic of container images a bit more. The
>>>> Kubernetes back-end operates by executing some specific CMD and
>>>> ENTRYPOINT logic, which is different from Mesos, and which is probably
>>>> not practical to unify at this level.
>>>>
>>>> However, these CMD and ENTRYPOINT configurations are essentially just a
>>>> thin skin on top of an image that is just an install of a Spark distro.
>>>> We feel that a single "spark-base" image should be publishable, one that
>>>> is consumable by kube-spark images, mesos-spark images, and likely any
>>>> other community image whose primary purpose is running Spark components.
>>>> The kube-specific dockerfiles would be written "FROM spark-base" and just
>>>> add the small command and entrypoint layers. Likewise, the Mesos images
>>>> could add any specialization layers that are necessary on top of the
>>>> "spark-base" image.
>>>>
>>>> Does this factorization sound reasonable to others?
>>>> Cheers,
>>>> Erik
>>>>
>>>>
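
The proposed factorization can be sketched as data. There is a "spark-base" image that is just a Spark install, plus a Kubernetes image written `FROM spark-base` that only adds the entrypoint and command layers. Everything in the sketch is a hypothetical placeholder, not the contents of the actual Spark dockerfiles: the `ImageSpec` class, the `openjdk:8-alpine` parent, the `/opt/spark` path and the `/opt/entrypoint.sh` script.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImageSpec:
    """Hypothetical model of a Dockerfile: a parent image plus extra instructions."""
    name: str
    parent: str
    instructions: List[str] = field(default_factory=list)

    def dockerfile(self) -> str:
        # Each image starts FROM its parent; each instruction adds a layer or setting.
        return "\n".join([f"FROM {self.parent}", *self.instructions]) + "\n"


# "spark-base": just an install of a Spark distribution (placeholder paths).
spark_base = ImageSpec(
    name="spark-base",
    parent="openjdk:8-alpine",
    instructions=["COPY spark /opt/spark", "ENV SPARK_HOME /opt/spark"],
)

# Kubernetes-specific image: a thin ENTRYPOINT/CMD skin on top of spark-base.
kube_spark = ImageSpec(
    name="kube-spark",
    parent=spark_base.name,
    instructions=['ENTRYPOINT ["/opt/entrypoint.sh"]', 'CMD ["driver"]'],
)

# A Mesos image could reuse the same base (its specialization layers are not shown).
mesos_spark = ImageSpec(name="mesos-spark", parent=spark_base.name)

if __name__ == "__main__":
    for image in (spark_base, kube_spark, mesos_spark):
        print(f"# {image.name}")
        print(image.dockerfile())
```

Keeping CMD and ENTRYPOINT out of the base is what lets several back-ends share it. Each back-end's image changes only its own thin top layer.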
>>>> On Wed, Nov 29, 2017 at 10:04 AM, Mridul Muralidharan <mridul@gmail.com>
>>>> wrote:
>>>>
>>>>> We do support running on Apache Mesos via docker images, so this
>>>>> would not be restricted to k8s.
>>>>> But unlike Mesos support, which has other modes of running, I believe
>>>>> k8s support depends more heavily on the availability of docker images.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Mridul
>>>>>
>>>>>
>>>>> On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen <sowen@cloudera.com>
>>>>> wrote:
>>>>> > Would it be logical to provide Docker-based distributions of other
>>>>> > pieces of Spark? Or is this specific to K8S?
>>>>> > The problem is that we wouldn't generally also provide a distribution
>>>>> > of Spark for the reasons you give, because if we did that, then why
>>>>> > not RPMs and so on?
>>>>> >
>>>>> > On Wed, Nov 29, 2017 at 10:41 AM Anirudh Ramanathan <ramanathana@google.com>
>>>>> > wrote:
>>>>> >>
>>>>> >> In this context, I think the docker images are similar to the
>>>>> >> binaries rather than an extension.
>>>>> >> It's packaging the compiled distribution to save people the effort of
>>>>> >> building one themselves, akin to binaries or the python package.
>>>>> >>
>>>>> >> For reference, this is the base dockerfile for the main image that we
>>>>> >> intend to publish. It's not particularly complicated.
>>>>> >> The driver and executor images are based on said base image and only
>>>>> >> customize the CMD (any file/directory inclusions are extraneous and
>>>>> >> will be removed).
>>>>> >>
>>>>> >> Is there only one way to build it? That's a bit harder to reason
>>>>> >> about. The base image, I'd argue, is likely always going to be built
>>>>> >> that way. For the driver and executor images, there may be cases where
>>>>> >> people want to customize them (for example, by putting all
>>>>> >> dependencies into them).
>>>>> >> In those cases, as long as our images are bare bones, they can use the
>>>>> >> spark-driver/spark-executor images we publish as the base, and build
>>>>> >> their customization as a layer on top of them.
>>>>> >>
>>>>> >> I think the composability of docker images makes this a bit different
>>>>> >> from, say, Debian packages.
>>>>> >> We can publish canonical images that serve both as a complete image
>>>>> >> for most Spark applications and as a stable substrate to build
>>>>> >> customization upon.
>>>>> >>
>>>>> >> On Wed, Nov 29, 2017 at 7:38 AM, Mark Hamstra <mark@clearstorydata.com>
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> It's probably also worth considering whether there is only one,
>>>>> >>> well-defined, correct way to create such an image or whether this is
>>>>> >>> a reasonable avenue for customization. Part of why we don't do
>>>>> >>> something like maintain and publish canonical Debian packages for
>>>>> >>> Spark is that different organizations doing packaging and
>>>>> >>> distribution of infrastructures or operating systems can reasonably
>>>>> >>> want to do this in a custom (or non-customary) way. If there is really
>>>>> >>> only one reasonable way to do a docker image, then my bias starts to
>>>>> >>> tend more toward the Spark PMC taking on the responsibility to
>>>>> >>> maintain and publish that image. If there is more than one way to do
>>>>> >>> it and publishing a particular image is more just a convenience, then
>>>>> >>> my bias tends more away from maintaining and publishing it.
>>>>> >>>
>>>>> >>> On Wed, Nov 29, 2017 at 5:14 AM, Sean Owen <sowen@cloudera.com>
>>>>> >>> wrote:
>>>>> >>>>
>>>>> >>>> Source code is the primary release; compiled binary releases are
>>>>> >>>> conveniences that are also released. A docker image sounds fairly
>>>>> >>>> different, though. To the extent it's the standard delivery
>>>>> >>>> mechanism for some artifact (think: pyspark on PyPI as well), that
>>>>> >>>> makes sense, but is that the situation? If it's more of an extension
>>>>> >>>> or alternate presentation of Spark components, that typically
>>>>> >>>> wouldn't be part of a Spark release. The PMC ought to take
>>>>> >>>> responsibility for maintaining only the core, critical means of
>>>>> >>>> distribution.
>>>>> >>>>
>>>>> >>>> On Wed, Nov 29, 2017 at 2:52 AM Anirudh Ramanathan
>>>>> >>>> <ramanathana@google.com.invalid> wrote:
>>>>> >>>>>
>>>>> >>>>> Hi all,
>>>>> >>>>>
>>>>> >>>>> We're all working towards the Kubernetes scheduler backend (full
>>>>> >>>>> steam ahead!) that's targeted towards Spark 2.3. One of the
>>>>> >>>>> questions that comes up often is docker images.
>>>>> >>>>>
>>>>> >>>>> While we're making dockerfiles available to allow people to create
>>>>> >>>>> their own docker images from source, ideally, we'd want to publish
>>>>> >>>>> official docker images as part of the release process.
>>>>> >>>>>
>>>>> >>>>> I understand that the ASF has procedures around this, and we would
>>>>> >>>>> want to get that started to help us get these artifacts published
>>>>> >>>>> by 2.3. I'd love to get a discussion around this started and to
>>>>> >>>>> hear the community's thoughts on it.
>>>>> >>>>>
>>>>> >>>>> --
>>>>> >>>>> Thanks,
>>>>> >>>>> Anirudh Ramanathan
>>>>> >>>
>>>>> >>>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Anirudh Ramanathan
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
