spark-dev mailing list archives

From Luciano Resende <>
Subject Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends
Date Sun, 17 Apr 2016 05:42:36 GMT
On Sat, Apr 16, 2016 at 5:38 PM, Evan Chan <> wrote:

> Hi folks,
> Sorry to join the discussion late.  I had a look at the design doc
> earlier in this thread, and it was not mentioned what types of
> projects are the targets of this new "spark extras" ASF umbrella....
> Is the desire to have a maintained set of spark-related projects that
> keep pace with the main Spark development schedule?  Is it just for
> streaming connectors?  what about data sources, and other important
> projects in the Spark ecosystem?

The proposal draft below has more details on the types of projects,
but in summary, "Spark-Extras" would be a good place for any of the
components you mentioned.

> I'm worried that this would relegate spark-packages to third tier
> status,

Owen answered a similar question about spark-packages earlier on this
thread. While "Spark-Extras" would be a place in Apache for collaboration
on the development of these extensions, they might still be published to
spark-packages, as the existing streaming connectors are today.

> and the promotion of a select set of committers, and the
> project itself, to top level ASF status (a la Arrow) would create a
> further split in the community.

As for the select set of committers, we have invited all Spark committers
to be committers on the project, and I have updated the project proposal
with the existing set of active Spark committers (those who have committed
in the last year).

> -Evan
> On Sat, Apr 16, 2016 at 4:46 AM, Steve Loughran <>
> wrote:
> >
> > On 15/04/2016, 17:41, "Mattmann, Chris A (3980)" <> wrote:
> >
> >>Yeah in support of this statement I think that my primary interest in
> >>this Spark Extras and the good work by Luciano here is that anytime we
> >>take bits out of a code base and “move it to GitHub” I see a bad precedent
> >>being set.
> >>
> >>Creating this project at the ASF creates a synergy between *Apache Spark*
> >>which is *at the ASF*.
> >>
> >>We welcome comments and as Luciano said, this is meant to invite and be
> >>open to those in the Apache Spark PMC to join and help.
> >>
> >>Cheers,
> >>Chris
> >
> > As one of the people named, here's my rationale:
> >
> > Throwing stuff into github creates that world of branches, and its no
> > longer something that could be managed through the ASF, where managed is:
> > governance, participation and a release process that includes auditing
> > dependencies, code-signoff, etc,
> >
> >
> > As an example, there's a mutant hive JAR which spark uses, that's
> > something which currently evolved between my repo and Patrick Wendell's;
> > now that Josh Rosen has taken on the bold task of "trying to move spark and
> > twill to Kryo 3", he's going to own that code, and now the reference branch
> > will move somewhere else.
> >
> > In contrast, if there was an ASF location for this, then it'd be
> > something anyone with commit rights could maintain and publish
> >
> > (actually, I've just realised life is hard here as the hive is a fork of
> > ASF hive —really the spark branch should be a separate branch in Hive's own
> > repo ... But the concept is the same: those bits of the codebase which are
> > core parts of the spark project should really live in or near it)
> >
> >
> > If everyone on the spark commit list gets write access to this extras
> > repo, moving things is straightforward. Release wise, things could/should
> > be in sync.
> >
> > If there's a risk, its the eternal problem of the contrib/ dir ....
> > Stuff ends up there that never gets maintained. I don't see that being any
> > worse than if things were thrown to the wind of a thousand github repos: at
> > least now there'd be a central issue tracking location.

Luciano Resende
