spark-dev mailing list archives

From Steve Loughran <>
Subject Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends
Date Sat, 16 Apr 2016 11:46:54 GMT

On 15/04/2016, 17:41, "Mattmann, Chris A (3980)" <> wrote:

>Yeah in support of this statement I think that my primary interest in
>this Spark Extras and the good work by Luciano here is that anytime we
>take bits out of a code base and “move it to GitHub” I see a bad precedent
>being set.
>Creating this project at the ASF creates a synergy between *Apache Spark*
>which is *at the ASF*.
>We welcome comments and as Luciano said, this is meant to invite and be
>open to those in the Apache Spark PMC to join and help.

As one of the people named, here's my rationale:

Throwing stuff onto GitHub creates that world of branches, and it's no longer something that
could be managed through the ASF, where "managed" means: governance, participation and a release
process that includes auditing dependencies, code sign-off, etc.

As an example, there's a mutant Hive JAR which Spark uses. That's something which has so far
evolved between my repo and Patrick Wendell's; now that Josh Rosen has taken on the bold task
of "trying to move spark and twill to Kryo 3", he's going to own that code, and the reference
branch will move somewhere else.

In contrast, if there were an ASF location for this, then it'd be something anyone with commit
rights could maintain and publish.

(Actually, I've just realised life is harder here, as the Hive JAR is a fork of ASF Hive; really
the Spark branch should be a separate branch in Hive's own repo. But the concept is the
same: those bits of the codebase which are core parts of the Spark project should really live
in or near it.)

If everyone on the Spark commit list gets write access to this extras repo, moving things
is straightforward. Release-wise, things could/should be in sync.

If there's a risk, it's the eternal problem of the contrib/ dir: stuff ends up there that
never gets maintained. I don't see that being any worse than if things were thrown to the
wind of a thousand GitHub repos: at least now there'd be a central issue tracking location.