and how does this all relate to the existing 1-and-a-half-class citizen known as

support for this citizen is buried deep in the Spark source (which was always a bit odd, in my opinion):

On Fri, Apr 15, 2016 at 12:18 PM, Sean Owen wrote:
Why would this need to be an ASF project of its own? I don't think
it's possible to have yet another separate "Spark Extras" TLP (?)

There is already a project to manage these bits of code on Github. How
about all of the interested parties manage the code there, under the
same process, under the same license, etc?

I'm not against calling it Spark Extras myself but I wonder if that
needlessly confuses the situation. They aren't part of the Spark TLP
on purpose, so trying to give it some special middle-ground status
might just be confusing. The thing that comes to mind immediately is
"Connectors for Apache Spark", spark-connectors, etc.

On Fri, Apr 15, 2016 at 5:01 PM, Luciano Resende wrote:
> After some collaboration with other community members, we have created an
> initial draft for Spark Extras which is available for review at
> We would like to invite other community members to participate in the
> project, particularly the Spark Committers and PMC (feel free to express
> interest and I will update the proposal). Another option here is just to
> give ALL Spark committers write access to "Spark Extras".
> We also have a couple of asks from the Spark PMC:
> - Permission to use "Spark Extras" as the project name. We already checked
> this with Apache Brand Management, and the recommendation was to discuss and
> reach consensus with the Spark PMC.
> - We would also like to check with the Spark PMC whether, in case of
> successful creation of "Spark Extras", the PMC would be willing to
> continue the development of the remaining connectors that stayed in the Spark
> 2.0 codebase in the "Spark Extras" project.
> Thanks in advance, and we welcome any feedback around this proposal before
> we present to the Apache Board for consideration.
> On Sat, Mar 26, 2016 at 10:07 AM, Luciano Resende <>
> wrote:
>> I believe some of this has been resolved in the context of some parties that
>> had interest in one extra connector, but we still have a few removed, and as
>> you mentioned, we still don't have a simple way or willingness to manage and
>> be current on new packages like kafka. And based on the fact that this
>> thread is still alive, I believe that other community members might have
>> other concerns as well.
>> After some thought, I believe having a separate project (what was
>> mentioned here as Spark Extras) to handle Spark Connectors and Spark add-ons
>> in general could be very beneficial to Spark and the overall Spark
>> community, which would have a central place in Apache to collaborate around
>> related Spark components.
>> Some of the benefits of this approach:
>> - Enables maintaining the connectors inside Apache, following the Apache
>> governance and release rules, while allowing Spark proper to focus on the
>> core runtime.
>> - Provides more flexibility in controlling the direction (currency) of the
>> existing connectors (e.g. willing to find a solution and maintain multiple
>> versions of same connectors like kafka 0.8x and 0.9x)
>> - Becomes a home for other types of Spark-related connectors, helping
>> expand the community around Spark (e.g. Zeppelin sees most of its current
>> contributions around new/enhanced connectors)
>> What are some requirements for Spark Extras to be successful:
>> - Be up to date with Spark Trunk APIs (based on daily CIs against
>> - Adhere to Spark release cycles (have a very small window compared to
>> Spark release)
>> - Be more open and flexible to the set of connectors it will accept and
>> maintain (e.g. also handle multiple versions like the kafka 0.9 issue we
>> have today)
>> Where to start Spark Extras
>> Depending on the interest here, we could follow the steps of (Apache
>> Arrow) and start this directly as a TLP, or start as an incubator project. I
>> would consider the first option first.
>> Who would participate
>> Have thought about this for a bit, and if we go in the direction of a TLP, I
>> would say Spark Committers and Apache Members can request to participate as
>> PMC members, while other committers can request to become committers. Non
>> committers would be added based on meritocracy after the start of the
>> project.
>> Project Name
>> It would be ideal if we could have a project name that shows close ties to
>> Spark (e.g. Spark Extras or Spark Connectors) but we will need permission
>> and support from whoever is going to evaluate the project proposal (e.g.
>> Apache Board)
>> Thoughts ?
>> Does anyone have any big disagreement or objection to moving in this
>> direction ?
>> Otherwise, who would be interested in joining the project, so I can start
>> working on some concrete proposal ?
> --
> Luciano Resende


Chris Fregly
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA