spark-dev mailing list archives

From "Mattmann, Chris A (398J)" <>
Subject Re: Licensing for PySpark's CloudPickle Module
Date Mon, 29 Jul 2013 05:21:04 GMT
Hi Josh,

BSD is a Category-A approved license at the ASF, meaning it can be
incorporated in Apache projects.



Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA

-----Original Message-----
From: Josh Rosen <>
Reply-To: "" <>
Date: Sunday, July 28, 2013 9:46 PM
To: "" <>
Subject: Licensing for PySpark's CloudPickle Module

>PySpark's CloudPickle library was originally developed by PiCloud (
> and distributed under a non-BSD license.  I
>contacted them last year and they agreed to let us bundle the CloudPickle
>module under a BSD license.  Now that Spark is moving to an Apache
>how does this impact this module?  What license will apply to future
>changes to this module?  Do we need to obtain additional licensing from
>PiCloud folks?  I've attached my original correspondence with PiCloud, in
>case that helps.
>I ask because I'm interested in making some fixes to the cloudpickle code
>and I'd like to collaborate with the PiCloud folks, if possible, since
>they're more familiar with that code and may be interested in some of the
>bugs that I've found.
>Josh Rosen
>---------- Forwarded message ----------
>From: Josh Rosen <>
>Date: Wed, Aug 15, 2012 at 11:47 PM
>Subject: Re: Request to release the CloudPickler module as its own Python
>To: Aaron Staley <>
>Cc:, Matei Zaharia <>
>Hi Aaron,
>I'm just interested in,, and their small
>dependencies.  We'll develop our own module transfer / dependency
>deployment system or build on existing systems in Spark or Mesos, so I
>don't need to use other code from PiCloud.
>The CloudPickle module has been very useful and I appreciate your help with
>the licensing.  I'll bundle and its dependencies with
>PySpark and add the proper attribution in the docstring.
>Thanks for your help,
>Josh Rosen
>On Aug 11, 2012, at 12:23 AM, Aaron Staley wrote:
>Hi Josh,
>How much of the functionality do you need to utilize?
>If we are just talking and (and their small
>dependencies; 2 functions the cloudpickler uses from util and the
>xmlhandlers library used by pickledebug), we are fine with you moving that
>into your own code and re-releasing it under the BSD license. Just modify
>the license in the source code; all we ask is that you
>attribute the original work to PiCloud, Inc. and provide a link to our
>website in the top level comments of the modules.
>If you are looking for all of the functionality relating to getting code
>running on X machine to Y machine (module transfer, some of the import
>hacks in adapter, etc.), that's a whole different matter. It's difficult to
>pull it out of PiCloud itself as a separate package, due to it being spread
>across so many modules.  Are just the picklers enough?
>Aaron Staley
>PiCloud, Inc.
>On Thu, Aug 9, 2012 at 10:58 PM, Josh Rosen <>
>> Hello,
>> My name is Josh Rosen.  I'm a grad student at UC Berkeley and I'm working
>> on implementing a Python API for the Spark cluster computing system (
>> Like PiCloud, my application needs to serialize Python functions in order
>> to execute them across multiple machines.
>> I'm currently using PiCloud's CloudPickle serializer code in my
>>prototype (
>>  Serializing arbitrary Python functions is non-trivial, but PiCloud's
>> serializer is very robust and easy to use; I haven't written a function
>> that it can't serialize.
>> I'm interested in extending the CloudPickler module to work with PyPy (
>>  I am concerned that the inclusion of a modified
>> CloudPickler with Spark would cause Spark to become a "work based on the
>> Library" and require Spark to become LGPL-licensed, in place of its
>> BSD license.
>> Would you be interested in releasing the CloudPickler module and its
>> dependencies as a BSD-licensed Python package (an LGPL-license would
>> too)?
>> CloudPickler has much more functionality than other Python pickling /
>> serialization libraries (
>> and I hope to be able to use it in Spark.
>> I would be very grateful if you are able to accommodate this request.
>> Sincerely,
>> Josh Rosen
>Aaron Staley
>*PiCloud, Inc.*
