spark-dev mailing list archives

From: Mark Baker <dist...@acm.org>
Subject: Re: Problems with Pyspark + Dill tests
Date: Mon, 23 Jun 2014 21:27:22 GMT

On Thu, Jun 19, 2014 at 3:56 PM, Josh Rosen <rosenville@gmail.com> wrote:
> Thanks for helping with the Dill integration; I had some early first attempts, but had
> to set them aside when I got busy with some other work.
>
> Just to bring everyone up to speed regarding context:
> There are some objects that PySpark’s `cloudpickle` library doesn’t serialize properly,
> such as operator.getattr (https://issues.apache.org/jira/browse/SPARK-791) or NamedTuples
> (https://issues.apache.org/jira/browse/SPARK-1687).
> My early attempt at replacing CloudPickle with Dill ran into problems because of slight
> differences in how Dill pickles functions defined in doctests versus functions defined elsewhere.
> I opened a bug report for this with the Dill developers (https://github.com/uqfoundation/dill/issues/18),
> who subsequently fixed the bug (https://github.com/uqfoundation/dill/pull/29).
> It looks like there’s already a couple of Dill issues with examples of the “Can’t
> pickle _ it’s not found as _” bug (https://github.com/uqfoundation/dill/search?q=%22not+found+as%22&type=Issues).
> If you can find a small test case that reproduces this bug, I’d consider opening a new
> Dill issue.

Thanks for the context, Josh.

I've gone ahead and created a new test case and opened a new issue:

https://github.com/uqfoundation/dill/issues/49
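For readers unfamiliar with the "Can't pickle _ it's not found as _" family of errors, the sketch below shows the underlying mechanism using only the standard `pickle` module. It does not use Dill or cloudpickle, and it is not the test case from the issue above. Standard pickle saves classes by reference (module name plus qualified name), so an object whose class cannot be looked up again under its recorded name cannot be pickled. The module name `demo_types` and the `try_pickle` helper are hypothetical, invented for this illustration, and the exact wording of the error message differs between Python versions.

```python
import pickle
import sys
import types
from collections import namedtuple

# Hypothetical module, created only for this demo and registered in
# sys.modules so that pickle can import it by name.
demo = types.ModuleType("demo_types")
sys.modules["demo_types"] = demo

# Bound under its own typename: pickle records the reference
# demo_types.Point and can find it again when loading.
demo.Point = namedtuple("Point", ["x", "y"], module="demo_types")

# Bound under a different name: the class claims to be demo_types.Pt,
# but no attribute called Pt exists in that module.
demo.Alias = namedtuple("Pt", ["x", "y"], module="demo_types")


def try_pickle(obj):
    """Round-trip obj through pickle; return the error text if pickling fails."""
    try:
        return pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, AttributeError) as exc:
        return f"{type(exc).__name__}: {exc}"


print(try_pickle(demo.Point(1, 2)))  # → Point(x=1, y=2)
print(try_pickle(demo.Alias(1, 2)))  # error: demo_types.Pt cannot be looked up
```

Libraries such as Dill and cloudpickle exist partly to handle objects where this by-reference lookup fails, for example by serializing a class or function by value instead.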
