spark-user mailing list archives

From madeleine <>
Subject pyspark serializer can't handle functions?
Date Sun, 15 Jun 2014 23:49:09 GMT
It seems that the default serializer used by PySpark can't serialize a list
of functions.
I've seen some posts about fixing this by serializing with dill rather than
pickle.
Does anyone know the status of that effort, or whether there's another easy
workaround?

I've pasted a sample error message below. Here, regs is a function defined
in another file that has been included on all workers via the
pyFiles argument to SparkContext: sc = SparkContext("local",

  File "", line 45, in __init__
    regsRDD = sc.parallelize([regs]*self.n)
  File "/Applications/spark-0.9.1-bin-hadoop2/python/pyspark/", line 223, in parallelize
    serializer.dump_stream(c, tempFile)
  File "/Applications/spark-0.9.1-bin-hadoop2/python/pyspark/", line 182, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/Applications/spark-0.9.1-bin-hadoop2/python/pyspark/", line 118, in dump_stream
    self._write_with_length(obj, stream)
  File "/Applications/spark-0.9.1-bin-hadoop2/python/pyspark/", line 128, in _write_with_length
    serialized = self.dumps(obj)
  File "/Applications/spark-0.9.1-bin-hadoop2/python/pyspark/", line 270, in dumps
    def dumps(self, obj): return cPickle.dumps(obj, 2)
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup
__builtin__.function failed
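For what it's worth, the error can be reproduced outside Spark entirely: the stock pickler serializes functions by reference (module name plus attribute name), so any function it can't re-import by that name fails to pickle. A minimal sketch, assuming nothing about the Spark code itself (the lambda here stands in for `regs`):

```python
import pickle


def function_pickle_fails():
    """Show that the stock pickler rejects a function it can't
    look up by module + name, which is what the Spark traceback
    above boils down to."""
    f = lambda x: x + 1  # not importable by name, like `regs` on the workers
    try:
        pickle.dumps(f)
        return False  # pickling unexpectedly succeeded
    except (pickle.PicklingError, AttributeError, TypeError):
        return True  # pickling failed, as in the traceback


print(function_pickle_fails())
```

One workaround that often avoids the issue entirely is to keep the functions out of the RDD data: parallelize plain values and close over the function in a `map`, since PySpark serializes task closures with its own bundled pickler that handles functions. Explicitly serializing the functions with dill first would be another option.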
