spark-user mailing list archives

From Abdeali Kothari <abdealikoth...@gmail.com>
Subject PicklingError - Can't pickle py4j.protocol.Py4JJavaError - it's not the same object
Date Sun, 02 Dec 2018 12:53:23 GMT
I am using Spark with Celery to run some Spark scripts asynchronously from
the rest of my code.
When any of my Celery tasks hits an error and raises a Python exception,
Celery's on_error() is called and I can handle the exception easily by
logging it.

The only exception this does not work for seems to be the Py4JJavaError
thrown by Spark.
When my code raises a Py4JJavaError, I get a second exception inside Celery's
error handling. It says the error could not be pickled because the class is
"not the same object" as py4j.protocol.Py4JJavaError.

I'm looking for clues as to what could cause this. I am able to import
py4j.protocol.Py4JJavaError and debug with it.
I went into pyspark/sql/utils.py:capture_sql_exception(), which is where my
Py4JJavaError is being thrown, and found:
py4j.__file__ =
/usr/local/hadoop/spark2.3.1/python/lib/py4j-0.10.7-src.zip/py4j/__init__.py
id(py4j.protocol.Py4JJavaError) = 140436967669656

I also went to where the pickling exception occurs inside the billiard
codebase and found:
py4j.__file__ =
/usr/local/hadoop/spark2.3.1/python/lib/py4j-0.10.7-src.zip/py4j/__init__.py
id(py4j.protocol.Py4JJavaError) = 140436967669656
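
Comparing id() values by hand is one check. A closer approximation of what
pickle itself tests is to look the class up again by its module name and
qualified name and compare the result with "is". The helper below is only a
sketch of that idea: the name pickle_lookup_matches is made up for
illustration, and it mirrors pickle's lookup only roughly.

```python
import importlib


def pickle_lookup_matches(cls):
    """Return True if resolving cls by module + qualified name yields cls itself.

    Rough approximation of the identity check pickle performs before it
    serialises a class by reference. Hypothetical helper, not a pickle API.
    """
    # import_module returns the entry already in sys.modules when there is one.
    module = importlib.import_module(cls.__module__)
    found = module
    for part in cls.__qualname__.split("."):
        found = getattr(found, part, None)
        if found is None:
            return False
    return found is cls
```

Calling it as pickle_lookup_matches(type(exc)) at the point where the
exception is about to be pickled would show whether the by-name lookup still
resolves to the same class object at that moment.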

I'm confused as to why an error like this can come up when id() returns
exactly the same value for the class in both places, and the file it is
loaded from is also the same.
I was originally under the impression that multiple versions of py4j were
conflicting with each other, but that does not seem to be the case.

Any thoughts on this would be helpful! Thanks

---

Here is the exact error I get during the exception handling:

[2018-12-02 18:11:41,403: ERROR/MainProcess] Task handler raised error:
<MaybeEncodingError: Error sending result: '"(1, <ExceptionInfo:
Py4JJavaError('An error occurred while calling o1000.showString.\\n',
'JavaObject id=o1001')>, None)"'. Reason: ''PicklingError("Can\'t pickle
<class \'py4j.protocol.Py4JJavaError\'>: it\'s not the same object as
py4j.protocol.Py4JJavaError",)''.>
Traceback (most recent call last):
  File "venv/lib/python3.6/site-packages/billiard/pool.py", line 363, in workloop
    put((READY, (job, i, result, inqW_fd)))
  File "venv/lib/python3.6/site-packages/billiard/queues.py", line 366, in put
    self.send_payload(ForkingPickler.dumps(obj))
  File "venv/lib/python3.6/site-packages/billiard/reduction.py", line 61, in dumps
    cls(buf, protocol).dump(obj)
billiard.pool.MaybeEncodingError: Error sending result: '"(1,
<ExceptionInfo: Py4JJavaError('An error occurred while calling
o1000.showString.\\n', 'JavaObject id=o1001')>, None)"'. Reason:
''PicklingError("Can\'t pickle <class \'py4j.protocol.Py4JJavaError\'>:
it\'s not the same object as py4j.protocol.Py4JJavaError",)''.
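
For context, here is a minimal sketch of one general way pickle ends up with
a "not the same object" error. It uses only the standard library; the module
demo_protocol and the class DemoJavaError are made-up stand-ins, and nothing
here touches py4j, Spark, or Celery. It assumes the module-level name gets
rebound to a new class object after an instance of the old class was created.
Whether that is what happens in my setup is not established.

```python
import pickle
import sys
import types

# Hypothetical stand-in module, registered so pickle can look it up by name.
demo = types.ModuleType("demo_protocol")
sys.modules["demo_protocol"] = demo


def make_error_class():
    # Build a fresh exception class that claims to live in demo_protocol.
    return type("DemoJavaError", (Exception,), {"__module__": "demo_protocol"})


demo.DemoJavaError = make_error_class()
old_error = demo.DemoJavaError("boom")  # instance of the first class object

# Simulate the module body running a second time: same name, new class object.
demo.DemoJavaError = make_error_class()
new_error = demo.DemoJavaError("boom")  # instance of the current class object


def try_pickle(obj):
    """Return None if obj pickles cleanly, else the PicklingError text."""
    try:
        pickle.dumps(obj)
        return None
    except pickle.PicklingError as exc:
        return str(exc)


old_result = try_pickle(old_error)  # class no longer matches the by-name lookup
new_result = try_pickle(new_error)  # class matches, so this pickles
```

In this sketch only the instance of the replaced class fails to pickle; the
instance created after the rebinding goes through without an error.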
