spark-user mailing list archives

From Davies Liu <dav...@databricks.com>
Subject Re: PySpark with OpenCV causes python worker to crash
Date Fri, 05 Jun 2015 06:43:44 GMT
Please file a bug here: https://issues.apache.org/jira/browse/SPARK/

Could you also provide a way to reproduce this bug (including some datasets)?
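
For readers who want to build the kind of standalone reproduction requested above, one possible starting point is to run the suspect function in a forked child process. On Unix-like systems PySpark starts its Python workers by forking from a daemon process, so a fork may be closer to the worker environment than a plain unit test. The sketch below is only an illustration under that assumption; the helper name run_in_forked_child is hypothetical, it is written for Python 3, and it needs a POSIX system.

```python
import os
import signal


def run_in_forked_child(func):
    """Run func() in a forked child and describe how the child ended.

    Hypothetical helper for illustration; requires a POSIX system.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: run the function, then leave immediately with
        # os._exit so no parent cleanup code runs in the child.
        try:
            func()
        except BaseException:
            os._exit(1)
        os._exit(0)

    # Parent process: wait for the child and decode its wait status.
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(wait_status):
        signum = os.WTERMSIG(wait_status)
        return "killed by signal " + signal.Signals(signum).name
    return "exited with status " + str(os.WEXITSTATUS(wait_status))
```

Passing a zero-argument callable that wraps the failing call (for example a lambda around the feature extraction function from the thread) would show whether the child dies from a signal such as SIGSEGV or exits normally. A clean exit here would not rule out a Spark-specific cause; it only narrows the search.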

On Thu, Jun 4, 2015 at 11:30 PM, Sam Stoelinga <sammiestoel@gmail.com> wrote:
> I've changed the SIFT feature extraction to SURF feature extraction and it
> works...
>
> Following line was changed:
> sift = cv2.xfeatures2d.SIFT_create()
>
> to
>
> sift = cv2.xfeatures2d.SURF_create()
>
> Where should I file this as a bug? When not running on Spark it works fine
> so I'm saying it's a spark bug.
>
> On Fri, Jun 5, 2015 at 2:17 PM, Sam Stoelinga <sammiestoel@gmail.com> wrote:
>>
>> Yea should have emphasized that. I'm running the same code on the same VM.
>> It's a VM with spark in standalone mode and I run the unit test directly on
>> that same VM. So OpenCV is working correctly on that same machine but when
>> moving the exact same OpenCV code to spark it just crashes.
>>
>> On Tue, Jun 2, 2015 at 5:06 AM, Davies Liu <davies@databricks.com> wrote:
>>>
>>> Could you run the single-threaded version on the worker machine to make
>>> sure that OpenCV is installed and configured correctly?
>>>
>>> On Sat, May 30, 2015 at 6:29 AM, Sam Stoelinga <sammiestoel@gmail.com> wrote:
>>> > I've verified the issue lies within Spark running OpenCV code and not
>>> > within the sequence file BytesWritable formatting.
>>> >
>>> > The code below reproduces the failure without using the sequence file as
>>> > input at all. It runs the same function on the same input, and it still
>>> > fails on Spark:
>>> >
>>> > def extract_sift_features_opencv(imgfile_imgbytes):
>>> >     imgfilename, discardsequencefile = imgfile_imgbytes
>>> >     imgbytes = bytearray(open("/tmp/img.jpg", "rb").read())
>>> >     nparr = np.fromstring(buffer(imgbytes), np.uint8)
>>> >     img = cv2.imdecode(nparr, 1)
>>> >     gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
>>> >     sift = cv2.xfeatures2d.SIFT_create()
>>> >     kp, descriptors = sift.detectAndCompute(gray, None)
>>> >     return (imgfilename, "test")
>>> >
>>> > And corresponding tests.py:
>>> > https://gist.github.com/samos123/d383c26f6d47d34d32d6
>>> >
>>> >
>>> > On Sat, May 30, 2015 at 8:04 PM, Sam Stoelinga <sammiestoel@gmail.com> wrote:
>>> >>
>>> >> Thanks for the advice! The following line causes spark to crash:
>>> >>
>>> >> kp, descriptors = sift.detectAndCompute(gray, None)
>>> >>
>>> >> But I do need this line to be executed, and the code does not crash
>>> >> when running outside of Spark with the same parameters. Are you saying
>>> >> that the bytes from the sequence file may have been transformed somehow
>>> >> so that they no longer represent an image, causing OpenCV to crash the
>>> >> whole Python executor?
>>> >>
>>> >> On Fri, May 29, 2015 at 2:06 AM, Davies Liu <davies@databricks.com> wrote:
>>> >>>
>>> >>> Could you try commenting out some lines in
>>> >>> `extract_sift_features_opencv` to find which line causes the crash?
>>> >>>
>>> >>> If the bytes that came from sequenceFile() are broken, it's easy to
>>> >>> crash a C library in Python (OpenCV).
>>> >>>
>>> >>> On Thu, May 28, 2015 at 8:33 AM, Sam Stoelinga <sammiestoel@gmail.com> wrote:
>>> >>> > Hi sparkers,
>>> >>> >
>>> >>> > I am working on a PySpark application which uses the OpenCV library.
>>> >>> > It runs fine when running the code locally but when I try to run it
>>> >>> > on Spark on the same machine it crashes the worker.
>>> >>> >
>>> >>> > The code can be found here:
>>> >>> > https://gist.github.com/samos123/885f9fe87c8fa5abf78f
>>> >>> >
>>> >>> > This is the error message taken from STDERR of the worker log:
>>> >>> > https://gist.github.com/samos123/3300191684aee7fc8013
>>> >>> >
>>> >>> > I would like pointers or tips on how to debug further. It would be
>>> >>> > nice to know why the worker crashed.
>>> >>> >
>>> >>> > Thanks,
>>> >>> > Sam Stoelinga
>>> >>> >
>>> >>> >
>>> >>> > org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
>>> >>> >     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:172)
>>> >>> >     at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
>>> >>> >     at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
>>> >>> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>> >>> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>> >>> >     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>> >>> >     at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>> >>> >     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>>> >>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>> >     at java.lang.Thread.run(Thread.java:745)
>>> >>> > Caused by: java.io.EOFException
>>> >>> >     at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>> >>> >     at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:108)
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>
>>> >>
>>> >
>>
>>
>
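
On the original question of how to debug a worker that "exited unexpectedly": the Java stack trace only says that the Python process went away, so a common next step is to get a traceback from the Python side. A minimal sketch, assuming Python 3.3 or later (the code in the thread appears to be Python 2, where faulthandler exists only as a separate backport package): run the suspect code in a fresh interpreter with the standard library's faulthandler enabled and capture what it prints. The helper name run_with_faulthandler is hypothetical.

```python
import subprocess
import sys


def run_with_faulthandler(snippet):
    """Run a Python snippet in a fresh interpreter with faulthandler on.

    Hypothetical helper for illustration. Returns (returncode, stderr).
    On POSIX a negative returncode means the child was killed by a signal.
    """
    result = subprocess.run(
        # -X faulthandler makes the interpreter dump a Python traceback
        # on fatal signals such as SIGSEGV or SIGABRT.
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr
```

The snippet passed in would import the module under test and call the failing function; the returned stderr then shows which Python line was executing when the native code crashed. Calling faulthandler.enable() at the top of the function that Spark maps over the data is a variant of the same idea, on the assumption that the worker's stderr ends up in the executor log.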

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

