spark-issues mailing list archives

From "Patrick Wendell (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-1579) PySpark should distinguish expected IOExceptions from unexpected ones in the worker
Date Wed, 23 Apr 2014 01:51:15 GMT

     [ https://issues.apache.org/jira/browse/SPARK-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-1579:
-----------------------------------

    Description: 
I chatted with [~adav] a bit about this. Right now we drop IOExceptions because they are (in
some cases) expected if a Python worker returns before consuming its entire input. The issue
is that this swallows legitimate IOExceptions when they occur.

One thought we had was to change daemon.py so that, instead of closing the socket when the
function is over, it simply busy-waits on the socket being closed. We'd transfer the responsibility
for closing the socket to the Java reader. The Java reader could, when it has finished consuming
output from Python, set a volatile flag to indicate that Python has fully returned, and then
close the socket. Then, if an IOException is thrown in the write thread, we swallow it only
if we are expecting it.

This would also let us remove the current warning message.
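
For illustration, a minimal Scala sketch of how the Java-side half of that handshake might
look. All names here (PythonWorkerConnection, onOutputFullyConsumed, and so on) are
hypothetical, not taken from an actual patch:

    import java.io.IOException
    import java.net.Socket

    // One JVM-side connection to a Python worker. Under this scheme the
    // reader owns socket shutdown: daemon.py would busy-wait on the socket
    // being closed instead of closing it itself.
    class PythonWorkerConnection(socket: Socket) {

      // Set by the reader thread once all of Python's output is consumed.
      @volatile private var pythonFullyReturned = false

      // Reader side: after draining Python's output, record that Python has
      // fully returned, then close the socket (responsibility transferred
      // here from daemon.py).
      def onOutputFullyConsumed(): Unit = {
        pythonFullyReturned = true
        socket.close()
      }

      // Writer side: stream input to Python. An IOException is swallowed
      // only when the reader has already flagged that Python returned;
      // otherwise it is a legitimate IO error and is rethrown.
      def writeInput(hasNext: () => Boolean, writeChunk: () => Unit): Unit = {
        try {
          while (hasNext()) {
            writeChunk()
          }
        } catch {
          case _: IOException if pythonFullyReturned =>
            // Expected: Python finished before consuming its entire input.
          case e: IOException =>
            throw e
        }
      }
    }
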

  was:
I chatted with [~adav] a bit about this. Right now we drop IOExceptions because they are (in
some cases) expected if a Python worker returns before consuming its entire input. The issue
is that this swallows legitimate IOExceptions when they occur.

One thought we had was to change daemon.py so that, instead of closing the socket when the
function is over, it simply busy-waits on the socket being closed. We'd transfer the responsibility
for closing the socket to the Java reader. The Java reader could, when it has finished consuming
output from Python, set a volatile flag to indicate that Python has fully returned, and then
close the socket. Then if an IOException is found, we only swallow it if we are expecting
it.

This would also let us remove the current warning message.


> PySpark should distinguish expected IOExceptions from unexpected ones in the worker
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-1579
>                 URL: https://issues.apache.org/jira/browse/SPARK-1579
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>            Reporter: Patrick Wendell
>            Assignee: Aaron Davidson
>             Fix For: 1.1.0
>
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
