spark-issues mailing list archives

From "Hyukjin Kwon (Jira)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-27992) PySpark socket server should sync with JVM connection thread future
Date Tue, 27 Aug 2019 06:49:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-27992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-27992:
---------------------------------
    Affects Version/s: 2.4.0
                       2.4.1
                       2.4.2
                       2.4.3

> PySpark socket server should sync with JVM connection thread future
> -------------------------------------------------------------------
>
>                 Key: SPARK-27992
>                 URL: https://issues.apache.org/jira/browse/SPARK-27992
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.4.0, 2.4.1, 2.4.2, 3.0.0, 2.4.3
>            Reporter: Bryan Cutler
>            Assignee: Bryan Cutler
>            Priority: Blocker
>             Fix For: 3.0.0
>
>
> Both SPARK-27805 and SPARK-27548 identified an issue where errors in a Spark job are not
> propagated to Python. This happens because toLocalIterator() and toPandas() with Arrow enabled
> run Spark jobs asynchronously in a background thread, after the socket connection info has
> been created. The fix for both was to catch a SparkException if the job failed and then send
> the exception through the PySpark serializer.
> A better fix would be to let Python await the serving thread's future and join the thread.
> That way, if the serving thread throws an exception, it is propagated on the call to
> awaitResult.
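
The idea in the description can be illustrated with a minimal, self-contained Python sketch. This is not Spark's implementation: serve_in_background and collect are hypothetical names, a plain local socket stands in for the JVM socket server, and concurrent.futures.Future stands in for the serving thread's future. It only shows why awaiting the future after reading the socket surfaces an error raised in the serving thread.

```python
import socket
import threading
from concurrent.futures import Future


def serve_in_background(produce_rows):
    """Hypothetical helper: serve rows over a one-shot local socket.

    Returns (port, future). The future completes when the serving thread
    finishes and carries any exception raised while producing rows.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    future = Future()

    def serve():
        try:
            conn, _ = server.accept()
            with conn:
                # The "job" runs only now, after connection info was handed out.
                for row in produce_rows():
                    conn.sendall((row + "\n").encode("utf-8"))
            future.set_result(None)
        except BaseException as exc:
            # Record the failure instead of losing it in the background thread.
            future.set_exception(exc)
        finally:
            server.close()

    threading.Thread(target=serve, daemon=True).start()
    return port, future


def collect(port, future):
    """Hypothetical client: read all rows, then await the serving future."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        data = b""
        while True:
            chunk = client.recv(4096)
            if not chunk:  # server closed the connection
                break
            data += chunk
    # Analogous to awaitResult: re-raises whatever the serving thread raised.
    future.result(timeout=10)
    return data.decode("utf-8").splitlines()
```

Without the future.result() call, a client reading from a job that fails partway would simply see the stream end early and return partial rows. With it, the failure is raised in the caller, and no special error encoding is needed in the data stream itself.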




