spark-issues mailing list archives

From "Reynold Xin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-15369) Investigate selectively using Jython for parts of PySpark
Date Wed, 05 Oct 2016 21:26:21 GMT

    [ https://issues.apache.org/jira/browse/SPARK-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549990#comment-15549990 ]

Reynold Xin commented on SPARK-15369:
-------------------------------------

So while I'm sure you can improve performance for some UDFs, the limitations of Jython are
pretty severe, and I worry we would be building on a shaky foundation with this approach.
Maybe a better approach is to speed up serialization for Python, e.g. by introducing
block-oriented UDFs that return NumPy arrays or pandas DataFrames.
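
As a rough illustration of the block-oriented idea, the sketch below compares one
serialization round trip per value with one round trip per block of values. It is not
Spark code and not an API proposed in this ticket: pickle merely stands in for the
JVM-to-Python transfer, and the names per_row_udf, block_udf, run_per_row and run_blocked
are hypothetical.

```python
import pickle


def per_row_udf(x):
    """Hypothetical scalar UDF: called once per value."""
    return x * 2.0 + 1.0


def block_udf(block):
    """Hypothetical block UDF: called once per list of values."""
    return [x * 2.0 + 1.0 for x in block]


def run_per_row(values):
    """Serialize every value on its own, as a row-at-a-time UDF would."""
    out = []
    round_trips = 0
    for v in values:
        payload = pickle.dumps(v)                 # stand-in for JVM -> Python
        round_trips += 1
        result = per_row_udf(pickle.loads(payload))
        out.append(pickle.loads(pickle.dumps(result)))  # stand-in for Python -> JVM
    return out, round_trips


def run_blocked(values, block_size):
    """Serialize values in blocks, so the per-call overhead is paid once per block."""
    out = []
    round_trips = 0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        payload = pickle.dumps(block)             # one payload for the whole block
        round_trips += 1
        result = block_udf(pickle.loads(payload))
        out.extend(pickle.loads(pickle.dumps(result)))
    return out, round_trips


if __name__ == "__main__":
    values = [float(i) for i in range(10)]
    rows, row_trips = run_per_row(values)
    blocks, block_trips = run_blocked(values, block_size=4)
    print(rows == blocks, row_trips, block_trips)
```

Both paths compute the same results; the blocked path simply pays the serialization and
call overhead once per block instead of once per value, which is the saving the comment
above is pointing at.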

> Investigate selectively using Jython for parts of PySpark
> ---------------------------------------------------------
>
>                 Key: SPARK-15369
>                 URL: https://issues.apache.org/jira/browse/SPARK-15369
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>            Reporter: holdenk
>            Priority: Minor
>
> Transferring data from the JVM to the Python executor can be a substantial bottleneck.
> While Jython is not suitable for all UDFs or map functions, it may be suitable for some simple
> ones. We should investigate the option of using Jython to accelerate these small functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org
