spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-24632) Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence
Date Wed, 01 Aug 2018 18:23:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565767#comment-16565767
] 

Joseph K. Bradley commented on SPARK-24632:
-------------------------------------------

That's a good point.  Let's do it your way.  : )
You're right that keeping this knowledge of wrapper classpaths on the Python side is better
organized.  It will also let users wrap Scala classes later without breaking APIs (by adding
new mix-ins).
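
The Python-side approach described above could look roughly like the following. This is a minimal, hypothetical sketch: the mix-in name `JavaWrappable`, the `_java_class` field, and the `save_metadata` method are illustrative assumptions, not actual pyspark.ml API; a real implementation would hook into pyspark's persistence machinery rather than plain JSON.

```python
# Hypothetical sketch: a Python-side mix-in that records which Java class
# a 3rd-party wrapper corresponds to, so Pipeline persistence could
# round-trip the pair. None of these names are real pyspark API.
import json


class JavaWrappable:
    """Mix-in for 3rd-party Python wrappers around Java PipelineStages."""

    # Fully qualified name of the wrapped Java class; subclasses override this.
    _java_class = None

    def save_metadata(self):
        """Serialize the info persistence would need to rebuild this stage."""
        return json.dumps({
            "pyClass": f"{type(self).__module__}.{type(self).__qualname__}",
            "javaClass": self._java_class,
        })


class MyCustomStage(JavaWrappable):
    # A 3rd-party wrapper declares its Java counterpart here, on the
    # Python side, so no change to the Java class is required.
    _java_class = "com.example.ml.MyCustomStage"


meta = json.loads(MyCustomStage().save_metadata())
print(meta["javaClass"])  # com.example.ml.MyCustomStage
```

Keeping the mapping on the Python side means a library can add wrappers for existing Scala classes later simply by defining new Python classes with this mix-in, without touching the Scala API.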

> Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24632
>                 URL: https://issues.apache.org/jira/browse/SPARK-24632
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>    Affects Versions: 2.4.0
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Major
>
> This is a follow-up for [SPARK-17025], which allowed users to implement Python PipelineStages
in 3rd-party libraries, include them in Pipelines, and use Pipeline persistence.  This task
is to make it easier for 3rd-party libraries to have PipelineStages written in Java and then
to use pyspark.ml abstractions to create wrappers around those Java classes.  This is currently
possible, except that users hit bugs around persistence.
> I spent a bit of time thinking about this and wrote up my thoughts and a proposal in the
doc linked below.  Summary of the proposal:
> Require that 3rd-party libraries whose Java classes have Python wrappers implement a trait
that provides the corresponding Python classpath in some field:
> {code}
> trait PythonWrappable {
>   def pythonClassPath: String = …
> }
> class MyJavaType extends PythonWrappable
> {code}
> This will not be required for MLlib wrappers, which we can handle specially.
> One issue for this task will be that we may have trouble writing unit tests.  They would
ideally test a Java class + Python wrapper class pair sitting outside of pyspark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

