spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-5981) pyspark ML models should support predict/transform on vector within map
Date Wed, 04 Mar 2015 17:51:04 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347237#comment-14347237 ]

Joseph K. Bradley commented on SPARK-5981:
------------------------------------------

When predict() is called on a single vector or an RDD, it runs on the driver, which has
access to the SparkContext (required by JavaModelWrapper.call).  But when predict()
is called within an RDD.map, it runs on the workers, which don't have the SparkContext.

> pyspark ML models should support predict/transform on vector within map
> -----------------------------------------------------------------------
>
>                 Key: SPARK-5981
>                 URL: https://issues.apache.org/jira/browse/SPARK-5981
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, PySpark
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>
> Currently, most Python models only have limited support for single-vector prediction.
> E.g., one can call {code}model.predict(myFeatureVector){code} for a single instance,
> but that fails within a map for Python ML models and transformers which use JavaModelWrapper:
> {code}
> data.map(lambda features: model.predict(features))
> {code}
> This fails because JavaModelWrapper.call uses the SparkContext (within the transformation).
> (It works for linear models, which do prediction within Python.)
> Supporting prediction within a map would require storing the model and doing prediction/transformation
> within Python.
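
The idea of "storing the model and doing prediction within Python" can be sketched without Spark at all. The class below is a hypothetical stand-in, not part of PySpark: PurePythonLinearModel, its weights and intercept fields, and the sample data are all assumptions made for illustration. The built-in map plays the role of RDD.map, and a pickle round trip of the model's plain state stands in for shipping it to a worker.

```python
import pickle


class PurePythonLinearModel(object):
    """Hypothetical model whose whole state lives in Python (not a PySpark class)."""

    def __init__(self, weights, intercept=0.0):
        self.weights = list(weights)
        self.intercept = float(intercept)

    def predict(self, features):
        # Dot product plus intercept; no SparkContext or JVM call is involved.
        return sum(w * x for w, x in zip(self.weights, features)) + self.intercept


model = PurePythonLinearModel([0.5, -1.0], intercept=2.0)

# Stand-in for sending the model to a worker: only plain Python state is
# serialized, so nothing ties the copy to the driver. (Spark would pickle
# the closure itself; this sketch round-trips the state explicitly.)
state = pickle.dumps((model.weights, model.intercept))
weights, intercept = pickle.loads(state)
worker_model = PurePythonLinearModel(weights, intercept)

# Built-in map standing in for data.map(lambda features: model.predict(features)).
data = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
predictions = list(map(lambda features: worker_model.predict(features), data))
print(predictions)
```

Because predict() here touches only Python data, the same lambda would not need driver-side state; a model that delegates to JavaModelWrapper.call has no such self-contained state to ship.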



