spark-issues mailing list archives

From "Nick Pentreath (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-14891) ALS in ML never validates input schema
Date Wed, 18 May 2016 19:15:13 GMT

     [ https://issues.apache.org/jira/browse/SPARK-14891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Pentreath resolved SPARK-14891.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> ALS in ML never validates input schema
> --------------------------------------
>
>                 Key: SPARK-14891
>                 URL: https://issues.apache.org/jira/browse/SPARK-14891
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>            Reporter: Nick Pentreath
>            Assignee: Nick Pentreath
>             Fix For: 2.0.0
>
>
> Currently, {{ALS.fit}} never validates the input schema. There is a {{transformSchema}}
> implementation that calls {{validateAndTransformSchema}}, but it is never called in either
> {{ALS.fit}} or {{ALSModel.transform}}.
> This was highlighted in SPARK-13857 (and by failing PySpark tests
> [here|https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56849/consoleFull])
> when adding a call to {{transformSchema}} in {{ALSModel.transform}} that actually validates
> the input schema. The PySpark docstring tests produce Long inputs by default, which fail
> validation because Int is required.
> Currently, the user and item id inputs are cast to Int with no input type validation
> (and no warning message), so users can pass in Long, Float, Double, etc. The docs also
> do not make clear anywhere that only Int types are supported for user and item ids.
> Enforcing validation seems the best option, but it might break user code that previously
> "just worked", especially in PySpark.

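To illustrate why a silent cast to Int is risky compared with explicit validation, here is a minimal, self-contained Python sketch. It does not use Spark. The helper names (narrow_to_int32, checked_id) are hypothetical, and it assumes the cast behaves like a Java-style long-to-int narrowing that keeps only the low 32 bits, which may differ from what a given Spark version or configuration does.

```python
# Bounds of a signed 32-bit Int.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def narrow_to_int32(value: int) -> int:
    """Keep the low 32 bits of an integer, as a Java-style long-to-int cast does."""
    low = value & 0xFFFFFFFF
    return low - 2**32 if low > INT_MAX else low


def checked_id(value) -> int:
    """Hypothetical strict check: accept only integers that fit in a signed 32-bit Int."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"id must be an integer, got {type(value).__name__}")
    if not INT_MIN <= value <= INT_MAX:
        raise ValueError(f"id {value} is outside the 32-bit Int range")
    return value


if __name__ == "__main__":
    big_id = 2**31 + 5  # an id that only fits in a Long
    # Silent narrowing: no error, but the id is now a different number.
    print("narrowed:", narrow_to_int32(big_id))
    # Validation: the same id is rejected with a clear message.
    try:
        checked_id(big_id)
    except ValueError as err:
        print("rejected:", err)
```

Under these assumptions, narrowing changes an out-of-range id into a different id without any error, while the strict check fails loudly. That is the trade-off the description raises: validation is safer but can reject inputs that previously passed through.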

