spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-24489) No check for invalid input type of weight data in ml.PowerIterationClustering
Date Wed, 27 Jun 2018 20:38:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-24489:
------------------------------
    Target Version/s:   (was: 2.4.0)
            Priority: Minor  (was: Major)
       Fix Version/s:     (was: 2.4.0)
          Issue Type: Improvement  (was: Bug)

> No check for invalid input type of weight data in ml.PowerIterationClustering
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-24489
>                 URL: https://issues.apache.org/jira/browse/SPARK-24489
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: shahid
>            Priority: Minor
>
> The test case below fails with the error shown after it. Currently, ml.PowerIterationClustering (PIC) does not check the data type of the weight column. We should verify that the weight column has a valid data type.
> {code:java}
>   test("invalid input types for weight") {
>     val invalidWeightData = spark.createDataFrame(Seq(
>       (0L, 1L, "a"),
>       (2L, 3L, "b")
>     )).toDF("src", "dst", "weight")
>     val pic = new PowerIterationClustering()
>       .setWeightCol("weight")
>     val result = pic.assignClusters(invalidWeightData)
>   }
> {code}
> {code:java}
> Job aborted due to stage failure: Task 0 in stage 8077.0 failed 1 times, most recent failure: Lost task 0.0 in stage 8077.0 (TID 882, localhost, executor driver): scala.MatchError: [0,1,null] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
> 	at org.apache.spark.ml.clustering.PowerIterationClustering$$anonfun$3.apply(PowerIterationClustering.scala:178)
> 	at org.apache.spark.ml.clustering.PowerIterationClustering$$anonfun$3.apply(PowerIterationClustering.scala:178)
> 	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
> 	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> 	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> 	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> 	at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:107)
> 	at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:105)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
> {code}
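
The check requested in the issue could look roughly like the sketch below. It is only an illustration, not the change that was made in Spark: the object `WeightColumnCheck` and the method `requireNumericWeight` are hypothetical names, and the assumption that any numeric type is acceptable for the weight column is mine. The `[0,1,null]` row in the trace suggests the string weight had already become null by the time the row was pattern-matched, so validating the schema up front would fail earlier and with a clearer message. The sketch uses only the public `org.apache.spark.sql.types` classes and does not need a running SparkSession.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper, not part of Spark's API: validates the weight column
// of a schema before any job is launched.
object WeightColumnCheck {

  // Throws IllegalArgumentException (via require) if the column is missing
  // or is not numeric. Accepting every NumericType is an assumption.
  def requireNumericWeight(schema: StructType, weightCol: String): Unit = {
    require(schema.fieldNames.contains(weightCol),
      s"Column '$weightCol' does not exist.")
    val actual: DataType = schema(weightCol).dataType
    require(actual.isInstanceOf[NumericType],
      s"Column '$weightCol' must be of a numeric type but was ${actual.catalogString}.")
  }
}
```

With a helper like this, the test in the issue could assert that an `IllegalArgumentException` is thrown for the string-typed `weight` column, for example by calling `WeightColumnCheck.requireNumericWeight(invalidWeightData.schema, "weight")`, instead of surfacing a `scala.MatchError` from inside a task.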




