spark-issues mailing list archives

From "Marie Beaulieu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-25289) ChiSqSelector max on empty collection
Date Fri, 31 Aug 2018 02:58:00 GMT
Marie Beaulieu created SPARK-25289:
--------------------------------------

             Summary: ChiSqSelector max on empty collection
                 Key: SPARK-25289
                 URL: https://issues.apache.org/jira/browse/SPARK-25289
             Project: Spark
          Issue Type: Bug
          Components: MLlib
    Affects Versions: 2.3.1
            Reporter: Marie Beaulieu


In org.apache.spark.mllib.feature.ChiSqSelector.fit, a max is taken on a possibly empty collection, which throws java.lang.UnsupportedOperationException.

I am using Spark 2.3.1.

Here is an example to reproduce.
{code:java}
import org.apache.spark.mllib.feature.ChiSqSelector
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
implicit val spark = sqlContext.sparkSession

// Two labeled points whose features exactly track the label. With
// selectorType "fdr" and fdr = 0.05, no features are selected, so the
// collection that max is taken on ends up empty.
val labeledPoints = (0 to 1).map { n =>
  val v = Vectors.dense((1 to 3).map(_ => n * 1.0).toArray)
  LabeledPoint(n.toDouble, v)
}
val rdd = sc.parallelize(labeledPoints)
val selector = new ChiSqSelector().setSelectorType("fdr").setFdr(0.05)
selector.fit(rdd) // throws UnsupportedOperationException: empty.max
{code}
Here is the stack trace:
{code:java}
java.lang.UnsupportedOperationException: empty.max
at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
at scala.collection.mutable.ArrayOps$ofInt.max(ArrayOps.scala:234)
at org.apache.spark.mllib.feature.ChiSqSelector.fit(ChiSqSelector.scala:280)
{code}
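The empty.max failure comes from the Scala standard library rather than from Spark itself: calling max on an empty collection throws. A minimal sketch, with no Spark involved, that reproduces the exact exception in the stack trace:

```scala
// Plain Scala: max on an empty collection throws
// UnsupportedOperationException("empty.max"), matching the trace above.
val indices = Array.empty[Int]
val result =
  try Left(indices.max)
  catch { case e: UnsupportedOperationException => Right(e.getMessage) }

println(result)
```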
Looking at line 280 in ChiSqSelector.scala, it is clear how the collection can be empty: when FDR selection keeps no features, max is called on an empty array of indices. A simple non-empty check before taking the max should do the trick.
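A hedged sketch of the kind of guard that would avoid the crash (the helper name and shape are illustrative, not Spark's actual ChiSqSelector code):

```scala
// Illustrative guard, not the actual Spark implementation: return None
// instead of calling max on an empty collection of selected indices.
def safeMaxIndex(selectedIndices: Array[Int]): Option[Int] =
  if (selectedIndices.isEmpty) None else Some(selectedIndices.max)

println(safeMaxIndex(Array.empty[Int])) // None
println(safeMaxIndex(Array(3, 1, 7)))   // Some(7)
```

Whether the right behavior on an empty selection is to return a model with zero selected features or to fail with a clearer error message is a design decision for the fix.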



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
