spark-issues mailing list archives

From "Marie Beaulieu (JIRA)" <>
Subject [jira] [Created] (SPARK-25289) ChiSqSelector max on empty collection
Date Fri, 31 Aug 2018 02:58:00 GMT
Marie Beaulieu created SPARK-25289:

             Summary: ChiSqSelector max on empty collection
                 Key: SPARK-25289
             Project: Spark
          Issue Type: Bug
          Components: MLlib
    Affects Versions: 2.3.1
            Reporter: Marie Beaulieu

In ChiSqSelector, there is a max taken on a possibly empty collection.

I am using Spark 2.3.1.

Here is an example to reproduce the issue:

import org.apache.spark.mllib.feature.ChiSqSelector
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
implicit val spark = sqlContext.sparkSession

val labeledPoints = (0 to 1).map(n => {
  val v = Vectors.dense((1 to 3).map(_ => n * 1.0).toArray)
  LabeledPoint(n.toDouble, v)
})
val rdd = sc.parallelize(labeledPoints)
val selector = new ChiSqSelector().setSelectorType("fdr").setFdr(0.05)
selector.fit(rdd)  // throws java.lang.UnsupportedOperationException: empty.max
Here is the stack trace:
java.lang.UnsupportedOperationException: empty.max
at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
at scala.collection.mutable.ArrayOps$ofInt.max(ArrayOps.scala:234)
Looking at line 280 in ChiSqSelector, it's pretty obvious how the collection can be empty.
A simple non-empty validation should do the trick.
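For illustration, here is a minimal sketch of that kind of guard, assuming the FDR branch ends up with some array of selected feature indices before calling max (the names below are hypothetical and are not the actual ChiSqSelector source):

// Hypothetical sketch, not the actual Spark code: check for emptiness before
// calling max, so an empty selection yields an empty result instead of
// throwing java.lang.UnsupportedOperationException: empty.max.
val selected: Array[Int] = Array.empty[Int] // e.g. no feature passes fdr = 0.05

val maxSelectedIndex: Option[Int] =
  if (selected.isEmpty) None else Some(selected.max)

maxSelectedIndex match {
  case Some(maxIndex) => println(s"highest selected feature index: $maxIndex")
  case None           => println("no features selected, skip the max entirely")
}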
