spark-user mailing list archives

From <Saif.A.Ell...@wellsfargo.com>
Subject How to deal with null values on LabeledPoint
Date Tue, 07 Jul 2015 15:35:26 GMT
Hello,

Reading a CSV file with spark-csv, I got some lines with missing (not invalid) data.

I then apply map() to create a LabeledPoint with a DenseVector, extracting each column with map(Row => Row.getDouble(col_index)).

Up to this point, things look fine:
res173: org.apache.spark.mllib.regression.LabeledPoint = (-1.530132691E9,[162.89431,13.55811,18.3346818,-1.6653182])

But when I run the following code:

      val model = new LogisticRegressionWithLBFGS().
          setNumClasses(2).
          setValidateData(true).
          run(data_map)

      java.lang.RuntimeException: Failed to check null bit for primitive double value.

Debugging this, I am pretty sure the cause is rows that look like -2.593849123898,392.293891,,,, where the trailing fields are empty.
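Not part of the original message, but one common way around this is to drop (or impute) rows with missing fields before building the LabeledPoints. The sketch below is a Spark-free illustration in plain Scala that parses CSV lines into Option[Double] columns and keeps only fully populated rows; in an actual Spark job the equivalent check could be done inside the map() with Row.isNullAt(i) on each column before calling getDouble. The object and method names here (NullSafeRows, parseLine, completeRows) are hypothetical.

```scala
// Hypothetical sketch: filter out CSV rows with empty fields before
// converting them into feature arrays. In a Spark job, the same guard
// would live inside map()/filter() on the RDD, testing Row.isNullAt(i)
// for each column before Row.getDouble(i).
object NullSafeRows {
  // Parse one CSV line into per-column Option[Double]; empty fields become None.
  // split(",", -1) keeps trailing empty fields, which matters for lines
  // such as "-2.593849123898,392.293891,,,,".
  def parseLine(line: String): Array[Option[Double]] =
    line.split(",", -1).map { field =>
      val t = field.trim
      if (t.isEmpty) None else Some(t.toDouble)
    }

  // Keep only rows where every column is present, unwrapping the values.
  def completeRows(lines: Seq[String]): Seq[Array[Double]] =
    lines.map(parseLine).collect {
      case cols if cols.forall(_.isDefined) => cols.map(_.get)
    }
}
```

With this filtering applied first, every surviving row can safely go through getDouble without tripping the null-bit check; whether dropping rows is acceptable (versus imputing a default value) depends on how much data is missing.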

