spark-user mailing list archives

From Krishna Sankar <ksanka...@gmail.com>
Subject Re: Vector size mismatch in logistic regression - Spark ML 2.0
Date Sun, 21 Aug 2016 23:37:25 GMT
Hi,
  It looks like the two vectors in the dot product (X and Y) have different
sizes when you score the test dataset. Possible steps:

   1. What is the feature-vector size of the test data?
      - If it is 15,909, check the prediction variable vector: it is now
      29,471 and should be 15,909.
      - If you expect it to be 29,471, then the X matrix is not right.
   2. It is also possible that the size of the test data is something else
   entirely. If so, check the data pipeline.
   3. If you print the size of the various vectors (the vector length, not
   just the row count()), I think you can find the error.
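
One common way to end up with this kind of mismatch, and only a guess for
this thread, is that the string-indexing / one-hot stages were fitted again
on the test data instead of being reused from training. A minimal sketch in
plain Python (not the Spark API; fit_index, one_hot and dot are made-up helper
names and the colour values are invented) shows how the widths drift apart
and how printing the sizes exposes it:

```python
def fit_index(values):
    """Map each distinct category to a column position, in first-seen order."""
    index = {}
    for v in values:
        index.setdefault(v, len(index))
    return index


def one_hot(value, index):
    """Encode one value as a dense 0/1 list whose length is the index size."""
    vec = [0.0] * len(index)
    if value in index:  # categories unseen at fit time become all zeros
        vec[index[value]] = 1.0
    return vec


def dot(x, y):
    """Dot product that refuses vectors of different sizes."""
    if len(x) != len(y):
        raise ValueError(
            f"non-matching sizes: x.size = {len(x)}, y.size = {len(y)}"
        )
    return sum(a * b for a, b in zip(x, y))


# Invented data: the test set contains categories the training set lacks.
train = ["red", "green", "blue"]
test = ["red", "green", "blue", "teal", "pink"]

train_index = fit_index(train)
weights = [0.5] * len(train_index)  # stand-in for the model coefficients

# Encoder fitted again on the test set: width no longer matches the model.
refit_vec = one_hot("red", fit_index(test))
# Encoder fitted on the training set and reused: widths agree.
reused_vec = one_hot("red", train_index)

# Printing the sizes side by side is usually enough to spot the culprit.
print(len(weights), len(refit_vec), len(reused_vec))
```

If the sizes differ in the same way in your job, fitting the feature stages
once on the training data and applying that same fitted pipeline to the test
data would be the first thing to try.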

Cheers & Good Luck
<k/>

On Sun, Aug 21, 2016 at 3:16 PM, janardhan shetty <janardhanp22@gmail.com>
wrote:

> Hi,
>
> I have built a logistic regression model using the training dataset.
> When I predict on the test dataset, it throws the size-mismatch error
> shown below.
>
> Steps done:
> 1. String indexers on categorical features.
> 2. One hot encoding on these indexed features.
>
> Any help resolving this issue is appreciated. Or is it a bug?
>
> SparkException: *Job aborted due to stage failure: Task 0 in stage 635.0
> failed 1 times, most recent failure: Lost task 0.0 in stage 635.0 (TID
> 19421, localhost): java.lang.IllegalArgumentException: requirement failed:
> BLAS.dot(x: Vector, y:Vector) was given Vectors with non-matching sizes:
> x.size = 15909, y.size = 29471*
>   at scala.Predef$.require(Predef.scala:224)
>   at org.apache.spark.ml.linalg.BLAS$.dot(BLAS.scala:104)
>   at org.apache.spark.ml.classification.LogisticRegressionModel$$anonfun$19.apply(LogisticRegression.scala:505)
>   at org.apache.spark.ml.classification.LogisticRegressionModel$$anonfun$19.apply(LogisticRegression.scala:504)
>   at org.apache.spark.ml.classification.LogisticRegressionModel.predictRaw(LogisticRegression.scala:594)
>   at org.apache.spark.ml.classification.LogisticRegressionModel.predictRaw(LogisticRegression.scala:484)
>   at org.apache.spark.ml.classification.ProbabilisticClassificationModel$$anonfun$1.apply(ProbabilisticClassifier.scala:112)
>   at org.apache.spark.ml.classification.ProbabilisticClassificationModel$$anonfun$1.apply(ProbabilisticClassifier.scala:111)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.evalExpr137$(Unknown Source)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>
