spark-user mailing list archives

From Krishna Sankar <ksanka...@gmail.com>
Subject Re: Vector size mismatch in logistic regression - Spark ML 2.0
Date Sun, 21 Aug 2016 23:44:06 GMT
Hi,
   Just after I sent the mail, I realized that the error might be with the
training dataset, not the test dataset.

   1. It might be that you are feeding the full Y vector for training.
   2. That could mean you are using a roughly 50-50 training-test split.
   3. Take a good look at the code that does the data split and at which
   dataset each part is assigned to.
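
As a rough illustration of the kind of check meant in step 3, here is a plain-Python sketch (no Spark required) of one way the widths could diverge given the StringIndexer and one-hot steps in the original message: fitting the category index separately on each split. This is only an assumption about the cause, not something confirmed in this thread, and fit_index, one_hot and widths are hypothetical helpers, not Spark APIs.

```python
# Plain-Python stand-in for the index + one-hot steps.
# All names here are hypothetical; none of this is Spark API.

def fit_index(values):
    """Map each distinct category to a column position (sorted for determinism)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}

def one_hot(value, index):
    """Encode one value against a fitted index; unseen values become all zeros."""
    vec = [0.0] * len(index)
    if value in index:
        vec[index[value]] = 1.0
    return vec

def widths(rows):
    """Distinct vector lengths in a dataset; a consistent dataset has exactly one."""
    return {len(r) for r in rows}

train = ["red", "green", "blue", "green"]
test = ["red", "blue"]

# Assumed pitfall: each split gets its own index, so the widths differ.
train_vecs = [one_hot(v, fit_index(train)) for v in train]
bad_test_vecs = [one_hot(v, fit_index(test)) for v in test]

# Fitting once on the training data and reusing the index keeps widths equal.
index = fit_index(train)
good_test_vecs = [one_hot(v, index) for v in test]

print(widths(train_vecs), widths(bad_test_vecs), widths(good_test_vecs))
```

If the widths printed for the training and test features differ, the encoders were probably not shared between the two splits.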

Cheers
<k/>

On Sun, Aug 21, 2016 at 4:37 PM, Krishna Sankar <ksankar42@gmail.com> wrote:

> Hi,
>   Looks like the test-dataset has different sizes for X & Y. Possible
> steps:
>
>    1. What is the test-data size?
>       - If it is 15,909, check the prediction variable vector: it is now
>       29,471 but should be 15,909.
>       - If you expect it to be 29,471, then the X matrix is not right.
>    2. It is also possible that the size of the test data is something
>    else. If so, check the data pipeline.
>    3. If you print the count() of the various vectors, I think you can
>    find the error.
>
> Cheers & Good Luck
> <k/>
>
> On Sun, Aug 21, 2016 at 3:16 PM, janardhan shetty <janardhanp22@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have built a logistic regression model using the training dataset.
>> When I predict on a test dataset, it throws the size-mismatch error
>> below.
>>
>> Steps done:
>> 1. String indexers on categorical features.
>> 2. One hot encoding on these indexed features.
>>
>> Any help resolving this issue is appreciated. Or is it a bug?
>>
>> SparkException: Job aborted due to stage failure: Task 0 in stage 635.0
>> failed 1 times, most recent failure: Lost task 0.0 in stage 635.0 (TID
>> 19421, localhost): java.lang.IllegalArgumentException: requirement failed:
>> BLAS.dot(x: Vector, y:Vector) was given Vectors with non-matching sizes:
>> x.size = 15909, y.size = 29471
>>   at scala.Predef$.require(Predef.scala:224)
>>   at org.apache.spark.ml.linalg.BLAS$.dot(BLAS.scala:104)
>>   at org.apache.spark.ml.classification.LogisticRegressionModel$$anonfun$19.apply(LogisticRegression.scala:505)
>>   at org.apache.spark.ml.classification.LogisticRegressionModel$$anonfun$19.apply(LogisticRegression.scala:504)
>>   at org.apache.spark.ml.classification.LogisticRegressionModel.predictRaw(LogisticRegression.scala:594)
>>   at org.apache.spark.ml.classification.LogisticRegressionModel.predictRaw(LogisticRegression.scala:484)
>>   at org.apache.spark.ml.classification.ProbabilisticClassificationModel$$anonfun$1.apply(ProbabilisticClassifier.scala:112)
>>   at org.apache.spark.ml.classification.ProbabilisticClassificationModel$$anonfun$1.apply(ProbabilisticClassifier.scala:111)
>>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.evalExpr137$(Unknown Source)
>>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>>
>
>
