spark-user mailing list archives

From Daniel Valdivia <h...@danielvaldivia.com>
Subject How to deal with same class mismatch?
Date Mon, 01 Feb 2016 20:07:15 GMT
Hi, I'm having a couple of issues.

I'm hitting a known issue <https://issues.apache.org/jira/browse/SPARK-1199> in
the spark-shell, where I get a type mismatch even though the found and required types are the same class:

<console>:82: error: type mismatch; 
found : org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint]
required: org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint]

I was wondering if anyone has found a way around this?

I tried copying my RDD into a brand new RDD, and each LabeledPoint into
a new one, in case some internal class was causing the problem. However, I can't seem
to access the vectors inside my LabeledPoint. While generating my RDD
I did use Java Maps and converted them back to Scala.

Any advice on how to remap this LabeledPoint?

tfidfs.take(1)
res143: Array[org.apache.spark.mllib.regression.LabeledPoint] = Array((143.0,(7175,[2738,4134,4756,6354,6424],[492.63076923076926,11.060794473229707,2.7010544074230283,57.69549549549549,76.2404761904762])))

tfidfs.take(1)(0).label
res144: Double = 143.0 

tfidfs.take(1)(0).features
res145: org.apache.spark.mllib.linalg.Vector = (7175,[2738,4134,4756,6354,6424],[492.63076923076926,11.060794473229707,2.7010544074230283,57.69549549549549,76.2404761904762])


tfidfs.take(1)(0).features(0)
res146: Double = 0.0 

tfidfs.take(1)(0).features(1)
res147: Double = 0.0 

tfidfs.take(1)(0).features(2)
res148: Double = 0.0
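
A note on the output above: the features vector prints as (size,[indices],[values]), which is MLlib's sparse format, so features(0), features(1) and features(2) return 0.0 simply because only indices 2738, 4134, 4756, 6354 and 6424 are stored. Below is a minimal sketch of copying a LabeledPoint into fresh objects by reading the sparse vector's fields directly. The names `rebuild` and `sample` are hypothetical and not part of the original message, and whether a copy like this sidesteps SPARK-1199 is an assumption that has not been verified.

```scala
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical helper: build a new LabeledPoint from the fields of an existing one.
def rebuild(p: LabeledPoint): LabeledPoint = p.features match {
  case sv: SparseVector =>
    // A sparse vector exposes its size, stored indices and stored values.
    LabeledPoint(p.label, Vectors.sparse(sv.size, sv.indices.clone(), sv.values.clone()))
  case dv: DenseVector =>
    LabeledPoint(p.label, Vectors.dense(dv.values.clone()))
}

// Hypothetical sample point, shaped like the one shown in the transcript.
val sample = LabeledPoint(143.0, Vectors.sparse(7175, Array(2738, 4134), Array(492.63, 11.06)))
val copy = rebuild(sample)

copy.features(0)     // index 0 is not stored, so this reads as 0.0
copy.features(2738)  // index 2738 is stored, so this reads the stored value
```

Applied to the RDD from this message, the same helper could be used as tfidfs.map(rebuild), assuming tfidfs is an RDD[LabeledPoint] as the transcript suggests.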