spark-user mailing list archives

From Dariusz Kobylarz <darek.kobyl...@gmail.com>
Subject MatrixFactorizationModel serialization
Date Fri, 07 Nov 2014 23:33:11 GMT
I am trying to persist a MatrixFactorizationModel (from the Collaborative Filtering
example) and use it in another script to evaluate/apply it.
This is the exception I get when I try to use a deserialized model instance:

Exception in thread "main" java.lang.NullPointerException
     at org.apache.spark.rdd.CoGroupedRDD$$anonfun$getPartitions$1.apply$mcVI$sp(CoGroupedRDD.scala:103)
     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
     at org.apache.spark.rdd.CoGroupedRDD.getPartitions(CoGroupedRDD.scala:101)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
     at scala.Option.getOrElse(Option.scala:120)
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
     at org.apache.spark.rdd.MappedValuesRDD.getPartitions(MappedValuesRDD.scala:26)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
     at scala.Option.getOrElse(Option.scala:120)
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
     at org.apache.spark.rdd.FlatMappedValuesRDD.getPartitions(FlatMappedValuesRDD.scala:26)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
     at scala.Option.getOrElse(Option.scala:120)
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
     at org.apache.spark.rdd.FlatMappedRDD.getPartitions(FlatMappedRDD.scala:30)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
     at scala.Option.getOrElse(Option.scala:120)
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
     at org.apache.spark.Partitioner$$anonfun$2.apply(Partitioner.scala:58)
     at org.apache.spark.Partitioner$$anonfun$2.apply(Partitioner.scala:58)
     at scala.math.Ordering$$anon$5.compare(Ordering.scala:122)
     at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
     at java.util.TimSort.sort(TimSort.java:189)
     at java.util.TimSort.sort(TimSort.java:173)
     at java.util.Arrays.sort(Arrays.java:659)
     at scala.collection.SeqLike$class.sorted(SeqLike.scala:615)
     at scala.collection.AbstractSeq.sorted(Seq.scala:40)
     at scala.collection.SeqLike$class.sortBy(SeqLike.scala:594)
     at scala.collection.AbstractSeq.sortBy(Seq.scala:40)
     at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:58)
     at org.apache.spark.rdd.PairRDDFunctions.join(PairRDDFunctions.scala:536)
     at org.apache.spark.mllib.recommendation.MatrixFactorizationModel.predict(MatrixFactorizationModel.scala:57)
     ...
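
The NullPointerException above is consistent with how Java serialization treats `@transient` fields. The working assumption here (not checked against the Spark sources in this message) is that an RDD keeps its SparkContext and dependency references in transient fields, so a deserialized RDD comes back with those references set to null and fails once its partitions are computed. The plain-Scala sketch below reproduces only that JVM mechanism; `FakeRdd`, `FakeModel`, and `SerializationDemo` are hypothetical names, not Spark classes.

```scala
import java.io._

// Hypothetical stand-in for an RDD: the @transient field mimics driver-side
// state (assumed: SparkContext, dependencies) that Java serialization skips.
class FakeRdd(val name: String) extends Serializable {
  @transient var context: AnyRef = new Object

  // Any use of the transient state fails once it has been dropped.
  def describe(): String = s"$name on ${context.toString}"
}

// Hypothetical stand-in for a model that wraps two RDD-like members.
class FakeModel(val rank: Int, val userFeatures: FakeRdd, val productFeatures: FakeRdd)
  extends Serializable

object SerializationDemo {
  // Serialize to bytes and read back, as saving and reloading a model would.
  def roundTrip[T](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val model = new FakeModel(10, new FakeRdd("users"), new FakeRdd("products"))
    val restored = roundTrip(model)
    println(restored.rank)                          // 10: ordinary fields survive
    println(restored.userFeatures.context == null)  // true: the transient field does not
    try restored.userFeatures.describe()
    catch { case _: NullPointerException => println("NullPointerException") }
  }
}
```

Deserialization does not run the class's own constructor, so the `new Object` initializer is never re-executed and the transient field stays null.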

Is this model serializable at all? I noticed it has two RDDs inside
(the user and product features).
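
Since the model wraps two RDDs, one possible workaround, sketched here under stated assumptions, is to persist the materialized factor data instead of the model object. The sketch assumes both factor matrices fit in driver memory and keeps them as plain `Map[Int, Array[Double]]` values; a predicted rating is the dot product of a user vector and a product vector, which is how a matrix-factorization model such as ALS scores a (user, product) pair. `FactorSnapshot` and `SnapshotDemo` are hypothetical helpers, not part of MLlib.

```scala
import java.io._

// Hypothetical plain-data snapshot of a factorization model: it holds no RDDs,
// so Java serialization keeps everything needed for prediction.
case class FactorSnapshot(
    rank: Int,
    userFactors: Map[Int, Array[Double]],
    productFactors: Map[Int, Array[Double]]) {

  // Predicted rating = dot product of the user and product factor vectors;
  // None if either id is unknown.
  def predict(user: Int, product: Int): Option[Double] =
    for {
      u <- userFactors.get(user)
      p <- productFactors.get(product)
    } yield u.zip(p).map { case (a, b) => a * b }.sum
}

object SnapshotDemo {
  def save(snapshot: FactorSnapshot, file: File): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream(file))
    try out.writeObject(snapshot) finally out.close()
  }

  def load(file: File): FactorSnapshot = {
    val in = new ObjectInputStream(new FileInputStream(file))
    try in.readObject().asInstanceOf[FactorSnapshot] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val snapshot = FactorSnapshot(
      rank = 2,
      userFactors = Map(1 -> Array(1.0, 2.0)),
      productFactors = Map(7 -> Array(0.5, 0.25)))
    val file = File.createTempFile("factors", ".bin")
    file.deleteOnExit()
    save(snapshot, file)
    val restored = load(file)
    println(restored.predict(1, 7))  // Some(1.0)
    println(restored.predict(2, 7))  // None: unknown user
  }
}
```

In a Spark driver, the two maps could presumably be filled from `model.userFeatures.collectAsMap().toMap` and `model.productFeatures.collectAsMap().toMap`, which is only reasonable when both factor matrices are small enough to collect.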

Thanks,




