spark-dev mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject Re: MatrixFactorizationModel predict(Int, Int) API
Date Fri, 07 Nov 2014 00:39:00 GMT
I reproduced the problem in the MLlib test suite ALSSuite.scala using the
following code:

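        val arrayPredict = userProductsRDD.map { case (user, product) =>
          // Top recommendations for this user (`products` is the requested count).
          val recommendedProducts = model.recommendProducts(user, products)
          // Pick out the recommendation for the product under test.
          val productScore = recommendedProducts.find(_.product == product)
          require(productScore.isDefined)
          productScore.get
        }.collect()

        // Compare each recommended rating with the expected value in allRatings.
        arrayPredict.foreach { elem =>
          if (allRatings.get(elem.user, elem.product) != elem.rating) {
            fail("Prediction APIs don't match")
          }
        }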

If the usage of model.recommendProducts is correct, the test fails with the
same error I sent before...

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 316.0 failed 1 times, most recent failure: Lost task 0.0 in stage
316.0 (TID 79, localhost): scala.MatchError: null

org.apache.spark.rdd.PairRDDFunctions.lookup(PairRDDFunctions.scala:825)
 org.apache.spark.mllib.recommendation.MatrixFactorizationModel.recommendProducts(MatrixFactorizationModel.scala:81)

It is a blocker for me and I am debugging it. I will open up a JIRA if this
is indeed a bug...

Do I have to cache the model to make userFeatures.lookup(user).head work?

On Mon, Nov 3, 2014 at 9:24 PM, Xiangrui Meng <mengxr@gmail.com> wrote:

> Was "user" present in training? We can put a check there and return
> NaN if the user is not included in the model. -Xiangrui
>
> On Mon, Nov 3, 2014 at 5:25 PM, Debasish Das <debasish.das83@gmail.com>
> wrote:
> > Hi,
> >
> > I am testing MatrixFactorizationModel.predict(user: Int, product: Int)
> > but the code fails on userFeatures.lookup(user).head
> >
> > In computeRmse MatrixFactorizationModel.predict(RDD[(Int, Int)]) has been
> > called and in all the test-cases that API has been used...
> >
> > I can perhaps refactor my code to do the same but I was wondering whether
> > people test the lookup(user) version of the code..
> >
> > Do I need to cache the model to make it work? I think right now the
> > default is MEMORY_AND_DISK...
> >
> > Thanks.
> > Deb
>
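
The check Xiangrui suggests above (return NaN when the user is not included in the model) can be sketched in plain Scala. This is only an illustration under stated assumptions: `FactorModel`, its `Map`-based storage, and `predictOrNaN` are hypothetical names invented for this sketch. They are not MLlib's MatrixFactorizationModel, which keeps its factors in RDDs and reads them with `lookup(user).head`.

```scala
// Hypothetical stand-in for a factor model; NOT the MLlib class.
// Factors live in local Maps instead of RDDs so the sketch is self-contained.
final case class FactorModel(
    userFeatures: Map[Int, Array[Double]],
    productFeatures: Map[Int, Array[Double]]) {

  // Dot product of two equal-length factor vectors.
  private def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "factor vectors must have the same rank")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  // Returns the predicted rating, or NaN when either id is missing.
  // Map.get yields an Option, so a missing id is handled explicitly
  // rather than by calling .head on an empty lookup result.
  def predictOrNaN(user: Int, product: Int): Double =
    (userFeatures.get(user), productFeatures.get(product)) match {
      case (Some(u), Some(p)) => dot(u, p)
      case _                  => Double.NaN
    }
}

object PredictSketch {
  def main(args: Array[String]): Unit = {
    val model = FactorModel(
      userFeatures = Map(1 -> Array(1.0, 2.0)),
      productFeatures = Map(10 -> Array(3.0, 4.0)))

    println(model.predictOrNaN(1, 10)) // 1*3 + 2*4, prints 11.0
    println(model.predictOrNaN(2, 10)) // user 2 is absent, prints NaN
  }
}
```

Callers would then test the result with `isNaN` instead of catching an exception. Whether NaN or an `Option[Double]` is the better return type is a design choice this sketch does not settle.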
