spark-dev mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject Re: MatrixFactorizationModel predict(Int, Int) API
Date Fri, 07 Nov 2014 00:51:39 GMT
So model.recommendProducts can only be called from the master? I am running
the test on 20% of the users, and those users are in an RDD. If I have to
collect them all to the master node and then call model.recommendProducts,
that's an issue...
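
One possible way to avoid collecting the users is to keep the model's RDDs out of the closure and ship only plain arrays to the workers. The sketch below is a minimal illustration under that assumption, not MLlib's implementation: `TopKSketch`, `dot` and `topK` are hypothetical names, and the Spark wiring in the trailing comment assumes an existing `sc`, `model`, `k` and a `testUsers: RDD[Int]`, and that the product factors are small enough to broadcast.

```scala
object TopKSketch {
  // Dot product of two factor vectors of the same rank.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "factor vectors must have the same rank")
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  // Score every product for one user and keep the k highest scores.
  // productFactors is a plain local array, so this function holds no RDD
  // reference and can be called inside an RDD closure.
  def topK(userVec: Array[Double],
           productFactors: Array[(Int, Array[Double])],
           k: Int): Array[(Int, Double)] =
    productFactors
      .map { case (id, vec) => (id, dot(userVec, vec)) }
      .sortBy { case (_, score) => -score }
      .take(k)

  // Hypothetical Spark wiring (assumed names, not executed here):
  //   val bcProducts = sc.broadcast(model.productFeatures.collect())
  //   val recs = model.userFeatures
  //     .join(testUsers.map(u => (u, ())))
  //     .map { case (u, (vec, _)) => (u, topK(vec, bcProducts.value, k)) }
}
```

Sorting all products per user is the simplest option; for a large catalogue a bounded priority queue of size k would probably be cheaper.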

Any idea how to optimize this so that we can calculate MAP statistics on
large samples of data ?
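
For reference, the MAP statistic itself needs only each user's ranked list and that user's set of held-out items, so it can be computed per record and then averaged. The sketch below assumes the common definition that normalizes by the number of relevant items; `MapSketch` and its method names are hypothetical and not part of MLlib.

```scala
object MapSketch {
  // Average precision of one ranked list against the set of relevant items.
  // Each hit at 0-based position idx contributes (hits so far) / (idx + 1).
  def averagePrecision(ranked: Seq[Int], relevant: Set[Int]): Double =
    if (relevant.isEmpty) 0.0
    else {
      var hits = 0
      var sum = 0.0
      ranked.zipWithIndex.foreach { case (item, idx) =>
        if (relevant.contains(item)) {
          hits += 1
          sum += hits.toDouble / (idx + 1)
        }
      }
      sum / relevant.size
    }

  // Mean of the per-user average precisions; 0.0 for an empty input.
  def meanAveragePrecision(perUser: Seq[(Seq[Int], Set[Int])]): Double =
    if (perUser.isEmpty) 0.0
    else perUser.map { case (r, rel) => averagePrecision(r, rel) }.sum / perUser.size
}
```

On an RDD of (ranked, relevant) pairs the same `averagePrecision` could be applied in a `map` followed by `mean()`. If the Spark version in use provides `org.apache.spark.mllib.evaluation.RankingMetrics`, its `meanAveragePrecision` may be a better fit than a hand-written version.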


On Thu, Nov 6, 2014 at 4:41 PM, Xiangrui Meng <mengxr@gmail.com> wrote:

> The ALS model contains RDDs, so you cannot put `model.recommendProducts`
> inside an RDD closure such as `userProductsRDD.map`. -Xiangrui
>
> On Thu, Nov 6, 2014 at 4:39 PM, Debasish Das <debasish.das83@gmail.com>
> wrote:
> > I reproduced the problem in mllib tests ALSSuite.scala using the following
> > functions:
> >
> >         val arrayPredict = userProductsRDD.map { case (user, product) =>
> >           val recommendedProducts = model.recommendProducts(user, products)
> >           val productScore = recommendedProducts.find { x => x.product == product }
> >           require(productScore != None)
> >           productScore.get
> >         }.collect
> >
> >         arrayPredict.foreach { elem =>
> >           if (allRatings.get(elem.user, elem.product) != elem.rating)
> >             fail("Prediction APIs don't match")
> >         }
> >
> > If the usage of model.recommendProducts is correct, the test fails with
> > the same error I sent before...
> >
> > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
> > stage 316.0 failed 1 times, most recent failure: Lost task 0.0 in stage
> > 316.0 (TID 79, localhost): scala.MatchError: null
> >
> > org.apache.spark.rdd.PairRDDFunctions.lookup(PairRDDFunctions.scala:825)
> > org.apache.spark.mllib.recommendation.MatrixFactorizationModel.recommendProducts(MatrixFactorizationModel.scala:81)
> >
> > It is a blocker for me and I am debugging it. I will open a JIRA if this
> > is indeed a bug...
> >
> > Do I have to cache the models to make userFeatures.lookup(user).head work?
> >
> >
> > On Mon, Nov 3, 2014 at 9:24 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
> >>
> >> Was "user" present in training? We can put a check there and return
> >> NaN if the user is not included in the model. -Xiangrui
> >>
> >> On Mon, Nov 3, 2014 at 5:25 PM, Debasish Das <debasish.das83@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > I am testing MatrixFactorizationModel.predict(user: Int, product: Int),
> >> > but the code fails on userFeatures.lookup(user).head
> >> >
> >> > In computeRmse, MatrixFactorizationModel.predict(RDD[(Int, Int)]) has
> >> > been called, and all the test cases use that API...
> >> >
> >> > I can perhaps refactor my code to do the same, but I was wondering
> >> > whether people test the lookup(user) version of the code.
> >> >
> >> > Do I need to cache the model to make it work? I think right now the
> >> > default is MEMORY_AND_DISK...
> >> >
> >> > Thanks.
> >> > Deb
> >
> >
>
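
The check suggested earlier in the thread (return NaN when the user is not in the model) could look roughly like the sketch below. It is an illustration only: `SafePredictSketch` and `predictOrNaN` are hypothetical names, and the two `Seq` arguments stand in for the results of `userFeatures.lookup(user)` and `productFeatures.lookup(product)`. It covers only the missing-user case on the driver; it would not help with the `scala.MatchError: null` that shows up when the model is used inside an RDD closure.

```scala
object SafePredictSketch {
  // userHits and productHits stand in for the Seq returned by
  // userFeatures.lookup(user) and productFeatures.lookup(product).
  // An empty Seq means the id was not seen in training.
  def predictOrNaN(userHits: Seq[Array[Double]],
                   productHits: Seq[Array[Double]]): Double =
    (userHits.headOption, productHits.headOption) match {
      case (Some(u), Some(p)) if u.length == p.length =>
        // Predicted rating is the dot product of the two factor vectors.
        u.zip(p).map { case (a, b) => a * b }.sum
      case _ =>
        // Unknown user or product (or mismatched rank): no prediction.
        Double.NaN
    }
}
```

Using `headOption` instead of `.head` is what turns the missing-user case into a NaN rather than an exception.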
