spark-user mailing list archives

From Nan Zhu <zhunanmcg...@gmail.com>
Subject Re: confusion on RDD usage in MatrixFactorizationModel (master branch)
Date Wed, 08 Jan 2014 15:43:05 GMT
ignore that  

These operations are
 * automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit
 * conversions when you `import org.apache.spark.SparkContext._`.
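The quoted doc comment is the whole answer: the conversion imported from `SparkContext._` wraps an `RDD[(K, V)]` in `PairRDDFunctions`, so `join` appears to be defined on the RDD itself. The mechanism can be shown without Spark at all; below is a minimal, Spark-free sketch of the same pattern, where `Box`, `PairBoxFunctions`, and `Implicits` are hypothetical names standing in for `RDD`, `PairRDDFunctions`, and the conversions in `SparkContext`.

```scala
import scala.language.implicitConversions

// Stand-in for RDD[T]: a plain container with no pair-specific methods.
class Box[T](val value: T)

// Stand-in for PairRDDFunctions: extra operations that only make sense
// when the element type is a pair. `swap` plays the role of `join`.
class PairBoxFunctions[K, V](box: Box[(K, V)]) {
  def swap: Box[(V, K)] = new Box(box.value.swap)
}

object Implicits {
  // Stand-in for the conversion pulled in by
  // `import org.apache.spark.SparkContext._`.
  implicit def boxToPairBoxFunctions[K, V](box: Box[(K, V)]): PairBoxFunctions[K, V] =
    new PairBoxFunctions(box)
}

object Demo {
  import Implicits._
  def main(args: Array[String]): Unit = {
    val b = new Box((1, "a"))
    // Box defines no `swap`, but the implicit conversion in scope
    // rewrites this to boxToPairBoxFunctions(b).swap.
    println(b.swap.value) // prints (a,1)
  }
}
```

Without the `import Implicits._`, the call `b.swap` fails to compile — exactly the behavior one sees in Spark when `SparkContext._` is not imported.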


--  
Nan Zhu



On Wednesday, January 8, 2014 at 10:38 AM, Nan Zhu wrote:

> Hi, all  
>  
> I’m reading the source code of master branch  
>  
> there is a new predict() function in MatrixFactorizationModel
>  
> /**
>     * Predict the rating of many users for many products.
>     * The output RDD has an element per each element in the input RDD (including all duplicates)
>     * unless a user or product is missing in the training set.
>     *
>     * @param usersProducts  RDD of (user, product) pairs.
>     * @return RDD of Ratings.
>     */
>   def predict(usersProducts: RDD[(Int, Int)]): RDD[Rating] = {
>     val users = userFeatures.join(usersProducts).map{
>       case (user, (uFeatures, product)) => (product, (user, uFeatures))
>     }
>     users.join(productFeatures).map {
>       case (product, ((user, uFeatures), pFeatures)) =>
>         val userVector = new DoubleMatrix(uFeatures)
>         val productVector = new DoubleMatrix(pFeatures)
>         Rating(user, product, userVector.dot(productVector))
>     }
>   }
>  
>  
> it seems that the author can call join directly on an RDD object?  
>  
> Is it a new feature in the next version? I’m used to creating a PairRDDFunctions from the current RDD and then calling join, etc.
>  
> Did I misunderstand something?
>  
> Best,  
>  
> --  
> Nan Zhu
>  

