spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-3066) Support recommendAll in matrix factorization model
Date Mon, 09 Mar 2015 02:00:43 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352416#comment-14352416 ]

Joseph K. Bradley commented on SPARK-3066:
------------------------------------------

It's similar, I believe, for ALS.  The cosine similarity metric you get with the dot product
for ALS is a distance metric, right?  So finding the top K products to recommend to a given
user is essentially the same as finding the K product feature vectors which are closest to
the user's feature vector.  This optimization could be used both for recommending for a
single user and for recommendAll.
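
To make the single-user case concrete, here is a minimal sketch of scoring every product
against one user by dot product and keeping the top K.  It assumes the factors fit in memory
as plain Python lists; the names top_k_products, user_vec and product_vecs are hypothetical
and are not part of MLlib.

```python
import heapq

def top_k_products(user_vec, product_vecs, k):
    """Return the k (product_id, score) pairs with the largest dot product."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    # Score every product against the user's feature vector.
    scored = ((pid, dot(user_vec, vec)) for pid, vec in product_vecs.items())
    # Keep only the k highest-scoring products.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Toy factors, made up for illustration.
products = {"p1": [1.0, 0.0], "p2": [0.5, 0.5], "p3": [0.0, 1.0]}
print(top_k_products([2.0, 1.0], products, 2))  # [('p1', 2.0), ('p2', 1.5)]
```

This is the brute-force baseline: it touches every product vector, which is the cost a
nearest neighbor structure would try to avoid.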

I'm not sure how effective these approximate nearest neighbor methods are.  My understanding
is that they work reasonably well as long as the feature space is fairly low-dimensional,
which should often be the case for ALS.

My hope is that these approximate nearest neighbor data structures can reduce communication.
The ones I've seen are based on feature space partitioning, which could potentially allow
you to figure out a subset of product partitions to check for each user.
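
As a rough illustration of the partitioning idea, the sketch below buckets product vectors by
which side of a few hyperplanes they fall on, then looks up only the bucket matching the
user's vector.  This is one possible scheme chosen for illustration, not necessarily the one
meant above, and it is approximate: the best product may sit in a bucket that is never
checked.  The hyperplanes and all function names are assumptions.

```python
def signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the plane vec falls on."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )

def build_partitions(product_vecs, hyperplanes):
    """Group product ids by the signature of their feature vector."""
    partitions = {}
    for pid, vec in product_vecs.items():
        partitions.setdefault(signature(vec, hyperplanes), []).append(pid)
    return partitions

def candidate_products(user_vec, partitions, hyperplanes):
    """Return only the products that share the user's partition."""
    return partitions.get(signature(user_vec, hyperplanes), [])

# Fixed, axis-aligned hyperplanes so the toy example is deterministic.
planes = [[1.0, 0.0], [0.0, 1.0]]
products = {"a": [1.0, 1.0], "b": [-1.0, 1.0], "c": [2.0, 0.5], "d": [-1.0, -1.0]}
partitions = build_partitions(products, planes)
print(candidate_products([0.5, 2.0], partitions, planes))  # ['a', 'c']
```

In a distributed setting, the hope would be that each user only needs the product partitions
its signature maps to, rather than every partition.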

Using level 3 BLAS might be better; I'm really not sure.  It won't reduce communication, though.
These two types of optimizations might be orthogonal, anyway.

> Support recommendAll in matrix factorization model
> --------------------------------------------------
>
>                 Key: SPARK-3066
>                 URL: https://issues.apache.org/jira/browse/SPARK-3066
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Debasish Das
>
> ALS returns a matrix factorization model, which we can use to predict ratings for individual
> queries as well as small batches. In practice, users may want to compute top-k recommendations
> offline for all users. It is very expensive but a common problem. We can do some optimization
> like
> 1) collect one side (either user or product) and broadcast it as a matrix
> 2) use level-3 BLAS to compute inner products
> 3) use Utils.takeOrdered to find top-k
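
The three steps in the description above can be sketched on a single machine as follows.  This
is only an outline under stated assumptions: the product side is held as one in-memory matrix
(standing in for a broadcast), a plain Python block product stands in for a level-3 BLAS
matrix multiply, and heapq.nlargest stands in for Utils.takeOrdered.  The function name
recommend_all and the block_size parameter are hypothetical.

```python
import heapq

def recommend_all(user_factors, product_factors, k, block_size=2):
    """Top-k (product_id, score) pairs for every user, processed in user blocks."""
    # Step 1: collect one side (products here) into a single matrix.
    product_ids = list(product_factors)
    product_matrix = [product_factors[pid] for pid in product_ids]

    user_ids = list(user_factors)
    results = {}
    for start in range(0, len(user_ids), block_size):
        block_ids = user_ids[start:start + block_size]
        block = [user_factors[uid] for uid in block_ids]
        # Step 2: block of users times all products (a matrix-matrix product).
        scores = [
            [sum(u * p for u, p in zip(user_row, product_row))
             for product_row in product_matrix]
            for user_row in block
        ]
        # Step 3: keep only the k best-scoring products per user.
        for uid, row in zip(block_ids, scores):
            results[uid] = heapq.nlargest(
                k, zip(product_ids, row), key=lambda pair: pair[1]
            )
    return results

# Toy factors, made up for illustration.
users = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [1.0, 2.0]}
products = {"p1": [2.0, 0.0], "p2": [0.0, 3.0], "p3": [1.0, 1.0]}
recs = recommend_all(users, products, k=2)
print(recs["u1"])  # [('p1', 2.0), ('p3', 1.0)]
```

Processing users in blocks is what turns many vector-vector products into a few matrix-matrix
products, which is where an optimized BLAS would help.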



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

