spark-user mailing list archives

From jeremycod <zoran.jere...@gmail.com>
Subject How to recommend most similar users using Spark ML
Date Fri, 15 Jul 2016 03:36:52 GMT
Hi,

I need to develop a service that recommends, for each user, other similar
users to connect with. For each user I have preference data for specific
items in the form:

user, item, preference  
1,    75,   0.89  
2,    168,  0.478  
2,    99,   0.321  
3,    31,   0.012

So far, I have implemented an approach based on cosine similarity that
compares one user's feature vector with those of all other users:

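import org.jblas.DoubleMatrix

// Cosine similarity of two factor vectors (NaN if either has zero norm).
def cosineSimilarity(vec1: DoubleMatrix, vec2: DoubleMatrix): Double =
    vec1.dot(vec2) / (vec1.norm2() * vec2.norm2())

// Prints the recNumber users most similar to userid.
// `model` is the trained factorization model; userFeatures is assumed to be
// an RDD of (userId, factorArray) pairs.
def user2usersimilarity(userid: Int, recNumber: Int): Unit = {
    val userFactor = model.userFeatures.lookup(userid).head
    val userVector = new DoubleMatrix(userFactor)
    // Score every other user against the target user, skipping the user itself.
    val sims = model.userFeatures
        .filter { case (id, _) => id != userid }
        .map { case (id, factor) =>
            (id, cosineSimilarity(new DoubleMatrix(factor), userVector))
        }
    // Keep only the recNumber highest-scoring users.
    val sortedSims = sims.top(recNumber)(
        Ordering.by[(Int, Double), Double] { case (_, similarity) => similarity })
    println(sortedSims.mkString("\n"))
}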

This approach works fine with the MovieLens dataset in terms of
recommendation quality. However, I am concerned about the performance of
this algorithm. Since I have to generate recommendations for all users in
the system, this approach compares each user with every other user in the
system.
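
To make the cost concrete, here is a minimal sketch of the brute-force
computation in plain Scala collections (not Spark RDDs). The object and
method names are hypothetical and only for illustration. It normalizes each
factor vector once, so that every comparison is a single dot product, but it
still scores every pair of users, which is N(N-1)/2 distinct pairs for N
users.

```scala
// Hypothetical illustration in plain Scala; not part of any Spark API.
object BruteForceTopN {
  type Vec = Array[Double]

  def dot(a: Vec, b: Vec): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  // Scale a vector to unit length; a zero vector is returned unchanged.
  def normalize(v: Vec): Vec = {
    val n = math.sqrt(dot(v, v))
    if (n == 0.0) v else v.map(_ / n)
  }

  // For every user, the n most similar other users by cosine similarity.
  // After normalization, cosine similarity is just the dot product.
  def topN(factors: Map[Int, Vec], n: Int): Map[Int, Seq[(Int, Double)]] = {
    val unit = factors.map { case (id, v) => id -> normalize(v) }
    unit.map { case (id, v) =>
      val sims = unit.toSeq.collect {
        case (other, w) if other != id => (other, dot(v, w))
      }
      id -> sims.sortBy { case (_, sim) => -sim }.take(n)
    }
  }

  // Number of distinct user pairs that an all-pairs comparison must score.
  def pairCount(users: Long): Long = users * (users - 1) / 2
}
```

For example, BruteForceTopN.pairCount(1000000L) gives roughly 5 * 10^11
pairs, which is why the all-pairs approach becomes expensive as the user
base grows.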

I would appreciate it if somebody could suggest how to limit the comparison
for each user to its top N neighbors, or suggest some other algorithm that
would work better for my use case.
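
One possible way to limit the comparisons, sketched here as an assumption
rather than as something taken from Spark, is random-hyperplane
locality-sensitive hashing: each factor vector gets a short bit signature,
and only users that share a signature are compared with exact cosine
similarity. The sketch below is plain Scala; the object name, the number of
hyperplanes and the seed are all hypothetical choices. The method is
approximate, so similar users can land in different buckets and be missed.

```scala
import scala.util.Random

// Hypothetical illustration in plain Scala; not part of any Spark API.
object HyperplaneBuckets {
  type Vec = Array[Double]

  // Draw numPlanes random Gaussian hyperplanes of dimension dim.
  def hyperplanes(numPlanes: Int, dim: Int, seed: Long): Array[Vec] = {
    val rng = new Random(seed)
    Array.fill(numPlanes)(Array.fill(dim)(rng.nextGaussian()))
  }

  // One bit per hyperplane: which side of the plane the vector falls on.
  def signature(v: Vec, planes: Array[Vec]): String =
    planes.map { p =>
      val d = p.zip(v).map { case (a, b) => a * b }.sum
      if (d >= 0.0) '1' else '0'
    }.mkString

  // Group user ids by signature; each bucket is a candidate neighbor set.
  def buckets(factors: Map[Int, Vec], planes: Array[Vec]): Map[String, Seq[Int]] =
    factors.toSeq
      .map { case (id, v) => signature(v, planes) -> id }
      .groupBy(_._1)
      .map { case (sig, pairs) => sig -> pairs.map(_._2) }
}
```

With the candidates restricted to one bucket, the exact cosine similarity
only has to be computed within that bucket. More hyperplanes give smaller
buckets and fewer comparisons, at the price of missing more true neighbors.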

Thanks,
Zoran




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-recommend-most-similar-users-using-Spark-ML-tp27342.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
