spark-dev mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject Re: Row Similarity
Date Thu, 11 Dec 2014 02:01:02 GMT
I added code to compute the topK products for each user and the topK users for
each product in SPARK-3066.

That is different from the row similarity calculation, since we need both the
user and the product factors to calculate the topK recommendations.
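
To make the distinction concrete, here is a minimal sketch of a topK
recommendation computed from both factor sets. It is plain Python rather than
the code in SPARK-3066; the function name top_k_products, the dict layout, and
the toy factor values are all assumptions made for illustration.

```python
import heapq


def top_k_products(user_factors, product_factors, k):
    """Return {user_id: [(product_id, score), ...]} with the k highest scores.

    The score is the dot product of a user factor and a product factor,
    so both factor sets are needed (hypothetical helper, not an MLlib API).
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    recs = {}
    for user_id, u in user_factors.items():
        # Score every product against this user, then keep the k best.
        scored = ((dot(u, p), product_id)
                  for product_id, p in product_factors.items())
        best = heapq.nlargest(k, scored)
        recs[user_id] = [(product_id, score) for score, product_id in best]
    return recs


# Toy rank-2 factors (made-up numbers, not from a trained model).
user_factors = {"u1": [1.0, 0.0], "u2": [0.0, 2.0]}
product_factors = {"p1": [3.0, 1.0], "p2": [1.0, 4.0], "p3": [2.0, 2.0]}
recs = top_k_products(user_factors, product_factors, k=2)
```

Each score pairs one user vector with one product vector, whereas row
similarity compares vectors from the same set with each other.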

For (1) and (2) we are trying to find the similarUsers for a given user and
the similarProducts for a given product.

Finding the similarProducts for a given product is straightforward with
columnSimilarities/DIMSUM when the product dimension is skinny.
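
As a rough illustration of why the skinny case is easy, the sketch below
computes exact cosine similarities between columns in a single pass over the
rows. It is plain Python, not the RowMatrix API, and it does no DIMSUM
sampling; the function name column_cosine_similarities and the toy matrix are
assumptions.

```python
import math


def column_cosine_similarities(rows):
    """Exact cosine similarity between the columns of a tall, skinny matrix.

    rows: iterable of equal-length lists, one per matrix row.
    Returns {(i, j): similarity} for column indices i < j.
    """
    gram = None
    n = 0
    for row in rows:
        if gram is None:
            n = len(row)
            gram = [[0.0] * n for _ in range(n)]
        # Each row adds its own small n x n contribution to A^T A, so the
        # rows can be processed independently and the results summed.
        for i in range(n):
            if row[i] == 0:
                continue
            for j in range(i, n):
                gram[i][j] += row[i] * row[j]
    if gram is None:
        return {}

    # The diagonal of A^T A holds the squared column norms.
    norms = [math.sqrt(gram[i][i]) for i in range(n)]
    sims = {}
    for i in range(n):
        for j in range(i + 1, n):
            if norms[i] > 0 and norms[j] > 0:
                sims[(i, j)] = gram[i][j] / (norms[i] * norms[j])
    return sims


# Toy matrix: columns 0 and 1 are parallel, column 2 is orthogonal to both.
rows = [[1.0, 2.0, 0.0],
        [2.0, 4.0, 0.0],
        [0.0, 0.0, 3.0]]
sims = column_cosine_similarities(rows)
```

The accumulator is n x n for n columns, so it stays small no matter how many
rows there are. The same trick applied to rows would need an accumulator that
grows with the square of the number of rows.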

Finding the similarUsers for a given user will need a map-reduce
implementation of row similarity, since the matrix is tall.
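
One common way to phrase row similarity in map-reduce terms is to group the
nonzero entries by column, emit a partial product for every pair of rows that
share a column, and sum the partial products per pair. The sketch below does
this in plain Python on a single machine. It is only an illustration under
those assumptions, not an existing Spark API; the function name
row_cosine_similarities and the sparse dict layout are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def row_cosine_similarities(sparse_rows):
    """Cosine similarity between rows that share at least one nonzero column.

    sparse_rows: {row_id: {column_index: value}}; row ids must be sortable.
    Returns {(row_i, row_j): similarity} with row_i < row_j.
    """
    # "Map" step 1: regroup the nonzero entries by column, and record
    # each row's norm along the way.
    by_column = defaultdict(list)
    norms = {}
    for row_id, entries in sparse_rows.items():
        norms[row_id] = math.sqrt(sum(v * v for v in entries.values()))
        for col, v in entries.items():
            if v != 0:
                by_column[col].append((row_id, v))

    # "Map" step 2 and "reduce": every pair of rows in the same column
    # emits a partial dot product, and partials are summed per pair.
    dots = defaultdict(float)
    for postings in by_column.values():
        for (i, vi), (j, vj) in combinations(sorted(postings), 2):
            dots[(i, j)] += vi * vj

    return {(i, j): d / (norms[i] * norms[j]) for (i, j), d in dots.items()}


# Toy data: rows "a" and "b" are identical, row "c" shares no column with them.
sparse_rows = {
    "a": {0: 1.0, 1: 1.0},
    "b": {0: 1.0, 1: 1.0},
    "c": {2: 5.0},
}
sims = row_cosine_similarities(sparse_rows)
```

A column with m nonzero rows emits m(m-1)/2 pairs, so dense or popular columns
dominate the cost. That is the expense of computing AA^T mentioned in the
quoted reply below.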

I don't see a JIRA for that yet. Are there any good references for a
map-reduce implementation of row similarity?

On Wed, Dec 10, 2014 at 2:30 PM, Reza Zadeh <reza@databricks.com> wrote:

> It's not so cheap to compute row similarities when there are many rows, as
> it amounts to computing the outer product of a matrix A (i.e. computing
> AA^T, which is expensive).
>
> There is a JIRA to track handling (1) and (2) more efficiently than
> computing all pairs: https://issues.apache.org/jira/browse/SPARK-3066
>
>
>
> On Wed, Dec 10, 2014 at 2:44 PM, Debasish Das <debasish.das83@gmail.com>
> wrote:
>
>> Hi,
>>
>> It seems there are multiple places where we would like to compute row
>> similarities (exact or approximate).
>>
>> Through RowMatrix columnSimilarities we can already compute the column
>> similarities of a tall skinny matrix.
>>
>> Similarly, we should have an API in RowMatrix called rowSimilarities where
>> we can compute similar rows in a map-reduce fashion. It would be useful for
>> the following use cases:
>>
>> 1. Generate topK users for each user from matrix factorization model
>> 2. Generate topK products for each product from matrix factorization model
>> 3. Generate kernel matrix for use in spectral clustering
>> 4. Generate kernel matrix for use in kernel regression/classification
>>
>> I am not sure whether there is already a good map-reduce implementation of
>> row similarity that we can use (ideas like Fastfood and random kitchen
>> sinks felt more suited to the classification use case, but user
>> similarities also show up in recommendation, which is unsupervised).
>>
>> Is there a JIRA tracking it? If not, I can open one and we can discuss it
>> further there.
>>
>> Thanks.
>> Deb
>>
>
>
