spark-dev mailing list archives

From Reza Zadeh <r...@databricks.com>
Subject Re: Row Similarity
Date Wed, 10 Dec 2014 22:30:50 GMT
It's not cheap to compute row similarities when there are many rows, as
it amounts to computing AA^T for a matrix A (the dot products of every pair
of rows), which is expensive.
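To make the cost concrete, here is a minimal sketch in plain Python (not Spark) of the naive computation. `gram_rows` is a hypothetical helper, not a Spark API. Entry G[i][j] of G = AA^T is the dot product of rows i and j, so an n x d matrix needs n(n+1)/2 dot products of length d, O(n^2 d) time and O(n^2) output. For a million rows that is roughly 5 x 10^11 pairs.

```python
def gram_rows(A):
    """Return G = A A^T, where G[i][j] is the dot product of rows i and j."""
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # G is symmetric, so only the upper triangle is computed
            s = sum(x * y for x, y in zip(A[i], A[j]))
            G[i][j] = G[j][i] = s
    return G


A = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 1.0],
     [4.0, 1.0, 0.0]]
G = gram_rows(A)
```

Even skipping the symmetric half only halves the work; the quadratic growth in the number of rows remains.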

There is a JIRA to track handling (1) and (2) more efficiently than
computing all pairs: https://issues.apache.org/jira/browse/SPARK-3066
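For reference, the all-pairs baseline for use cases (1) and (2) can be sketched as below. It assumes cosine similarity between hypothetical user (or product) factor vectors; `top_k_similar` is an illustrative name, not an MLlib method, and this does not describe the approach taken in SPARK-3066. `heapq.nlargest` keeps only k candidates per row, which bounds memory, but every pair is still scored.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k_similar(factors, k):
    """For each row, return its k most similar other rows as (score, index), best first."""
    result = []
    for i, u in enumerate(factors):
        scored = ((cosine(u, v), j) for j, v in enumerate(factors) if j != i)
        result.append(heapq.nlargest(k, scored))  # O(n log k) per row, O(k) memory
    return result


# Hypothetical 2-dimensional factors for four users.
user_factors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
top = top_k_similar(user_factors, 1)
```

Here each user's nearest neighbour is the one pointing in almost the same direction (users 0 and 1, users 2 and 3), but the loop still touches all n^2 pairs.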



On Wed, Dec 10, 2014 at 2:44 PM, Debasish Das <debasish.das83@gmail.com>
wrote:

> Hi,
>
> It seems there are multiple places where we would like to compute row
> similarities (exact or approximate).
>
> With RowMatrix.columnSimilarities we can already compute column
> similarities of a tall, skinny matrix.
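The link between the two problems can be sketched in plain Python: row similarities of A are column similarities of A^T. The sketch below computes exact column cosine similarities in a map-reduce style (each row emits a product for every pair of its nonzero columns; products are summed per column pair). `column_similarities` and `row_similarities` are hypothetical helpers, not the Spark API, and this is the exact, unsampled computation rather than Spark's implementation.

```python
import math
from collections import defaultdict
from itertools import combinations


def column_similarities(rows):
    """Exact cosine similarity for every pair of columns that co-occur in some row.

    Map: each row emits x1 * x2 for every pair of its nonzero columns (c1, c2).
    Reduce: emitted products are summed per (c1, c2), giving entries of A^T A.
    """
    dots = defaultdict(float)  # (c1, c2) -> sum over rows of a[c1] * a[c2]
    sq = defaultdict(float)    # c -> squared L2 norm of column c
    for row in rows:
        nz = [(c, x) for c, x in enumerate(row) if x != 0]
        for c, x in nz:
            sq[c] += x * x
        for (c1, x1), (c2, x2) in combinations(nz, 2):  # c1 < c2 because nz is ordered
            dots[(c1, c2)] += x1 * x2
    return {(c1, c2): d / math.sqrt(sq[c1] * sq[c2]) for (c1, c2), d in dots.items()}


def row_similarities(rows):
    """Row similarities of A are the column similarities of A^T."""
    transposed = [list(col) for col in zip(*rows)]
    return column_similarities(transposed)


A = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 1.0],
     [4.0, 1.0, 0.0]]
sims = row_similarities(A)
```

A row with m nonzeros emits m(m-1)/2 products, which is cheap when A has few columns. After transposing a matrix with many rows, each "row" of A^T has up to n entries and can emit on the order of n^2/2 products, so the transpose trick alone does not remove the quadratic cost.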
>
> Similarly, we should have an API in RowMatrix called rowSimilarities with
> which we can compute similar rows in a map-reduce fashion. It will be useful
> for the following use cases:
>
> 1. Generate topK users for each user from a matrix factorization model
> 2. Generate topK products for each product from a matrix factorization model
> 3. Generate a kernel matrix for use in spectral clustering
> 4. Generate a kernel matrix for use in kernel regression/classification
>
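For use cases (3) and (4), a minimal sketch of a kernel matrix, assuming the Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2) as one common choice; `rbf_kernel_matrix` and `gamma` are hypothetical names. It has the same all-pairs structure as AA^T; in fact ||x_i - x_j||^2 = G_ii + G_jj - 2 G_ij with G = AA^T.

```python
import math


def rbf_kernel_matrix(X, gamma):
    """K[i][j] = exp(-gamma * ||x_i - x_j||^2), the Gaussian (RBF) kernel."""
    n = len(X)
    K = [[1.0] * n for _ in range(n)]  # diagonal entries are exp(0) = 1
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = K[j][i] = math.exp(-gamma * d2)
    return K


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = rbf_kernel_matrix(X, gamma=0.5)
```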
> I am not sure whether there are already good implementations of map-reduce
> row similarity that we can use (ideas like Fastfood and random kitchen sinks
> felt more suited to the classification use case, but user similarities also
> show up in recommendation, which is unsupervised)...
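For context on the random-features idea, a minimal sketch of random Fourier features, the construction behind "random kitchen sinks", approximating the same Gaussian kernel: with frequencies w ~ N(0, 2*gamma*I) and phases b ~ U[0, 2*pi], the map z(x) = sqrt(2/D) * cos(w . x + b) satisfies z(x) . z(y) ≈ exp(-gamma * ||x - y||^2). Fastfood speeds up the same idea by replacing the dense Gaussian matrix with structured matrices. `random_fourier_features`, `gamma`, and `num_features` are hypothetical names; this only approximates kernel values and is not a top-K similarity search.

```python
import math
import random


def random_fourier_features(X, gamma, num_features, seed=0):
    """Map each row x to z(x) so that z(x) . z(y) ~= exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    d = len(X[0])
    std = math.sqrt(2.0 * gamma)  # frequencies w ~ N(0, 2 * gamma * I)
    W = [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(num_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]  # random phases
    scale = math.sqrt(2.0 / num_features)
    return [
        [scale * math.cos(sum(wk * xk for wk, xk in zip(w, x)) + bj) for w, bj in zip(W, b)]
        for x in X
    ]


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
Z = random_fourier_features(X, gamma=0.5, num_features=4000)
approx = sum(a * c for a, c in zip(Z[0], Z[1]))  # estimate of k(x_0, x_1)
exact = math.exp(-0.5)
```

The approximation error shrinks roughly like 1/sqrt(num_features), and the features are computed per row, so no pairwise pass is needed to build them.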
>
> Is there a JIRA tracking it? If not, I can open one and we can discuss it
> further.
>
> Thanks.
> Deb
>
