spark-dev mailing list archives

From Debasish Das
Subject Row Similarity
Date Wed, 10 Dec 2014 19:44:58 GMT

It seems there are multiple places where we would like to compute row
similarities, either exact or approximate.

RowMatrix.columnSimilarities already lets us compute the column
similarities of a tall-and-skinny matrix.
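
To make the column case concrete, here is a minimal single-machine sketch
in plain Python. It is not Spark's implementation; the function name and
the dense list-of-lists input are assumptions made for illustration. It
follows the map-reduce shape: each row emits a partial product for every
pair of columns that are non-zero in that row, and the partial products
are summed per pair before being normalised by the column norms.

```python
import math

def column_similarities(rows):
    """Cosine similarity for each pair of columns that co-occur in a row.

    `rows` is a list of equal-length lists (hypothetical dense input).
    Returns a dict mapping (i, j) with i < j to the cosine similarity.
    """
    n_cols = len(rows[0])
    squared_norms = [0.0] * n_cols   # sum of squares per column
    dots = {}                        # (i, j) -> running dot product

    for row in rows:
        # "Map" step: keep only the non-zero entries of this row.
        nonzero = [(j, v) for j, v in enumerate(row) if v != 0]
        for j, v in nonzero:
            squared_norms[j] += v * v
        # Emit one partial product per co-occurring column pair.
        for a in range(len(nonzero)):
            i, vi = nonzero[a]
            for b in range(a + 1, len(nonzero)):
                j, vj = nonzero[b]
                # "Reduce" step: sum partial products per pair.
                dots[(i, j)] = dots.get((i, j), 0.0) + vi * vj

    return {
        (i, j): d / (math.sqrt(squared_norms[i]) * math.sqrt(squared_norms[j]))
        for (i, j), d in dots.items()
    }

example_sims = column_similarities([[1, 0], [1, 1]])
```

The work per row grows with the square of its non-zero count, which is
why this shape suits tall-and-skinny matrices with few columns.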

Similarly, we should have an API in RowMatrix called rowSimilarities that
computes similar rows in a map-reduce fashion. It would be useful for the
following use cases:

1. Generate topK users for each user from matrix factorization model
2. Generate topK products for each product from matrix factorization model
3. Generate kernel matrix for use in spectral clustering
4. Generate kernel matrix for use in kernel regression/classification
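
As a rough illustration of use cases 1 and 2, the sketch below computes
the topK most similar rows by brute-force cosine similarity in plain
Python. It is only a single-machine baseline, not a proposed
implementation of rowSimilarities: the function name, the toy factor
matrix, and the choice of cosine as the similarity measure are all
assumptions.

```python
import heapq
import math

def top_k_similar_rows(rows, k):
    """For each row, return up to k (similarity, row_index) pairs,
    sorted by decreasing cosine similarity. Brute force: O(n^2) pairs."""
    norms = [math.sqrt(sum(v * v for v in r)) for r in rows]
    result = {}
    for i, ri in enumerate(rows):
        scored = []
        for j, rj in enumerate(rows):
            # Skip the row itself and all-zero rows (cosine undefined).
            if i == j or norms[i] == 0 or norms[j] == 0:
                continue
            dot = sum(a * b for a, b in zip(ri, rj))
            scored.append((dot / (norms[i] * norms[j]), j))
        result[i] = heapq.nlargest(k, scored)
    return result

# Hypothetical user-factor matrix: one row of latent factors per user.
user_factors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
neighbours = top_k_similar_rows(user_factors, k=1)
```

The quadratic number of pairs is exactly what makes the brute-force
version impractical for many rows, and why an approximate or sampled
variant is worth discussing.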

I am not sure whether there is already a good map-reduce implementation of
row similarity that we can use. Ideas like Fastfood and random kitchen sinks
seemed aimed more at the classification use case, but user similarities also
show up in recommendation, which is unsupervised.

Is there a JIRA tracking this? If not, I can open one and we can discuss
it further there.

