spark-issues mailing list archives

From "Nick Pentreath (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-18454) Changes to improve Nearest Neighbor Search for LSH
Date Tue, 21 Feb 2017 08:01:44 GMT

    [ https://issues.apache.org/jira/browse/SPARK-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875552#comment-15875552 ]

Nick Pentreath edited comment on SPARK-18454 at 2/21/17 8:00 AM:
-----------------------------------------------------------------

Can you also comment on http://mail-archives.apache.org/mod_mbox/spark-user/201702.mbox/%3CCANxMKZU0iVd9Ff4TrWjtdk%3DkEyXAeoXGLEgmVW5vbE5puobE6Q%40mail.gmail.com%3E?
It would be good to understand why we're seeing poor performance compared with an alternative
implementation in Spark Packages, and whether we can borrow ideas from it to improve performance.

Granted, it does not support similarity join, but we should still investigate.


> Changes to improve Nearest Neighbor Search for LSH
> --------------------------------------------------
>
>                 Key: SPARK-18454
>                 URL: https://issues.apache.org/jira/browse/SPARK-18454
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Yun Ni
>
> We all agree on the following improvement to Multi-Probe NN Search:
> (1) Use approxQuantile to get the {{hashDistance}} threshold instead of doing a full sort on the whole dataset
> Currently we are still discussing the following:
> (1) What {{hashDistance}} (or Probing Sequence) we should use for {{MinHash}}
> (2) What the issues are with the current Nearest Neighbor implementation and how we should change it
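
As a rough illustration of the agreed item (1) in the description above, the sketch below picks a hash-distance cutoff from a quantile estimate instead of sorting every row, then keeps only the rows at or below that cutoff. It is plain Python, not Spark code: `approx_threshold` and `candidates_within_threshold` are hypothetical helper names, the input is assumed to be a precomputed list of hash distances, and random sampling merely stands in for whatever approximation Spark's approxQuantile uses.

```python
import math
import random


def approx_threshold(distances, quantile, sample_size=1000, seed=0):
    """Estimate the `quantile`-th smallest distance from a random sample.

    Hypothetical stand-in for an approximate quantile: only the sample
    is sorted, never the full list.
    """
    if not distances:
        raise ValueError("distances must be non-empty")
    rng = random.Random(seed)
    if len(distances) > sample_size:
        sample = rng.sample(distances, sample_size)
    else:
        sample = list(distances)
    sample.sort()
    # Index of the last element that still falls inside the quantile.
    index = max(0, math.ceil(quantile * len(sample)) - 1)
    return sample[min(index, len(sample) - 1)]


def candidates_within_threshold(distances, num_neighbors, sample_size=1000):
    """Return the indices whose distance is at or below the estimated cutoff."""
    quantile = min(1.0, num_neighbors / len(distances))
    threshold = approx_threshold(distances, quantile, sample_size)
    # A single linear pass replaces the full sort.
    return [i for i, d in enumerate(distances) if d <= threshold]


if __name__ == "__main__":
    hash_distances = [0.8, 0.2, 0.6, 0.4]
    print(candidates_within_threshold(hash_distances, num_neighbors=2))
```

Because the cutoff is only an estimate, the filter can return somewhat more or fewer rows than requested on large inputs; a real pipeline would presumably re-rank the surviving candidates by their true distance.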




