spark-issues mailing list archives

From "Nick Pentreath (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-18454) Changes to improve Nearest Neighbor Search for LSH
Date Tue, 21 Feb 2017 07:56:44 GMT

    [ https://issues.apache.org/jira/browse/SPARK-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875552#comment-15875552 ]

Nick Pentreath commented on SPARK-18454:
----------------------------------------

Can you also comment on http://mail-archives.apache.org/mod_mbox/spark-user/201702.mbox/%3CCANxMKZU0iVd9Ff4TrWjtdk%3DkEyXAeoXGLEgmVW5vbE5puobE6Q%40mail.gmail.com%3E?
It would be good to understand why we're seeing poor performance compared with an alternative
implementation in Spark packages, and whether we can take some ideas from it to improve performance.

> Changes to improve Nearest Neighbor Search for LSH
> --------------------------------------------------
>
>                 Key: SPARK-18454
>                 URL: https://issues.apache.org/jira/browse/SPARK-18454
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Yun Ni
>
> We all agree to make the following improvement to Multi-Probe NN Search:
> (1) Use approxQuantile to get the {{hashDistance}} threshold instead of doing a full sort on the whole dataset
> Currently we are still discussing the following:
> (1) What {{hashDistance}} (or Probing Sequence) we should use for {{MinHash}}
> (2) What the issues are with the current Nearest Neighbor implementation and how we should change it
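
To illustrate item (1) of the agreed improvements, here is a minimal sketch of the idea in plain Python rather than Spark. It estimates a {{hashDistance}} threshold from an approximate quantile and keeps only the rows at or below it, so the full dataset is never sorted. The function names, the `slack` parameter, and the sampling-based estimator are all assumptions made for illustration: Spark's own approxQuantile uses a different algorithm with a relative-error guarantee, and the issue does not say how the quantile level should be chosen.

```python
import random


def approx_quantile(values, q, sample_size=1000, seed=0):
    """Estimate the q-quantile of `values` by sorting a random sample.

    Hypothetical stand-in for Spark's approxQuantile: only the sample is
    sorted, never the whole input.
    """
    rng = random.Random(seed)
    if len(values) <= sample_size:
        sample = list(values)
    else:
        sample = rng.sample(values, sample_size)
    ordered = sorted(sample)
    # Clamp the index so that q = 1.0 maps to the largest sampled value.
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]


def candidates_within_threshold(hash_distances, num_neighbors,
                                slack=2.0, sample_size=1000, seed=0):
    """Return the indices of rows whose hash distance is under the threshold.

    The quantile level is chosen so that roughly slack * num_neighbors rows
    pass the filter; `slack` is an assumed safety margin, not a value taken
    from the issue.
    """
    n = len(hash_distances)
    if n == 0:
        return []
    q = min(1.0, slack * num_neighbors / n)
    threshold = approx_quantile(hash_distances, q, sample_size, seed)
    # One linear pass replaces the full sort of the dataset.
    return [i for i, d in enumerate(hash_distances) if d <= threshold]
```

The surviving candidates would then be ranked by their true distance to the query key; because the threshold is only an estimate, the filter can return somewhat more or fewer rows than requested.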



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


