spark-issues mailing list archives

From "Nick Pentreath (JIRA)" <>
Subject [jira] [Commented] (SPARK-18454) Changes to improve Nearest Neighbor Search for LSH
Date Tue, 21 Feb 2017 07:56:44 GMT


Nick Pentreath commented on SPARK-18454:

Can you also comment on
It would be good to understand why we're seeing poor performance vs an alternative impl in
Spark packages, and whether we can take some idea from that on how to improve performance.

> Changes to improve Nearest Neighbor Search for LSH
> --------------------------------------------------
>                 Key: SPARK-18454
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Yun Ni
> We all agree to make the following improvement to Multi-Probe NN Search:
> (1) Use approxQuantile to get the {{hashDistance}} threshold instead of doing full sort
> on the whole dataset
> Currently we are still discussing the following:
> (1) What {{hashDistance}} (or Probing Sequence) we should use for {{MinHash}}
> (2) What are the issues and how we should change the current Nearest Neighbor implementation
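
Item (1) of the agreed improvement, picking the {{hashDistance}} threshold from an approximate quantile rather than a full sort, can be sketched roughly as follows. This is a plain-Python illustration and not the Spark implementation: random sampling stands in for Spark's approxQuantile, the input is assumed to be a list of precomputed hash distances, and the names quantile_threshold, candidate_indices and sample_size are hypothetical.

```python
import random


def quantile_threshold(hash_distances, num_neighbors, sample_size=1000, seed=0):
    """Estimate a hash-distance cutoff that keeps roughly num_neighbors rows.

    Assumption: a random sample stands in for an approximate-quantile
    routine, so only the sample is sorted, never the whole dataset.
    """
    n = len(hash_distances)
    if n == 0:
        raise ValueError("hash_distances must not be empty")
    if num_neighbors >= n:
        # Asking for everything: the cutoff is simply the largest distance.
        return max(hash_distances)
    if n <= sample_size:
        sample = list(hash_distances)  # small input: the sample is the data
    else:
        sample = random.Random(seed).sample(hash_distances, sample_size)
    ordered = sorted(sample)
    # Position of the (num_neighbors / n) quantile inside the sorted sample.
    idx = (num_neighbors * len(ordered)) // n
    return ordered[idx]


def candidate_indices(hash_distances, num_neighbors, **kwargs):
    """Return indices of rows whose hash distance is at or below the cutoff."""
    cutoff = quantile_threshold(hash_distances, num_neighbors, **kwargs)
    return [i for i, d in enumerate(hash_distances) if d <= cutoff]
```

The surviving candidates would then be ranked by their true distance to the query key. Because the cutoff is only an estimate when sampling is used, the candidate set can come out somewhat larger or smaller than the number requested.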

This message was sent by Atlassian JIRA
