flink-issues mailing list archives

From danielblazevski <...@git.apache.org>
Subject [GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...
Date Wed, 07 Oct 2015 12:01:46 GMT
Github user danielblazevski commented on the pull request:

    https://github.com/apache/flink/pull/1220#issuecomment-146175315
  
    @chiwanpark, in lines 203-207
    +                  val useQuadTree = resultParameters.get(useQuadTreeParam).getOrElse(
    +                    training.values.head.size + math.log(math.log(training.values.length)/
    +                      math.log(4.0)) < math.log(training.values.length)/math.log(4.0) &&
    +                    (metric.isInstanceOf[EuclideanDistanceMetric] ||
    +                      metric.isInstanceOf[SquaredEuclideanDistanceMetric]))
    the code decides whether to use the quadtree when no value is specified. It decides
based on the number of training + test points and the dimension, and it is a conservative
estimate: whenever it picks the quadtree, the quadtree should improve performance over the
brute-force method. Basically, the quadtree scales poorly with dimension but really well with
the number of points.
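As a rough illustration of that default, here is a minimal standalone sketch of the same inequality, `dim + ln(log4(n)) < log4(n)`, outside of Flink. The names `QuadTreeHeuristic`, `shouldUseQuadTree`, `dim`, `numPoints` and `metricIsEuclidean` are hypothetical and not part of the pull request; the boolean simply stands in for the `isInstanceOf` checks on the metric.

```scala
object QuadTreeHeuristic {
  // Hypothetical helper mirroring the default in the snippet above.
  // dim               - dimension of the training vectors
  // numPoints         - number of training points
  // metricIsEuclidean - assumed stand-in for the check that the metric is
  //                     EuclideanDistanceMetric or SquaredEuclideanDistanceMetric
  def shouldUseQuadTree(dim: Int, numPoints: Int, metricIsEuclidean: Boolean): Boolean = {
    // log base 4 of the number of points
    val log4n = math.log(numPoints) / math.log(4.0)
    // Use the quadtree only if the dimension is small relative to log4(n)
    // and the metric is (squared) Euclidean.
    dim + math.log(log4n) < log4n && metricIsEuclidean
  }
}
```

For example, with 2 dimensions and 1,000,000 points, `log4(n)` is roughly 9.97, so the condition holds and the quadtree would be chosen; with 20 dimensions and the same number of points it would not.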
    
    As for using a `Vector` for `minVec` and `maxVec`, I plug in `minVec` and `maxVec` to
construct the root Node, and I found it best to use a ListBuffer in the constructor for the
Node class when partitioning the boxes into sub-boxes.
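To make the sub-box partitioning concrete, here is a minimal sketch of how a box given by `minVec` and `maxVec` could be split at its midpoint along every axis into 2^d sub-boxes, using `ListBuffer` for the corners. This is only an illustration under my own assumptions, not the code in the pull request: `BoxPartitionSketch` and `partitionBox` are hypothetical names, and the actual Node constructor may look quite different.

```scala
import scala.collection.mutable.ListBuffer

object BoxPartitionSketch {
  // Hypothetical helper: splits the box [minVec, maxVec] in half along each
  // dimension and returns the (min corner, max corner) pair of every sub-box.
  def partitionBox(
      minVec: ListBuffer[Double],
      maxVec: ListBuffer[Double]): Seq[(ListBuffer[Double], ListBuffer[Double])] = {
    require(minVec.length == maxVec.length, "corners must have the same dimension")

    // Start from a single box with empty corners and extend the corners
    // one dimension at a time, doubling the number of boxes at each step.
    var boxes = Seq((ListBuffer.empty[Double], ListBuffer.empty[Double]))
    for (i <- minVec.indices) {
      val mid = (minVec(i) + maxVec(i)) / 2.0
      boxes = boxes.flatMap { case (lo, hi) =>
        Seq(
          (lo :+ minVec(i), hi :+ mid),   // lower half along dimension i
          (lo :+ mid, hi :+ maxVec(i))    // upper half along dimension i
        )
      }
    }
    boxes
  }
}
```

Calling `BoxPartitionSketch.partitionBox(ListBuffer(0.0, 0.0), ListBuffer(2.0, 2.0))` yields four sub-boxes, one per quadrant of the square.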


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
