[ https://issues.apache.org/jira/browse/FLINK-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532283#comment-14532283 ]
ASF GitHub Bot commented on FLINK-1933:

Github user tillrohrmann commented on a diff in the pull request:
https://github.com/apache/flink/pull/629#discussion_r29834901
--- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/metrics/distances/CosineDistanceMeasure.scala ---

@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.math.metrics.distances
+
+import org.apache.flink.ml.math.Vector
+
+/** This class implements a cosine distance metric. The class calculates the distance between
+ * the given vectors by dividing the dot product of two vectors by the product of their lengths.
+ * We convert the result of division to a usable distance. So, 1 - cos(angle) is actually returned.
+ *
+ * @see http://en.wikipedia.org/wiki/Cosine_similarity
+ */
+class CosineDistanceMeasure extends DistanceMeasure {
+  override def distance(a: Vector, b: Vector): Double = {
+    checkValidArguments(a, b)
+
+    val dotProd: Double = a.dot(b)
+    val denominator: Double = a.magnitude * b.magnitude
+    if (dotProd == 0 && denominator == 0) {
--- End diff --
What if `a` and `b` are both zero vectors? Are they then considered similar with respect to cosine similarity?
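
To make the question concrete: when either vector is the zero vector, the denominator is zero and the cosine is the undefined quotient 0/0, so the code has to pick a convention. The following is only a minimal sketch of one possible choice, not the code in this pull request. It assumes plain `Array[Double]` in place of Flink's `Vector`, and the names `CosineZeroVectorSketch` and `cosineDistance` are hypothetical. It returns `None` for the undefined case so the caller must decide.

```scala
// Sketch only: Array[Double] stands in for Flink's Vector type (assumption).
object CosineZeroVectorSketch {
  // Dot product of two equally sized vectors.
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Euclidean length of a vector.
  def magnitude(a: Array[Double]): Double = math.sqrt(dot(a, a))

  /** Returns 1 - cos(angle), or None when either vector has zero length,
    * because the cosine is then the undefined quotient 0/0.
    */
  def cosineDistance(a: Array[Double], b: Array[Double]): Option[Double] = {
    require(a.length == b.length, "vectors must have the same dimension")
    val denominator = magnitude(a) * magnitude(b)
    if (denominator == 0.0) None
    else Some(1.0 - dot(a, b) / denominator)
  }
}
```

Whether two zero vectors should instead be reported as distance 0 (identical) or distance 1 (unrelated) is a design decision; the sketch merely makes the undefined case explicit.
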
> Add distance measure interface and basic implementation to machine learning library
> -----------------------------------------------------------------------------------
>
> Key: FLINK-1933
> URL: https://issues.apache.org/jira/browse/FLINK-1933
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Chiwan Park
> Assignee: Chiwan Park
> Labels: ML
>
> Add a distance measure interface to calculate the distance between two vectors, along with some implementations of the interface. In FLINK-1745, [~till.rohrmann] suggests the following interface:
> {code}
> trait DistanceMeasure {
>   def distance(a: Vector, b: Vector): Double
> }
> {code}
> I think the following list of implementations is sufficient to provide to ML library users at first.
> * Manhattan distance [1]
> * Cosine distance [2]
> * Euclidean distance (and Squared) [3]
> * Tanimoto distance [4]
> * Minkowski distance [5]
> * Chebyshev distance [6]
> [1]: http://en.wikipedia.org/wiki/Taxicab_geometry
> [2]: http://en.wikipedia.org/wiki/Cosine_similarity
> [3]: http://en.wikipedia.org/wiki/Euclidean_distance
> [4]: http://en.wikipedia.org/wiki/Jaccard_index#Tanimoto_coefficient_.28extended_Jaccard_coefficient.29
> [5]: http://en.wikipedia.org/wiki/Minkowski_distance
> [6]: http://en.wikipedia.org/wiki/Chebyshev_distance
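
As an illustration of how the quoted `DistanceMeasure` interface could be filled in for three of the listed measures, here is a minimal sketch. It is not the code from the pull request: it assumes plain `Array[Double]` (aliased as `Vec`) instead of Flink's `Vector`, and the names `DistanceMeasureSketch`, `Manhattan`, `Euclidean`, `Chebyshev` and `absDiffs` are hypothetical.

```scala
object DistanceMeasureSketch {
  // Assumption: Array[Double] replaces Flink's Vector so the sketch is self-contained.
  type Vec = Array[Double]

  trait DistanceMeasure {
    def distance(a: Vec, b: Vec): Double
  }

  // Component-wise absolute differences |a_i - b_i|, shared by all three measures.
  private def absDiffs(a: Vec, b: Vec): Array[Double] = {
    require(a.length == b.length, "vectors must have the same dimension")
    a.zip(b).map { case (x, y) => math.abs(x - y) }
  }

  // Manhattan distance: sum of absolute differences.
  object Manhattan extends DistanceMeasure {
    def distance(a: Vec, b: Vec): Double = absDiffs(a, b).sum
  }

  // Euclidean distance: square root of the sum of squared differences.
  object Euclidean extends DistanceMeasure {
    def distance(a: Vec, b: Vec): Double =
      math.sqrt(absDiffs(a, b).map(d => d * d).sum)
  }

  // Chebyshev distance: largest absolute difference (0 for empty vectors).
  object Chebyshev extends DistanceMeasure {
    def distance(a: Vec, b: Vec): Double =
      absDiffs(a, b).foldLeft(0.0)((m, d) => math.max(m, d))
  }
}
```

For the vectors (0, 0) and (3, 4), these give 7, 5 and 4 respectively.
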

