[ https://issues.apache.org/jira/browse/SPARK-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
yuhao yang closed SPARK-5384.
-----------------------------
fixed
> Vectors.sqdist returns inconsistent results for sparse/dense vectors when the vectors have different lengths
> ----------------------------------------------------------------------------------------------------------
>
> Key: SPARK-5384
> URL: https://issues.apache.org/jira/browse/SPARK-5384
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 1.2.1
> Environment: centos, others should be similar
> Reporter: yuhao yang
> Assignee: yuhao yang
> Priority: Critical
> Fix For: 1.3.0
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> For two vectors of different lengths, Vectors.sqdist returns different results depending
> on whether the vectors are represented as sparse or dense. Sample:
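> import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vectors}
>
> // The same two vectors (lengths 4 and 1), once in sparse and once in dense form
> val s1 = new SparseVector(4, Array(0, 1, 2, 3), Array(1.0, 2.0, 3.0, 4.0))
> val s2 = new SparseVector(1, Array(0), Array(9.0))
> val d1 = new DenseVector(Array(1.0, 2.0, 3.0, 4.0))
> val d2 = new DenseVector(Array(9.0))
> println(s1 == d1 && s2 == d2)    // do the sparse and dense forms compare equal?
> println(Vectors.sqdist(s1, s2))  // squared distance, sparse inputs
> println(Vectors.sqdist(d1, d2))  // squared distance, dense inputs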
> Result:
> true
> 93.0
> 64.0
> More precisely, Vectors.sqdist includes the elements beyond the shorter vector's length
> for sparse vectors (adding 2^2 + 3^2 + 4^2 = 29 here) and excludes them for dense vectors.
> I'll send a PR, and we can have a more detailed discussion there.
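
The two numbers in the report can be reproduced without Spark. The following is a minimal sketch in plain Scala that only models the two behaviors described above; it is not Spark's implementation, and the object and helper names (SqdistSketch, sqdistIncludingTail, sqdistOverlapOnly) are hypothetical, introduced for illustration.

```scala
object SqdistSketch {
  // Hypothetical helper modelling the reported sparse behavior:
  // squared differences over the overlapping prefix, plus the squares
  // of the leftover entries of the longer vector.
  def sqdistIncludingTail(a: Array[Double], b: Array[Double]): Double = {
    val n = math.min(a.length, b.length)
    val overlap = (0 until n).map { i => val d = a(i) - b(i); d * d }.sum
    val tail = (a.drop(n) ++ b.drop(n)).map(x => x * x).sum
    overlap + tail
  }

  // Hypothetical helper modelling the reported dense behavior:
  // only the overlapping prefix is compared; leftover entries are ignored.
  def sqdistOverlapOnly(a: Array[Double], b: Array[Double]): Double = {
    val n = math.min(a.length, b.length)
    (0 until n).map { i => val d = a(i) - b(i); d * d }.sum
  }

  def main(args: Array[String]): Unit = {
    val v1 = Array(1.0, 2.0, 3.0, 4.0)
    val v2 = Array(9.0)
    println(sqdistIncludingTail(v1, v2)) // (1-9)^2 + 2^2 + 3^2 + 4^2 = 93.0
    println(sqdistOverlapOnly(v1, v2))   // (1-9)^2 = 64.0
  }
}
```

The difference between the two helpers is exactly the contribution of the trailing elements, which is why both results look plausible on their own and the mismatch only shows up when the same data is passed in both representations.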
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org