spark-user mailing list archives

From Xiangrui Meng <men...@gmail.com>
Subject Re: K-means clustering
Date Tue, 25 Nov 2014 18:10:28 GMT
There is a simple example here:
https://github.com/apache/spark/blob/master/examples/src/main/python/kmeans.py

You can take advantage of sparsity by computing the distance via
inner products:
http://spark-summit.org/2014/talk/sparse-data-support-in-mllib-2
-Xiangrui
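
To illustrate the inner-product idea, here is a minimal sketch in plain Python. It is not taken from the linked example or from MLlib. It assumes each point is a sparse row stored as a dict {column_index: value} and each center is a dense list; all function names are made up for illustration. It relies on the identity ||x - c||^2 = ||x||^2 - 2<x, c> + ||c||^2, so only the nonzero entries of x are visited when comparing against a center.

```python
# Hypothetical helpers: sparse points are dicts {column_index: value},
# centers are dense lists of floats. None of these names come from Spark.

def sq_norm_sparse(x):
    """Squared Euclidean norm of a sparse point."""
    return sum(v * v for v in x.values())


def sq_norm_dense(c):
    """Squared Euclidean norm of a dense center."""
    return sum(v * v for v in c)


def dot_sparse_dense(x, c):
    """Inner product that touches only the nonzero entries of x."""
    return sum(v * c[i] for i, v in x.items())


def sq_dist(x, x_norm, c, c_norm):
    """||x - c||^2 = ||x||^2 - 2<x, c> + ||c||^2, using precomputed norms."""
    # Clamp at zero because rounding can produce a tiny negative value.
    return max(x_norm - 2.0 * dot_sparse_dense(x, c) + c_norm, 0.0)


def closest_center(x, centers, center_norms):
    """Return (index, squared distance) of the nearest center to x."""
    x_norm = sq_norm_sparse(x)
    best, best_d = 0, float("inf")
    for k, (c, c_norm) in enumerate(zip(centers, center_norms)):
        d = sq_dist(x, x_norm, c, c_norm)
        if d < best_d:
            best, best_d = k, d
    return best, best_d


# Tiny made-up example with 6 columns.
x = {0: 1.0, 4: 2.0}
centers = [
    [1.0, 0.0, 0.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 0.0, 0.0, 0.0],
]
# Center norms are computed once per iteration, not once per point.
center_norms = [sq_norm_dense(c) for c in centers]
index, dist = closest_center(x, centers, center_norms)
```

In a hand-written Spark k-means, a function like closest_center would typically be applied to every point in a map step, with the current centers shared across workers, and the new centers computed by aggregating points per cluster index. With 174000 columns the dense centers stay small (k rows), while each point costs only as much as its number of nonzeros.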

On Tue, Nov 25, 2014 at 2:39 AM, amin mohebbi
<aminn_524@yahoo.com.invalid> wrote:
> I have generated a sparse matrix in Python with size 4000 x 174000
> (.pkl). The following is a small part of this matrix:
>
>   (0, 45) 1
>   (0, 413) 1
>   (0, 445) 1
>   (0, 107) 4
>   (0, 80) 2
>   (0, 352) 1
>   (0, 157) 1
>   (0, 191) 1
>   (0, 315) 1
>   (0, 395) 4
>   (0, 282) 3
>   (0, 184) 1
>   (0, 403) 1
>   (0, 169) 1
>   (0, 267) 1
>   (0, 148) 1
>   (0, 449) 1
>   (0, 241) 1
>   (0, 303) 1
>   (0, 364) 1
>   (0, 257) 1
>   (0, 372) 1
>   (0, 73) 1
>   (0, 64) 1
>   (0, 427) 1
>   : :
>   (2, 399) 1
>   (2, 277) 1
>   (2, 229) 1
>   (2, 255) 1
>   (2, 409) 1
>   (2, 355) 1
>   (2, 391) 1
>   (2, 28) 1
>   (2, 384) 1
>   (2, 86) 1
>   (2, 285) 2
>   (2, 166) 1
>   (2, 165) 1
>   (2, 419) 1
>   (2, 367) 2
>   (2, 133) 1
>   (2, 61) 1
>   (2, 434) 1
>   (2, 51) 1
>   (2, 423) 1
>   (2, 398) 1
>   (2, 438) 1
>   (2, 389) 1
>   (2, 26) 1
>   (2, 455) 1
>
> I am new to Spark and would like to cluster this matrix with the k-means
> algorithm. Can anyone explain what kinds of problems I might face?
> Please note that I do not want to use MLlib and would like to write my own
> k-means.
> Best Regards
>
> .......................................................
>
> Amin Mohebbi
>
> PhD candidate in Software Engineering
>  at university of Malaysia
>
> Tel : +60 18 2040 017
>
>
>
> E-Mail : TP025921@ex.apiit.edu.my
>
>               amin_524@me.com
