spark-user mailing list archives

From Wanda Hawk <wanda_haw...@yahoo.com>
Subject Re: KMeans code is rubbish
Date Thu, 10 Jul 2014 09:17:24 GMT
So this is what I am running:
"./bin/run-example SparkKMeans ~/Documents/2dim2.txt 2 0.001"

And this is the input file:"
┌───[spark2013@SparkOne]──────[~/spark-1.0.0].$
└───#!cat ~/Documents/2dim2.txt
2 1
1 2
3 2
2 3
4 1
5 1
6 1
4 2
6 2
4 3
5 3
6 3
"

This is the final output from spark:
"14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 0 ms
14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 0 ms
14/07/10 20:05:12 INFO Executor: Serialized size of result for 14 is 1433
14/07/10 20:05:12 INFO Executor: Sending result for 14 directly to driver
14/07/10 20:05:12 INFO Executor: Finished task ID 14
14/07/10 20:05:12 INFO DAGScheduler: Completed ResultTask(6, 0)
14/07/10 20:05:12 INFO TaskSetManager: Finished TID 14 in 5 ms on localhost (progress: 1/2)
14/07/10 20:05:12 INFO Executor: Serialized size of result for 15 is 1433
14/07/10 20:05:12 INFO Executor: Sending result for 15 directly to driver
14/07/10 20:05:12 INFO Executor: Finished task ID 15
14/07/10 20:05:12 INFO DAGScheduler: Completed ResultTask(6, 1)
14/07/10 20:05:12 INFO TaskSetManager: Finished TID 15 in 7 ms on localhost (progress: 2/2)
14/07/10 20:05:12 INFO DAGScheduler: Stage 6 (collectAsMap at SparkKMeans.scala:75) finished in 0.008 s
14/07/10 20:05:12 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool
14/07/10 20:05:12 INFO SparkContext: Job finished: collectAsMap at SparkKMeans.scala:75, took 0.02472681 s
Finished iteration (delta = 0.0)
Final centers:
DenseVector(2.8571428571428568, 2.0)
DenseVector(5.6000000000000005, 2.0)
"

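One way to sanity-check these numbers is to apply a single k-means (Lloyd) step by hand to both the reported centers and the expected ones. The following is a minimal sketch in plain Python, not the Spark example code; `lloyd_step`, `reported` and `expected` are names made up here for illustration, and the points are copied from the input file above.

```python
# Points from 2dim2.txt, copied from the message above.
points = [(2, 1), (1, 2), (3, 2), (2, 3), (4, 1), (5, 1),
          (6, 1), (4, 2), (6, 2), (4, 3), (5, 3), (6, 3)]


def sq_dist(p, c):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2


def lloyd_step(points, centers):
    """Assign each point to its nearest center, then recompute the means.

    Returns (new_centers, cost), where cost is the sum of squared
    distances from each point to its nearest *input* center.
    """
    clusters = [[] for _ in centers]
    cost = 0.0
    for p in points:
        dists = [sq_dist(p, c) for c in centers]
        k = dists.index(min(dists))
        clusters[k].append(p)
        cost += dists[k]
    new_centers = []
    for old, members in zip(centers, clusters):
        if not members:
            # Keep an empty cluster's center where it was (a choice made here).
            new_centers.append(old)
            continue
        n = len(members)
        new_centers.append((sum(x for x, _ in members) / n,
                            sum(y for _, y in members) / n))
    return new_centers, cost


# Centers printed by the run above, and the centers expected by eye.
reported = [(2.8571428571428568, 2.0), (5.6000000000000005, 2.0)]
expected = [(2.0, 2.0), (5.0, 2.0)]

for name, centers in [("reported", reported), ("expected", expected)]:
    new_centers, cost = lloyd_step(points, centers)
    # Largest squared movement of any center after one more step.
    moved = max(sq_dist(old, new) for old, new in zip(centers, new_centers))
    print(name, "cost:", round(cost, 4), "stable:", moved < 1e-9)
```

In this sketch both sets of centers come back unchanged from the step, which is consistent with the run stopping at delta = 0.0. The cost of the reported centers (about 18.06) is higher than the cost of (2,2) and (5,2) (16.0), so the reported result looks like a local optimum reached from the initial centers rather than an arithmetic error. A different starting point may well converge to the expected centers.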



On Thursday, July 10, 2014 12:02 PM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
 


A picture is worth a thousand... Well, a picture of this dataset, showing what you are expecting
and what you get, would help in answering your initial question.


Bertrand


On Thu, Jul 10, 2014 at 10:44 AM, Wanda Hawk <wanda_hawk89@yahoo.com> wrote:

>Can someone please run the standard kMeans code on this input with 2 centers?
>2 1
>1 2
>3 2
>2 3
>4 1
>5 1
>6 1
>4 2
>6 2
>4 3
>5 3
>6 3
>
>
>The obvious result should be (2,2) and (5,2) ... (you can draw them if you don't believe me ...)
>
>
>Thanks, 
>Wanda