mahout-user mailing list archives

From Marko Dinić <>
Subject Re: Streaming K Means
Date Thu, 02 Oct 2014 09:45:54 GMT

Thank you again for your answer.

I'm trying to implement some kind of cluster-based anomaly detection. 
For that, I need to cluster normal examples. Then, when a new 
example enters the system, I need to assign it to the nearest centroid (by 
calculating the distance between the existing centroids and the new 
example), and I also need the distances from the points in that cluster 
to its centroid.
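The detection scheme described above can be sketched as follows. This is a minimal illustration, not Mahout code: it assumes the centroids and the members of each cluster are already available as plain Python lists, uses Euclidean distance, and flags a new example when its distance to the nearest centroid exceeds a chosen quantile of that cluster's member distances. The function names and the 95% quantile threshold are hypothetical choices.

```python
import math

# Hypothetical helpers for cluster-based anomaly detection (not a Mahout API).

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(point, centroids):
    """Return (index, distance) of the centroid closest to `point`."""
    dists = [euclidean(point, c) for c in centroids]
    idx = min(range(len(dists)), key=dists.__getitem__)
    return idx, dists[idx]

def is_anomaly(point, centroids, members_by_cluster, quantile=0.95):
    """Flag `point` if it lies farther from its nearest centroid than the
    given quantile of that cluster's member distances.
    Assumes every cluster has at least one member; the quantile rule itself
    is an assumed threshold, not something prescribed in the thread."""
    idx, dist = nearest_centroid(point, centroids)
    member_dists = sorted(euclidean(m, centroids[idx]) for m in members_by_cluster[idx])
    # Nearest-rank quantile: the ceil(q * n)-th smallest member distance.
    rank = max(1, math.ceil(quantile * len(member_dists)))
    threshold = member_dists[min(rank, len(member_dists)) - 1]
    return dist > threshold, idx, dist, threshold

if __name__ == "__main__":
    centroids = [[0.0, 0.0], [10.0, 10.0]]
    members = {0: [[0, 1], [1, 0], [0, -1], [-1, 0]], 1: [[10, 11], [11, 10]]}
    print(is_anomaly([0.5, 0.5], centroids, members))  # close to cluster 0
    print(is_anomaly([3.0, 4.0], centroids, members))  # far from cluster 0
```

Using a quantile rather than the maximum member distance keeps a single outlier in the training data from inflating the threshold; the right cut-off depends on the data.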

I could use K Means for that, but I'm hoping to get better results 
using Streaming K Means, primarily because of its KMeans++ 
initialization (which I could probably implement myself, but I'd rather 
avoid that, since it is already implemented). I also understand that it 
can be faster than regular K Means, since it first does a one-pass 
clustering of the data before the Ball K Means step. Please correct me 
if you disagree with any of this.
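For reference, the k-means++ seeding rule mentioned above can be sketched in plain Python. This is the standard textbook procedure, not Mahout's own implementation, and the function names are hypothetical: the first center is drawn uniformly at random, and each further center is drawn with probability proportional to D(x)^2, the squared distance from x to the nearest center already chosen.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_seeds(points, k, rng=None):
    """k-means++ seeding: pick the first center uniformly at random, then each
    further center with probability proportional to D(x)^2, the squared
    distance from x to the nearest center chosen so far."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = rng or random.Random()
    centers = [list(rng.choice(points))]
    # d2[i]: squared distance from points[i] to its closest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Weighted draw; points with D(x)^2 == 0 are never selected.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(list(points[idx]))
        d2 = [min(old, squared_distance(p, centers[-1])) for old, p in zip(d2, points)]
    return centers

if __name__ == "__main__":
    data = [[0, 0], [0, 0.1], [10, 10], [10, 10.1], [20, 0], [20, 0.1]]
    print(kmeans_pp_seeds(data, 3, random.Random(42)))
```

Because far-away points get a much larger D(x)^2 weight, the seeds tend to land in different groups, which is the property that makes this initialization preferable to purely random seeding.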

Maybe I'm doing something wrong, but I'm getting only one output file, 
part-r-00000, whereas with KMeans I would expect something like 
ClusteredPoints and Clusters-*-final. How can I get and read in the 
centroids and the clustered points?
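Assuming part-r-00000 holds only the final centroids, the per-point cluster assignments and distances that K Means writes to its clustered-points output can be recomputed with one extra pass over the data. The sketch below assumes the centroids have already been loaded into plain Python lists; reading the Hadoop SequenceFile itself requires the Hadoop/Mahout libraries and is not shown. Euclidean distance is an assumption (use whatever distance measure the clustering was run with), and the function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance (assumed; swap in the measure used for clustering)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_points(points, centroids, distance=euclidean):
    """Return one (cluster_id, distance) pair per input point, i.e. roughly
    what a 'clustered points' output would contain."""
    result = []
    for p in points:
        dists = [distance(p, c) for c in centroids]
        cid = min(range(len(centroids)), key=dists.__getitem__)
        result.append((cid, dists[cid]))
    return result

def group_by_cluster(points, assignments):
    """Group the original points by the cluster id they were assigned to."""
    clusters = {}
    for p, (cid, _) in zip(points, assignments):
        clusters.setdefault(cid, []).append(p)
    return clusters

if __name__ == "__main__":
    centroids = [[0.0, 0.0], [10.0, 10.0]]  # hypothetical, as if read from part-r-00000
    points = [[0.0, 3.0], [10.0, 14.0], [1.0, 0.0]]
    assignments = assign_points(points, centroids)
    print(assignments)
    print(group_by_cluster(points, assignments))
```

Grouping the (cluster_id, distance) pairs by cluster id gives, for each centroid, the distances of its points, which is what the detection scheme described earlier needs.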

Also, I see qualcluster used in the examples/bin/ 
that you mentioned. What is it used for?


On Monday, 29 September 2014, 20:00:33 CEST, Suneel Marthi wrote:
> This was answered earlier with the details you are looking for; repeating
> it here:
> See
> for how to invoke Streaming Kmeans
> Also look at examples/bin/ for the Streaming KMeans
> option.
> If all you are looking for is centroids and distances from centroids,
> wouldn't KMeans suffice? It would help if you could provide more details
> about what you are trying to accomplish here.
> On Mon, Sep 29, 2014 at 9:55 AM, Marko <> wrote:
>> Hello everyone,
>> I previously asked a question about Streaming K Means examples, and was
>> told that there are not many available.
>> Can anyone give me an example of how to call Streaming K Means clustering
>> on a dataset, and how to get the results?
>> What are the results? Are they the same as in basic K Means? Do I get
>> centroids and clustered points? And do I get the distance between each
>> point and its centroid, like in K Means?
>> I would like to run Streaming K Means clustering on a dataset, read in the
>> centroids, and get the distance from each point to its assigned centroid.
>> How do I do that?
>> Thanks

Marko Dinić
