mahout-user mailing list archives

From Paritosh Ranjan <pran...@xebia.com>
Subject Re: How to use kmeans clustering algorithm of Mahout
Date Wed, 12 Sep 2012 06:42:17 GMT
I did not fully understand the question. Can you explain it in more detail?
The Mahout wiki describes how to use the k-means algorithm:
https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering
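
If the question is how to get data like the quoted sample into a form a clustering job can read, here is a minimal sketch in plain Python (not Mahout code). It assumes each line is one user and each number is a nominal feature ID. The IDs in the sample go above 50,000, so the sketch treats them as arbitrary labels and remaps them to dense column indices. The names parse_line and build_sparse_rows are made up for illustration.

```python
def parse_line(line):
    """Return the set of feature IDs found on one comma-separated line."""
    return {int(tok) for tok in line.split(",") if tok.strip()}


def build_sparse_rows(lines):
    """Turn lines of nominal feature IDs into sparse binary rows.

    Each row is a sorted list of column indices whose value is 1.
    index_of maps an original feature ID to its dense column index.
    """
    index_of = {}
    rows = []
    for line in lines:
        ids = parse_line(line)
        if not ids:
            continue  # skip blank lines
        cols = []
        for fid in sorted(ids):
            if fid not in index_of:
                index_of[fid] = len(index_of)  # assign the next free column
            cols.append(index_of[fid])
        rows.append(sorted(cols))
    return rows, index_of


# First line: user 2 from the quoted sample.
# Second line: a few IDs picked from user 3's line, shortened for illustration.
sample = ["98660,158620,33900,", "4780,13922,158620,"]
rows, index_of = build_sparse_rows(sample)
print(rows)           # one list of column indices per user
print(len(index_of))  # number of distinct features seen
```

Each row could then be written out as a sparse vector (in Mahout, for example, a RandomAccessSparseVector whose cardinality is the number of distinct features) before running k-means. Whether a binary encoding and the default distance measure suit this data is a separate question that the sketch does not settle.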

On 12-09-2012 11:43, Don.Tan wrote:
> Aloha!
>
>    I am new to Hadoop and Mahout, but I have set up a Hadoop cluster.
>
>    I have been working on a clustering task lately. Progress is slow
> because I don't know much about how to deal with massive data (my data
> contains 1,400,000 users and 50,000 features, and it is sparse).
>
>    Could you tell me how to deal with that? A slice of the data is here:
>
> 167555,152622,162252,79481,66540,41942,75500,167898,61923,182083,180681,181135,174449,166439,167307,174126,87800,2826,
> 98660,158620,33900,
> 4780,13922,45040,159210,26423,1471,68200,70402,109721,145860,23740,5818,15087,47861,158620,170482,170161,39120,164514,5854,169183,151229,171110,163457,4356,21363,1307,78105,1322,177011,167822,
> 176329,116300,175216,167307,46710,138740,100681,2089,1842,1206,101702,99210,50460,89605,177424,142901,176464,160625,38201,112101,4048,1716,167599,140883,158250,175399,
>
>
>     The example above contains 4 users' data, and each number is nominal
> (it denotes a kind of user behavior; e.g., user 2 has
> "98660","158620","33900").
>
>     Please tell me how to work on that, or which documents I should read.
>
>
>     Thx!
>
>    Don Tan


