mahout-user mailing list archives

From Gustavo Enrique Salazar Torres <gsala...@ime.usp.br>
Subject K-means in mahout
Date Fri, 24 Sep 2010 23:04:14 GMT
Hi there

This is the first time I have sent a message to this list. I have a clustering
problem I want to solve. Basically, I need to cluster a set of more than 1M
items containing HTML text. At first I was thinking of using a Lucene index and
a hierarchical algorithm with quadratic complexity to find these clusters, but
quadratic running time is impractical at this scale. Since K-means has better
complexity, I would like to hear about production experiences with this
algorithm in Mahout, specifically the time needed to set up a production
environment (including Hadoop configuration). I'm particularly interested in
the setup time, since we have little time to come up with a solution for our
problem.

Thanks in advance.

Best regards.
Gustavo
