mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: What are the best settings for my clustering task
Date Tue, 01 Oct 2013 21:08:06 GMT
At such small sizes, I would guess that the sequential version of the
streaming k-means or ball k-means would be better options.
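
To make the suggestion concrete, here is a minimal sketch of the idea behind a sequential streaming k-means pass, written in plain Python. It is not Mahout's implementation and does not use Mahout's API: the function name, the `distance_cutoff` parameter and the fixed seed are assumptions made for illustration, and the sketch assumes the product descriptions have already been turned into numeric vectors. A full streaming k-means would also grow the cutoff and collapse centroids when too many accumulate, which is omitted here.

```python
import math
import random


def streaming_kmeans_sketch(points, distance_cutoff, seed=0):
    """Single sequential pass over the data (hypothetical helper).

    Returns a list of [centroid, weight] pairs. Each point either starts a
    new centroid or is merged into its nearest existing centroid.
    """
    rng = random.Random(seed)
    centroids = []  # each entry is [vector, weight]
    for p in points:
        if not centroids:
            centroids.append([list(p), 1])
            continue
        # Find the nearest existing centroid (Euclidean distance).
        best = min(centroids, key=lambda c: math.dist(c[0], p))
        d = math.dist(best[0], p)
        # Points far from every centroid are likely to start a new cluster;
        # points at distance >= distance_cutoff always do.
        if rng.random() < d / distance_cutoff:
            centroids.append([list(p), 1])
        else:
            # Merge: update the centroid as a weighted running mean.
            w = best[1]
            best[0] = [(w * a + b) / (w + 1) for a, b in zip(best[0], p)]
            best[1] = w + 1
    return centroids
```

With a small cutoff the pass produces many fine-grained centroids, which is the direction the question below asks for; the resulting weighted centroids would then typically be handed to a second, more careful clustering step such as ball k-means.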



On Mon, Sep 30, 2013 at 2:14 PM, mercutio7979 <jbonerz@googlemail.com> wrote:

> Hello all,
>
> I am currently trying to create clusters from a group of 50,000 strings that
> contain product descriptions (around 70-100 characters in length each).
>
> That group of 50,000 consists of roughly 5,000 individual products and ten
> varying product descriptions per product. The product descriptions are
> already prepared for clustering and contain a normalized brand name, product
> model number, etc.
>
> What would be a good approach to maximise the number of clusters found (the
> best possible result would be 5,000 clusters with 10 descriptions each)?
>
> I adapted the reuters cluster script to read in my data and managed to
> create a first set of clusters. However, I have not managed to maximise the
> cluster count.
>
> The question is: what do I need to tweak with regard to the available Mahout
> settings, so the clusters are created as precisely as possible?
>
> Many regards!
> Jens
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/What-are-the-best-settings-for-my-clustering-task-tp4092807.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>
