mahout-user mailing list archives

From paritosh ranjan <paritoshranj...@gmail.com>
Subject Re: Clusterdump Output Question
Date Mon, 08 Oct 2012 06:56:59 GMT
I don't see any issue with the top terms having similar frequencies. Cosine
distance is generally considered a good distance measure for text data.
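
To illustrate why the numbers in the dump are not cosine distances, here is a
minimal Python sketch (not Mahout code). It assumes the centroid is the
component-wise mean of the member vectors, as in standard k-means, and that
each document is a sparse term-weight vector. The helper names (centroid,
top_terms, cosine_distance) and the sample weights are made up for
illustration.

```python
from math import sqrt

def centroid(vectors):
    """Component-wise mean of sparse term-weight vectors (dicts).
    Assumption: this mirrors how a k-means centroid is formed."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    n = len(vectors)
    return {term: weight / n for term, weight in total.items()}

def top_terms(center, k):
    """The k terms with the largest weight in the centroid."""
    return sorted(center.items(), key=lambda kv: kv[1], reverse=True)[:k]

def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical term weights for two documents in one cluster.
docs = [
    {"hello": 24.0, "you": 12.0},
    {"hello": 20.0, "you": 12.0, "world": 2.0},
]

center = centroid(docs)
print(top_terms(center, 2))               # weights taken from the centroid
print(cosine_distance(docs[0], center))   # a value between 0 and 1
```

For non-negative term weights the cosine distance always falls between 0 and
1, so a value such as 21.89 next to a term cannot be a distance. It is the
weight of that term in the centroid vector.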

On Mon, Oct 8, 2012 at 10:35 AM, jung hoon sohn <jsohn57@gmail.com> wrote:

> Thank you for the information.
> Following your answer, the top terms from the clusters have similar
> frequencies.
> Since I used cosine distance as the measure, is this the correct result?
>
> Thank You.
>
> Jung Hoon Sohn
>
> On Sun, Oct 7, 2012 at 9:35 PM, paritosh ranjan
> <paritoshranjan5@gmail.com> wrote:
>
> > The top terms come from the centroid of the cluster. These values are the
> > term frequencies.
> >
> > On Sun, Oct 7, 2012 at 5:38 PM, jung hoon sohn <jsohn57@gmail.com> wrote:
> >
> > > Hello,
> > > I used the k-means algorithm to cluster the text terms in the documents
> > > according to the cosine distance measure.
> > > It ran successfully, and when I ran the clusterdump utility to see the
> > > top terms for each cluster,
> > > I got output such as
> > >
> > >       Top Terms:
> > >
> > >             hello    =>     21.8977799999
> > >             you     =>     11.9284304939
> > >             ....
> > >
> > > I am guessing the values next to each term are cosine distance values,
> > > but I am not sure about it.
> > > Does anyone know specifically what the value represents?
> > >
> > > Thanks.
> > >
> > > Jung Hoon Sohn
> > >
> >
>
