I don't see any issue with the top terms having similar frequencies. Cosine
distance is generally considered a good distance measure for text data.
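
To illustrate why the values next to the top terms are weights rather than
distances, here is a minimal Python sketch. It is not Mahout's code: the
vocabulary, the document vectors, and the helper names (centroid, top_terms,
cosine_distance) are all made up for illustration. It assumes plain
term-frequency vectors and a centroid computed as their mean.

```python
import math

# Hypothetical term-frequency vectors for three documents in one cluster.
# Column order follows `vocabulary`; all values are invented for illustration.
vocabulary = ["hello", "you", "world"]
cluster_docs = [
    [20.0, 10.0, 1.0],
    [24.0, 14.0, 0.0],
    [22.0, 12.0, 2.0],
]


def centroid(vectors):
    """Component-wise mean of the vectors assigned to a cluster."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def top_terms(center, terms, k=2):
    """The k terms with the largest weights in the centroid."""
    ranked = sorted(zip(terms, center), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def cosine_distance(a, b):
    """1 - cos(angle between a and b); depends on direction, not magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


center = centroid(cluster_docs)
for term, weight in top_terms(center, vocabulary):
    # Same "term => value" layout as the output quoted in this thread.
    print(f"{term} => {weight}")
```

In this sketch the distance measure only decides which vectors end up in
which cluster. The printed values are read straight from the centroid, so
they stay on the scale of the input term weights whichever measure is used.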
On Mon, Oct 8, 2012 at 10:35 AM, jung hoon sohn <jsohn57@gmail.com> wrote:
> Thank you for the information.
> Following your answer, the top terms from the clusters have similar
> frequencies.
> Since I used cosine distance as the measure, is this the correct result?
>
> Thank You.
>
> Jung Hoon Sohn
>
> On Sun, Oct 7, 2012 at 9:35 PM, paritosh ranjan
> <paritoshranjan5@gmail.com> wrote:
>
> > The top terms come from the centroid of the cluster. These values are the
> > term frequencies, averaged over the vectors assigned to the cluster.
> >
> > On Sun, Oct 7, 2012 at 5:38 PM, jung hoon sohn <jsohn57@gmail.com>
> > wrote:
> >
> > > Hello,
> > > I used the k-means algorithm to cluster the terms in my documents,
> > > using the cosine distance measure.
> > > It ran successfully, and when I ran the clusterdump utility to see the
> > > top terms for each cluster,
> > > I got output such as
> > >
> > > Top Terms:
> > >
> > > hello => 21.8977799999
> > > you => 11.9284304939
> > > ....
> > >
> > > I am guessing that the value next to each term is a cosine distance
> > > value, but I am not sure.
> > > Does anyone know specifically what the value represents?
> > >
> > > Thanks.
> > >
> > > Jung Hoon Sohn
> > >
> >
>
