I don't see any issue with the top terms having similar frequencies. Cosine
distance is generally considered a good distance measure for text data.
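
One reason cosine distance suits text is that it compares the direction of the term vectors and ignores their length, so a short and a long document with the same term proportions come out as near-identical. A minimal sketch of that idea in plain Python, with made-up vectors (this is not Mahout's implementation, and the names are only for illustration):

```python
import math

def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical term-count vectors over the same three-term vocabulary.
short_doc = [2.0, 1.0, 0.0]    # a short document
long_doc = [20.0, 10.0, 0.0]   # same term proportions, ten times longer
other_doc = [0.0, 1.0, 5.0]    # mostly different terms

same_topic = cosine_distance(short_doc, long_doc)
different_topic = cosine_distance(short_doc, other_doc)

print(same_topic)       # very close to 0: document length does not matter
print(different_topic)  # much larger: the term proportions differ
```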
On Mon, Oct 8, 2012 at 10:35 AM, jung hoon sohn wrote:
Thank you for the information.
Following your answer, the top terms from the clusters have similar frequencies.
Since I used cosine distance as the measure, is this the expected result?
Thank You.
Jung Hoon Sohn
On Sun, Oct 7, 2012 at 9:35 PM, paritosh ranjan wrote:
The top terms come from the centroid of the cluster. These values are the term frequencies.
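
To make that concrete, here is a rough sketch of how such a "top terms" list could be derived from a cluster. It assumes the centroid is the per-term mean of the member vectors (as in standard k-means) and that terms are ranked by their centroid weight. The data and function names are hypothetical and this is not Mahout's code; whether the weights are raw term frequencies or TF-IDF values depends on how the input vectors were built.

```python
# Hypothetical vocabulary and term-weight vectors for the documents in one cluster.
vocabulary = ["hello", "you", "world"]
cluster_vectors = [
    [20.0, 10.0, 0.0],
    [24.0, 14.0, 3.0],
]

def centroid(vectors):
    """Per-term mean of the member vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def top_terms(vectors, vocab, k):
    """Return the k (term, weight) pairs with the largest centroid weight."""
    weights = centroid(vectors)
    ranked = sorted(zip(vocab, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

result = top_terms(cluster_vectors, vocabulary, 2)
for term, weight in result:
    print(f"{term} => {weight}")
```

The printed numbers are weights taken from the centroid vector, not distances between anything.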
On Sun, Oct 7, 2012 at 5:38 PM, jung hoon sohn wrote:
Hello,
I used the k-means algorithm to cluster the text terms in the documents, with cosine distance as the distance measure.
It ran successfully, and when I ran the clusterdump utility to see the top terms for each cluster, I got output such as:
Top Terms:
hello => 21.8977799999
you => 11.9284304939
....
I am guessing that the value next to each term is a cosine distance value, but I am not sure about it.
Does anyone know specifically what the value represents?
Thanks.
Jung Hoon Sohn
