mahout-user mailing list archives

From Yash Sharma <yash...@gmail.com>
Subject Re: Question Regarding Entropy calculation in Mahout
Date Fri, 23 May 2014 18:21:19 GMT
Hi Darshan,
What I understand from your problem is that:
- You have clustered a few documents.
- You want to verify the accuracy of your clustering, and you want to use
entropy for that.
- You are not sure what the input for the entropy calculation should be.

Possible solution:
An entropy calculation would expect a String[] from which to measure the
information contained in the data/sequence.
One of the simplest ways is to keep all the documents labelled with categories.
- Cluster the docs as you usually do.
- For the entropy calculation, create a String[] for every cluster, each
array containing the labels of all the docs in that cluster:
cluster1 = {"sports", "tech", "tech", "tech", "book", ..}
cluster2 = {"sports", "drama", "sports", "sports"...}
etc.

- Calculate the entropy of each cluster.
Entropy measures the degree of randomness of a system: high entropy means
a high degree of randomness. Lower entropy is therefore desirable when
validating the accuracy of your clustering technique; a cluster whose docs
all carry the same label has an entropy of zero.
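The steps above can be sketched in plain Java without relying on Entropy.java. This is a minimal sketch under assumptions: the docId-to-cluster and docId-to-label maps are hypothetical inputs you would build yourself from the clustering output and your own hand labels, entropy is taken as Shannon entropy in bits, and the single overall score weights each cluster's entropy by its size (a common convention, not a Mahout API). The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClusterEntropy {

    /**
     * Steps 1-2: group the known category label of each document by the cluster it was
     * assigned to. Both maps are hypothetical inputs keyed by document id: clusterOf would
     * come from your clustering output, labelOf from your own hand labelling.
     */
    public static Map<String, String[]> labelsPerCluster(Map<String, String> clusterOf,
                                                         Map<String, String> labelOf) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> e : clusterOf.entrySet()) {
            String label = labelOf.get(e.getKey());
            if (label != null) { // documents without a label are skipped
                grouped.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(label);
            }
        }
        Map<String, String[]> result = new TreeMap<>();
        grouped.forEach((cluster, labels) -> result.put(cluster, labels.toArray(new String[0])));
        return result;
    }

    /** Step 3: Shannon entropy, in bits, of the label distribution inside one cluster. */
    public static double entropy(String[] labels) {
        if (labels.length == 0) {
            return 0.0;
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.length;
            h -= p * Math.log(p) / Math.log(2); // -p * log2(p)
        }
        return h;
    }

    /** One score for the whole clustering: cluster entropies weighted by cluster size. */
    public static double weightedEntropy(Iterable<String[]> clusters) {
        int total = 0;
        for (String[] c : clusters) {
            total += c.length;
        }
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (String[] c : clusters) {
            h += ((double) c.length / total) * entropy(c);
        }
        return h;
    }

    public static void main(String[] args) {
        Map<String, String> clusterOf = new HashMap<>();
        Map<String, String> labelOf = new HashMap<>();
        String[][] docs = { // {docId, assigned cluster, true label}; made-up example data
            {"d1", "cluster1", "sports"}, {"d2", "cluster1", "tech"},
            {"d3", "cluster1", "tech"},   {"d4", "cluster1", "tech"},
            {"d5", "cluster2", "sports"}, {"d6", "cluster2", "drama"},
            {"d7", "cluster2", "sports"}, {"d8", "cluster2", "sports"}};
        for (String[] d : docs) {
            clusterOf.put(d[0], d[1]);
            labelOf.put(d[0], d[2]);
        }
        Map<String, String[]> byCluster = labelsPerCluster(clusterOf, labelOf);
        byCluster.forEach((cluster, labels) -> System.out.printf(
            "%s %s entropy=%.3f%n", cluster, Arrays.toString(labels), entropy(labels)));
        System.out.printf("weighted entropy=%.3f%n", weightedEntropy(byCluster.values()));
    }
}
```

With the made-up data in main, both clusters have a 3-to-1 label split, so each comes out at about 0.811 bits and the weighted score is also about 0.811. Keep in mind that entropy falls as you add clusters (a cluster holding a single doc always scores 0), so compare distance measures at a similar number of clusters.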

P.S. You can use the Entropy.java class for your validation purpose, but
it's deprecated now.
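The pureness (purity) measure asked about further down the thread can be computed from the same per-cluster label arrays, again without the deprecated class. A minimal sketch, assuming the usual definition of purity (the share of documents that carry their cluster's majority label); the class and method names are hypothetical, not part of Mahout.

```java
import java.util.HashMap;
import java.util.Map;

public class ClusterPurity {

    /** Purity of one cluster: the share of its docs that carry the cluster's most common label. */
    public static double purity(String[] labels) {
        if (labels.length == 0) {
            return 0.0;
        }
        return (double) majorityCount(labels) / labels.length;
    }

    /** Overall purity: majority-label docs summed over all clusters, divided by total docs. */
    public static double overallPurity(String[][] clusters) {
        int total = 0;
        int majority = 0;
        for (String[] c : clusters) {
            total += c.length;
            majority += majorityCount(c);
        }
        return total == 0 ? 0.0 : (double) majority / total;
    }

    /** Size of the largest group of identical labels in one cluster. */
    private static int majorityCount(String[] labels) {
        Map<String, Integer> counts = new HashMap<>();
        int max = 0;
        for (String label : labels) {
            max = Math.max(max, counts.merge(label, 1, Integer::sum));
        }
        return max;
    }

    public static void main(String[] args) {
        String[] cluster1 = {"sports", "tech", "tech", "tech"};
        String[] cluster2 = {"sports", "drama", "sports", "sports", "drama"};
        System.out.println(purity(cluster1)); // 0.75
        System.out.println(purity(cluster2)); // 0.6
        System.out.println(overallPurity(new String[][] {cluster1, cluster2}));
    }
}
```

Unlike entropy, higher purity is better: 1.0 means every cluster contains a single label.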

Having said that, kindly be patient when asking questions, and provide
more info on the work you have done so far and your findings. That would
enable all of us to answer quickly and correctly :)

Hope this was helpful. Other approaches are welcome!

Peace,
Yash


On Fri, May 23, 2014 at 10:55 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> I am sorry, but I don't understand your questions or needs sufficiently to
> answer.
>
>
>
>
> On Wed, Apr 23, 2014 at 12:21 PM, Darshan Sonagara <
> darshan.sonagara@gmail.com> wrote:
>
> > Sir, please reply to me as soon as possible.
> > Thanks in advance.
> >
> >
> > On Tue, Apr 22, 2014 at 11:50 PM, Darshan Sonagara <
> > darshan.sonagara@gmail.com> wrote:
> >
> > > Waiting for the reply, sir.
> > >
> > >
> > > On Tue, Apr 22, 2014 at 7:13 PM, Darshan Sonagara <
> > > darshan.sonagara@gmail.com> wrote:
> > >
> > >> Thanks for the reply, sir.
> > >>
> > >> Actually, I am doing clustering to gather similar kinds of documents in
> > >> the same cluster as much as possible.
> > >> I can see this in the cluster dump output by observing the top terms.
> > >> I also found that the result differs when I vary the distance measure.
> > >> But I want some mathematical proof that one technique is better than
> > >> another, so for that I need to calculate the entropy and purity of the
> > >> clusters.
> > >> But I am not able to find any command in Mahout which gives me entropy
> > >> as a result.
> > >> I found Entropy.java under the Mahout common math statistics package, but
> > >> I don't know what I should give it as input so that I can find entropy or
> > >> other parameters, so that I can find out how good or bad a cluster is.
> > >>
> > >>
> > >>
> > >> On Tue, Apr 22, 2014 at 7:01 PM, Ted Dunning <ted.dunning@gmail.com>
> > >> wrote:
> > >>
> > >>> On Tue, Apr 22, 2014 at 12:11 AM, Darshan Sonagara <
> > >>> darshan.sonagara@gmail.com> wrote:
> > >>>
> > >>> > But the problem is that I want to check whether my clustering is
> > >>> > good or bad, so for that I need to calculate the entropy value. I have
> > >>> > no idea how to calculate entropy in Mahout or by any other technique.
> > >>> > By finding the entropy I can reach a good conclusion.
> > >>> > So please, can anyone help me with this?
> > >>> >
> > >>>
> > >>> Actually, the way to tell whether your clustering is good is to see if
> > >>> it works for its intended use.
> > >>>
> > >>> What do you want to use clustering for?
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >> *Regards From:*
> > >>
> > >> *Darshan  Sonagara*
> > >> *Collaborative Platform lead,** SSN Team | Gujarat Section.*
> > >>
> > >> *Vice-Chairperson | **GCET IEEE SB.*
> > >>
> > >> (: +*91* 9408002452
> > >>
> > >>
> > >>
> > >>  : Darshan Sonagara<
> > http://www.linkedin.com/pub/darshan-sonagara/64/11a/b54>
> > >>   : Darshan Sonagara <http://www.facebook.com/darshansonagara>
> > >>
> > >>
> > >
> > >
> > >
> > >
> >
> >
> >
>
