mahout-user mailing list archives

From paritosh ranjan <paritoshranj...@gmail.com>
Subject Re: RepresentativePointsDriver numIterations
Date Thu, 01 Nov 2012 17:36:40 GMT
If the intra-cluster distance turns out to be small at 10 iterations, then
you know that you did not need 10; fewer would have been fine (though
knowing that is of no use for the run already done). However, if there are
around 500 points per cluster with a very small intra-cluster distance,
then you might conclude that 10 is fine (here it can help). So this is
something to try and test. In my view, it amounts to experimenting before
settling on a representation.
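
To make the "try and test" idea concrete, here is a minimal, self-contained
sketch that does not use Mahout at all. It assumes a selection rule in which
each iteration adds the cluster point with the largest summed distance to the
points already chosen, starting from the centroid; that rule, the class name
RepPointsSketch, and the mean-pairwise-distance figure are illustrative
assumptions, not the library's actual code or its density measures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Mahout's implementation.
class RepPointsSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Starts from the centroid. Each iteration adds one point: the cluster
    // point with the largest summed distance to the points chosen so far
    // (assumed rule). Stops early once every cluster point has been used.
    static List<double[]> representativePoints(double[] centroid,
                                               List<double[]> points,
                                               int numIterations) {
        List<double[]> reps = new ArrayList<>();
        reps.add(centroid);
        for (int it = 0; it < numIterations; it++) {
            double[] best = null;
            double bestTotal = -1.0;
            for (double[] p : points) {
                if (reps.contains(p)) {
                    continue; // arrays compare by reference, so this skips chosen points
                }
                double total = 0.0;
                for (double[] r : reps) {
                    total += euclidean(p, r);
                }
                if (total > bestTotal) {
                    bestTotal = total;
                    best = p;
                }
            }
            if (best == null) {
                break; // fewer points in the cluster than iterations requested
            }
            reps.add(best);
        }
        return reps;
    }

    // Mean pairwise distance among the representative points, used here as a
    // simple stand-in for an intra-cluster measure.
    static double meanPairwiseDistance(List<double[]> reps) {
        double sum = 0.0;
        int pairs = 0;
        for (int i = 0; i < reps.size(); i++) {
            for (int j = i + 1; j < reps.size(); j++) {
                sum += euclidean(reps.get(i), reps.get(j));
                pairs++;
            }
        }
        return pairs == 0 ? 0.0 : sum / pairs;
    }

    public static void main(String[] args) {
        double[] centroid = {0.0, 0.0};
        List<double[]> cluster = Arrays.asList(
            new double[] {1.0, 0.0}, new double[] {0.0, 2.0},
            new double[] {-1.0, 0.0}, new double[] {0.5, 0.5});
        // Sweep a few iteration counts and watch where the figure stops changing.
        for (int n : new int[] {1, 2, 5, 10}) {
            List<double[]> reps = representativePoints(centroid, cluster, n);
            System.out.println(String.format("iterations=%d reps=%d meanPairwise=%.3f",
                n, reps.size(), meanPairwiseDistance(reps)));
        }
    }
}
```

With a small, tight cluster the list of representative points stops growing
after a few iterations, which is the situation where a large numIterations
buys nothing.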

Looking at the maximum, minimum, and average inter-cluster distances can
also give you some idea about the clusters. If the inter-cluster distances
are large, then you might not need too many iterations either. But again,
it depends on what information you are trying to gather.
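
As a rough illustration of those three figures, the sketch below computes the
minimum, maximum, and average pairwise Euclidean distance between cluster
centroids. It is plain Java with no Mahout dependency; the class name
InterClusterSummary and the choice of centroid-to-centroid distance as the
"inter-cluster distance" are assumptions made for the example.

```java
// Hypothetical illustration only; not part of Mahout.
class InterClusterSummary {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns {min, max, average} over all pairs of centroids.
    static double[] summarize(double[][] centroids) {
        double min = Double.POSITIVE_INFINITY;
        double max = 0.0;
        double sum = 0.0;
        int pairs = 0;
        for (int i = 0; i < centroids.length; i++) {
            for (int j = i + 1; j < centroids.length; j++) {
                double d = euclidean(centroids[i], centroids[j]);
                min = Math.min(min, d);
                max = Math.max(max, d);
                sum += d;
                pairs++;
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least two centroids");
        }
        return new double[] {min, max, sum / pairs};
    }

    public static void main(String[] args) {
        double[][] centroids = {{0.0, 0.0}, {3.0, 4.0}, {6.0, 8.0}};
        double[] s = summarize(centroids);
        System.out.println(String.format("min=%.3f max=%.3f avg=%.3f", s[0], s[1], s[2]));
    }
}
```

If the minimum is already large relative to the spread inside each cluster,
the clusters are well separated and a small number of iterations is likely
enough.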

In my opinion, you can take some exploratory steps based on these
parameters before settling on the final representative points. I don't
think all the parameters can be fixed at the beginning. My advice would be
to choose the parameters based on the problem you are trying to solve. To
me, it looks like a heuristic process.

On Thu, Nov 1, 2012 at 10:47 PM, Rahul Mishra <mishra.rahulk@gmail.com> wrote:

> But we need to set the number of iterations before calculating the
> intra-cluster distance. I presume we can get the intra-cluster distance
> only after calling RepresentativePointsDriver.run(). I am not sure how
> this is going to help.
>
>
> On Thu, Nov 1, 2012 at 9:41 PM, paritosh ranjan <paritoshranjan5@gmail.com> wrote:
>
> > If the intra-cluster distance is small (which means the vectors are
> > tightly clustered), then you might not need a lot of iterations to
> > represent it.
> > Similarly, if there are very few vectors per cluster, and the
> > intra-cluster distance is also small, then even a single iteration
> > would be fine. That's how I see it.
> >
> > On Thu, Nov 1, 2012 at 9:12 PM, Rahul Mishra <mishra.rahulk@gmail.com> wrote:
> >
> > > Thanks for the prompt reply Paritosh.
> > > Could you please explain it a bit further? How does it depend?
> > >
> > > Thanks & Regards,
> > > Rahul
> > >
> > >
> > > On Thu, Nov 1, 2012 at 8:44 PM, paritosh ranjan <paritoshranjan5@gmail.com> wrote:
> > >
> > > > Each iteration will add a single point to the evolving list of
> > > > representative points for each cluster.
> > > > So, I think it depends on the number of vectors per cluster and
> > > > also the intra-cluster distance.
> > > >
> > > > On Thu, Nov 1, 2012 at 8:13 PM, Rahul Mishra <mishra.rahulk@gmail.com> wrote:
> > > >
> > > > > Hello Friends,
> > > > >
> > > > > What's the heuristic for choosing the number of iterations for
> > > > > RepresentativePointsDriver?
> > > > >
> > > > > I have run the kmeans and fuzzy-kmeans algorithms on a dataset of
> > > > > size 500MB. Now, how do I obtain cluster quality?
> > > > >
> > > > > Does the following look okay?
> > > > > RepresentativePointsDriver.run(conf, new Path(clustersIn),
> > > > >     new Path(clusteredPointsIn), new Path(outputDir),
> > > > >     new EuclideanDistanceMeasure(), numIterations, runSequential);
> > > > > double interDis = clusterEval.interClusterDensity();
> > > > > double intraDis = clusterEval.intraClusterDensity();
> > > > > System.out.println("cluster evaluator: The inter distance: " + interDis);
> > > > > System.out.println("cluster evaluator: The intra distance: " + intraDis);
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Rahul K Mishra,
> > > > > https://sites.google.com/site/reachrahulkmishra/
> > > > >
