mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: Clustering: significance of s0,s1,s2?
Date Tue, 02 Oct 2012 00:39:35 GMT
Variables s0, s1 and s2 belong to a running-sums algorithm that computes 
the new center and radius (centroid and standard deviation) of each 
Cluster at the end of every iteration. The code is essentially the 
RunningSumsGaussianAccumulator's implementation, which has yet to be 
factored out into a GaussianAccumulator instance so that an 
OnlineGaussianAccumulator (OGA) can be substituted. The OGA is based on 
Welford's algorithm and is more numerically stable when calculating the 
std (radius).
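
To make the difference concrete, here is a minimal scalar sketch of the two 
approaches. It is not the Mahout code: the class and method names are made up 
for illustration, it works on plain doubles rather than Vectors, it assumes 
every point has weight 1, and it assumes s0 is the count, s1 the sum of the 
values and s2 the sum of their squares.

```java
// Hypothetical illustration only; names are not from Mahout.
class RunningSumsSketch {

    // Running sums: s0 = number of points, s1 = sum of x, s2 = sum of x^2.
    // Returns {mean, population standard deviation}.
    static double[] runningSums(double[] xs) {
        double s0 = 0, s1 = 0, s2 = 0;
        for (double x : xs) {
            s0 += 1;
            s1 += x;
            s2 += x * x;
        }
        double mean = s1 / s0;
        // s2*s0 - s1*s1 subtracts two large, nearly equal numbers when the
        // spread is small relative to the mean, which is where precision is
        // lost. Clamp at zero so rounding never yields a negative variance.
        double std = Math.sqrt(Math.max(0.0, s2 * s0 - s1 * s1)) / s0;
        return new double[] {mean, std};
    }

    // Welford's online algorithm: updates the mean and the sum of squared
    // deviations (m2) one point at a time, avoiding the large subtraction.
    // Returns {mean, population standard deviation}.
    static double[] welford(double[] xs) {
        long n = 0;
        double mean = 0, m2 = 0;
        for (double x : xs) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }
        return new double[] {mean, Math.sqrt(m2 / n)};
    }
}
```

On well-conditioned data such as {2, 4, 4, 4, 5, 5, 7, 9} both methods give a 
mean of 5 and a std of 2. They diverge only when the values are large compared 
with their spread.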

A JIRA issue to accomplish this refactoring and a patch to do it would 
be a great contribution for some aspiring Mahout developer.

On 10/1/12 1:06 AM, Rahul Mishra wrote:
> In the clustering code, what actually is the significance of s0, s1
> and s2? Apologies if it is a dumb question, but I do not find any
> comments in the code.
>
>
> --
> Regards,
> Rahul K Mishra,
> www.ee.iitb.ac.in/student/~rahulkmishra
>
>

