mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject MapReduce Stats calculations
Date Fri, 06 May 2011 13:49:36 GMT
MAHOUT-688 has an M/R job to calculate std. deviation for document frequencies so that it can
prune noisy words.  I'm thinking of making it a bit more generic and adding a stats package
to org.apache.mahout.math.hadoop that contains this and other basic stats calculations (mean,
variance, sum of squares, etc.) that operate in M/R.
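Here is a minimal plain-Java sketch of why these stats fit M/R well. Each map task can emit a small partial of (count, sum, sum of squares). Merging two partials is just addition, which is associative and commutative, so the same merge can serve as both combiner and reducer. Mean, variance and std. deviation are then derived from the merged totals. This is an illustration under stated assumptions only: `RunningStats` and `mapReduce` are hypothetical names, the Hadoop Mapper/Reducer API is only simulated with lists, and this is not the MAHOUT-688 implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical combinable accumulator (not a Mahout class): each map task
 * emits one partial, and a combiner/reducer merges partials by addition.
 */
final class RunningStats {
    private long count;
    private double sum;
    private double sumOfSquares;

    /** Map side: fold one observation (e.g. a document frequency) into the partial. */
    void add(double x) {
        count++;
        sum += x;
        sumOfSquares += x * x;
    }

    /** Combine/reduce side: plain addition, so merge order does not matter. */
    void merge(RunningStats other) {
        count += other.count;
        sum += other.sum;
        sumOfSquares += other.sumOfSquares;
    }

    long count() { return count; }
    double sum() { return sum; }
    double sumOfSquares() { return sumOfSquares; }

    double mean() {
        return sum / count;
    }

    /** Population variance computed as E[x^2] - (E[x])^2. */
    double variance() {
        double m = mean();
        return sumOfSquares / count - m * m;
    }

    /** Clamp at 0 because floating-point rounding can make the variance slightly negative. */
    double stdDev() {
        return Math.sqrt(Math.max(0.0, variance()));
    }

    /** Simulated M/R: each split is one map task's input; the final loop plays the reducer. */
    static RunningStats mapReduce(List<double[]> splits) {
        List<RunningStats> partials = new ArrayList<>();
        for (double[] split : splits) {            // "map" phase: one partial per split
            RunningStats partial = new RunningStats();
            for (double x : split) {
                partial.add(x);
            }
            partials.add(partial);
        }
        RunningStats total = new RunningStats();   // "reduce" phase: merge all partials
        for (RunningStats p : partials) {
            total.merge(p);
        }
        return total;
    }

    public static void main(String[] args) {
        RunningStats s = mapReduce(List.of(
                new double[] {2, 4, 4},
                new double[] {4, 5},
                new double[] {5, 7, 9}));
        System.out.println(s.count() + " " + s.mean() + " " + s.stdDev());
    }
}
```

Running `main` prints `8 5.0 2.0`. One caveat on this design: the sum-of-squares formula can lose precision when the mean is large relative to the spread. Merging (count, mean, M2) partials with the pairwise update of Chan et al. is a more numerically stable alternative that still combines in M/R.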

Is that useful or am I re-inventing the wheel here or wasting time?  Seems like such a beast
should already exist, but a quick search didn't turn up much.

-Grant