mahout-user mailing list archives

From Steven Raemaekers
Subject Probabilities in Bayesian classifier
Date Wed, 15 Jun 2011 14:51:07 GMT

Currently I'm working on a classifier that sorts documents written in different programming
languages into the correct category. I made a training set and a test set, and I get a confusion
table as a result. This is nice, but the program does not supply any probability or uncertainty
that a particular file belongs to a certain category; it only returns whether or not a single
file belongs to a category. Because it is a Bayesian algorithm, probabilities must
be involved somehow.

What I would like to have, for a single input file, is the probability of that file belonging
to each category, for instance like this:

C: 25%
C++: 50%
Java: 25%

The classifyDocument method in the class BayesAlgorithm does return numbers, but these are
not really probabilities, since they do not add up to 1.
According to the javadoc, these numbers are dot products between the vector of this
document and the training set.

So my question is: is it possible to convert the numbers stored in ClassifierResult and
calculated in BayesAlgorithm.classifyDocument into some kind of probability?
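
To make the question concrete, here is a minimal sketch of one common way to turn
per-category scores into numbers that add up to 1: exponentiate each score and divide
by the total (a softmax). It rests on two assumptions that the javadoc does not confirm:
that the scores behave like unnormalized log-probabilities, and that a larger score means
a more likely category. If smaller scores are better, the scores would have to be negated
first. The class ScoreNormalizer and the method toProbabilities are hypothetical names
and are not part of Mahout.

```java
/** Hypothetical helper, not part of Mahout: normalizes log-scale scores to sum to 1. */
final class ScoreNormalizer {
    private ScoreNormalizer() {}

    /**
     * Assumes each score is an unnormalized log-probability (larger = more likely).
     * Returns values in [0, 1] that sum to 1, in the same order as the input.
     */
    static double[] toProbabilities(double[] logScores) {
        if (logScores.length == 0) {
            throw new IllegalArgumentException("need at least one score");
        }
        // Subtract the maximum before exponentiating so that very negative
        // scores do not all underflow to zero.
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores) {
            max = Math.max(max, s);
        }
        double[] probs = new double[logScores.length];
        double sum = 0.0;
        for (int i = 0; i < logScores.length; i++) {
            probs[i] = Math.exp(logScores[i] - max);
            sum += probs[i];
        }
        // Divide by the total so the results add up to 1.
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    public static void main(String[] args) {
        // Made-up scores for illustration only, not real classifier output.
        String[] labels = {"C", "C++", "Java"};
        double[] scores = {-1051.3, -1050.6, -1051.3};
        double[] probs = toProbabilities(scores);
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%s: %.1f%%%n", labels[i], 100.0 * probs[i]);
        }
    }
}
```

Even if the assumptions hold, naive Bayes scores tend to be overconfident, so the
resulting values are probably better read as a relative ranking than as calibrated
probabilities.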



Software Improvement Group
