mahout-user mailing list archives

From Svetlomir Kasabov <>
Subject Re: Probabilities in Bayesian classifier
Date Wed, 15 Jun 2011 16:56:36 GMT
Hello Steven,

I've asked this question too:

Unfortunately, Mahout's Naive Bayes implementation can't calculate
probabilities. You are probably astonished now; I couldn't
believe it either when I read that (I find it rather strange,
since Bayes' main concept is probability calculation). It's a pity
that such a great framework as Mahout has restricted the Bayesian
concept that way. In addition, as far as I know, Mahout's Naive Bayes is
only text-oriented: you can apply it only to documents. Mahout is still
wonderful, though, because it lets us calculate probabilities using
Logistic Regression.
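For what it's worth, in textbook multinomial Naive Bayes the per-category score is an unnormalized log-posterior, log P(c) + sum of log P(w | c), and a softmax turns such scores into probabilities that sum to 1. Below is a minimal Java sketch of that normalization; the class name NaiveBayesPosterior and the scores are hypothetical, and this is not Mahout's code. It assumes the scores really are log-probabilities. The numbers returned by BayesAlgorithm.classifyDocument may not be, in which case the result is only a relative confidence, not a calibrated probability.

```java
import java.util.Arrays;

/**
 * Minimal sketch (hypothetical, NOT Mahout's BayesAlgorithm): turns
 * per-category log-scores into probabilities that sum to 1.
 * Assumes each score is an unnormalized log-posterior,
 * log P(category) + sum of log P(word | category).
 */
public class NaiveBayesPosterior {

    /** Normalizes log-scores with the log-sum-exp trick to avoid underflow. */
    static double[] posteriors(double[] logScores) {
        double max = Arrays.stream(logScores).max().getAsDouble();
        double sum = 0.0;
        double[] probs = new double[logScores.length];
        for (int i = 0; i < logScores.length; i++) {
            // Shift by the maximum so the largest term becomes exp(0) = 1.
            probs[i] = Math.exp(logScores[i] - max);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum; // divide by the total so the values add up to 1
        }
        return probs;
    }

    public static void main(String[] args) {
        String[] categories = {"C", "C++", "Java"};
        // Hypothetical log-scores: equal for C and Java, log(2) higher for C++.
        double[] logScores = {-120.0, -120.0 + Math.log(2.0), -120.0};
        double[] p = posteriors(logScores);
        for (int i = 0; i < categories.length; i++) {
            System.out.printf("%s: %.0f%%%n", categories[i], 100 * p[i]);
        }
    }
}
```

With these hypothetical scores it prints C: 25%, C++: 50%, Java: 25%, the kind of output asked for in the quoted message. Subtracting the maximum before exponentiating keeps Math.exp from underflowing to zero when the log-scores are large negative numbers, as they tend to be for long documents.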

That's why I switched to Mahout's Logistic Regression
implementation: using
returns a probability. Logistic Regression also has the advantage that
it can handle continuous values directly, whereas with a Bayes classifier
you have to discretize the data into categories first.
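As a rough illustration of both points, a logistic regression model passes a weighted sum of the raw feature values through the sigmoid function, so the output always lies between 0 and 1 and continuous inputs need no binning. This is a generic sketch, not Mahout's code; the class name LogisticProbability, the weights and the bias are hypothetical placeholders for learned values.

```java
/**
 * Generic sketch (hypothetical, not Mahout's code) of how logistic
 * regression turns continuous inputs into a probability.
 */
public class LogisticProbability {

    /** Logistic (sigmoid) function: maps any real number into the range 0..1. */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** P(category = 1 | x) = sigmoid(bias + w . x), computed on raw continuous features. */
    static double probability(double[] weights, double bias, double[] x) {
        double z = bias;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * x[i];
        }
        return sigmoid(z);
    }

    public static void main(String[] args) {
        double[] weights = {0.8, -1.5}; // hypothetical learned weights
        double bias = -0.2;             // hypothetical learned bias
        double[] x = {2.37, 0.41};      // continuous values used as-is, no binning
        System.out.println(probability(weights, bias, x));
    }
}
```

A Bayes classifier over categorical features would first need each continuous value mapped to a bin; here the value enters the weighted sum directly.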

You can try the class from the mahout-examples in 
order to see how it works. See also the calculation of probability in

double p = lr.classifyScalar(input);
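To see where such a probability comes from, here is a toy binary logistic regression trained with plain stochastic gradient descent. It is a hypothetical stand-in written only for illustration, not Mahout's implementation: the class ToyLogisticRegression and its train method are made up, only the method name classifyScalar is borrowed to mirror the call above, and it assumes two categories labelled 0 and 1.

```java
/**
 * Toy binary logistic regression trained with plain SGD on the log-loss.
 * Hypothetical stand-in, NOT Mahout's implementation; only the name
 * classifyScalar is borrowed. Labels are assumed to be 0 or 1.
 */
public class ToyLogisticRegression {
    private final double[] weights;
    private final double learningRate;
    private double bias = 0.0;

    ToyLogisticRegression(int numFeatures, double learningRate) {
        this.weights = new double[numFeatures];
        this.learningRate = learningRate;
    }

    /** Returns P(label = 1 | x) via the sigmoid of the weighted sum. */
    double classifyScalar(double[] x) {
        double z = bias;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** One SGD step: (label - p) is the log-likelihood gradient w.r.t. z. */
    void train(int label, double[] x) {
        double error = label - classifyScalar(x);
        for (int i = 0; i < weights.length; i++) {
            weights[i] += learningRate * error * x[i];
        }
        bias += learningRate * error;
    }

    public static void main(String[] args) {
        // Hypothetical 1-D data: small values are label 0, large values label 1.
        double[][] xs = {{0.5}, {1.0}, {1.5}, {3.5}, {4.0}, {4.5}};
        int[] ys = {0, 0, 0, 1, 1, 1};
        ToyLogisticRegression lr = new ToyLogisticRegression(1, 0.1);
        for (int epoch = 0; epoch < 2000; epoch++) {
            for (int i = 0; i < xs.length; i++) {
                lr.train(ys[i], xs[i]);
            }
        }
        double p = lr.classifyScalar(new double[]{4.2});
        System.out.printf("P(label=1 | x=4.2) = %.3f%n", p);
    }
}
```

After training on the six hypothetical points, inputs near 4.2 get a probability close to 1 and inputs near 0.8 a probability close to 0. A production learner would typically add regularization and a learning-rate schedule; they are left out here to keep the update rule visible.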

On 15.06.2011 16:51, Steven Raemaekers wrote:
> Hello,
> Currently I'm working on a classifier that sorts documents written in
> different programming languages into the correct category. I made a test
> and a training set, and I get a confusion table as a result. This is nice,
> but the program does not supply any probabilities/uncertainties that a
> particular file belongs to a certain category; it only returns whether or
> not a single file belongs to a category. Because it is a Bayesian
> algorithm, probabilities must be involved somehow.
> What I would like to have is, for a single input file, the probability of
> that file belonging to each category, for instance like this:
> C: 25%
> C++: 50%
> Java: 25%
> The classifyDocument method in the class BayesAlgorithm does return
> numbers, but these are not really probabilities since they do not add up
> to 1.
> According to the javadoc, these numbers are dot products between the
> vector of this document and the training set.
> So my question is: is it possible to convert the numbers stored in
> ClassifierResult and calculated in BayesAlgorithm.classifyDocument into
> some kind of probability?
> Regards,
> Steven
> --
> Software Improvement Group
