mahout-user mailing list archives

From Svetlomir Kasabov <skasa...@smail.inf.fh-brs.de>
Subject Re: Probabilities in Bayesian classifier
Date Wed, 15 Jun 2011 16:56:36 GMT
Hello Steven,

I've asked this question too:

http://mail-archives.apache.org/mod_mbox/mahout-user/201105.mbox/%3CBANLkTinyohrCYnT0xzrpOqqG3ZkEPvkY0Q@mail.gmail.com%3E

Unfortunately, Mahout's Naive Bayes implementation can't calculate
probabilities. You are probably astonished now; I couldn't believe it
either when I read that (I find it rather strange, since the main
concept behind Bayes is probability calculation). It's a pity that
such a great framework as Mahout has restricted the Bayesian concept
in that way. In addition, Naive Bayes is (as far as I know)
text-oriented only: you can apply it only to documents. Mahout is still
wonderful, though, because it lets us calculate probabilities using
Logistic Regression.

That's why I switched to Mahout's Logistic Regression implementation:
OnlineLogisticRegression.java#classifyScalar() returns a probability.
Logistic Regression also has the advantage that it can handle
continuous values directly, whereas with the Bayes classifier you have
to categorize the data first.
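
To make "categorize the data first" concrete, here is a minimal
plain-Java sketch of binning a continuous value into a categorical
token. It does not use any Mahout API; the class name, the method
names, the "age" feature, and the bin boundaries are all hypothetical
and chosen only for illustration.

```java
class DiscretizeSketch {

    // Returns the index of the bin that `value` falls into.
    // `upperBounds` must be sorted in ascending order; a value that is
    // greater than or equal to the last bound lands in the final bin.
    public static int binIndex(double value, double[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (value < upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length;
    }

    // Turns a continuous feature into a token such as "age_bin1", which
    // a text-oriented classifier could then treat like a word.
    public static String toCategory(String featureName, double value, double[] upperBounds) {
        return featureName + "_bin" + binIndex(value, upperBounds);
    }

    public static void main(String[] args) {
        double[] ageBounds = {18.0, 40.0, 65.0}; // hypothetical boundaries
        System.out.println(toCategory("age", 30.0, ageBounds));
    }
}
```

With Logistic Regression this step is unnecessary, because the raw
value can be used as a feature directly.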

You can try the class TrainLogisticTest.java from mahout-examples to
see how it works. See also how the probability is calculated in
TrainLogistic.java:

double p = lr.classifyScalar(input);
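
As a rough illustration of why such a value can be read as a
probability: a logistic model passes a weighted sum of the features
through the sigmoid function, which always lands strictly between 0
and 1. The plain-Java sketch below does not use Mahout at all; the
class name, the weights, and the feature values are made up for
illustration, and Mahout's own implementation may differ in its
details.

```java
class LogisticProbabilitySketch {

    // Logistic (sigmoid) function: maps any real-valued score into (0, 1).
    public static double sigmoid(double score) {
        return 1.0 / (1.0 + Math.exp(-score));
    }

    // Probability of the positive class for one feature vector,
    // computed as sigmoid(w . x). The bias is modelled as a constant
    // feature with value 1.0.
    public static double probability(double[] weights, double[] features) {
        if (weights.length != features.length) {
            throw new IllegalArgumentException("weights and features must have the same length");
        }
        double score = 0.0;
        for (int i = 0; i < weights.length; i++) {
            score += weights[i] * features[i];
        }
        return sigmoid(score);
    }

    public static void main(String[] args) {
        double[] weights = {-1.0, 0.8, 0.5};  // hypothetical trained weights
        double[] features = {1.0, 2.0, 0.4};  // first entry is the bias term
        double p = probability(weights, features);
        // p and 1 - p sum to 1, so they can be reported as class probabilities.
        System.out.printf("p = %.3f, 1 - p = %.3f%n", p, 1.0 - p);
    }
}
```

For a two-class problem, p and 1 - p add up to 1, which is the
property Steven was missing in the Bayes scores.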

On 15.06.2011 16:51, Steven Raemaekers wrote:
> Hello,
>
> Currently I'm working on a classifier to classify documents written in different programming
> languages in the correct category. I made a test and a training set, and I get a confusion
> table as a result. This is nice, but the program does not supply any probabilities/uncertainties
> that a particular file belongs to a certain category, it only returns whether or not a single
> file belongs to a category or not. Because it is a Bayesian algorithm, probabilities must
> be involved somehow.
>
> What I would like to have is for a single input file the chance/probability of that file
> belonging to each category, for instance like this:
>
> C: 25%
> C++: 50%
> Java: 25%
>
> The classifyDocument method in the class BayesAlgorithm does return numbers, but these
> are not really probabilities since they do not add up to 1.
> Looking in the javadoc it says that these numbers are dot products between the vector
> of this document and the training set.
>
> So my question is, is it possible to convert the numbers as stored in ClassifierResult
> and calculated in BayesAlgorithm.classifyDocument to some kind of probability?
>
> Regards,
>
> Steven
>
> --
> Software Improvement Group
> www.sig.eu
>
> We would like to invite you to complete our survey on the Awareness of Green Software.
> It will take you less than 10 minutes.
> Link to survey: http://bit.ly/kfWGZM
>
>

