mahout-user mailing list archives

From Pat Cunnane <>
Subject Naive Bayes Classifier as a Recommender
Date Tue, 15 Oct 2013 22:00:13 GMT
Hi, I've got a dataset of millions of short documents (think Twitter),
each of which belongs to one of about 30,000 categories. When a user is
creating a new document, I want to suggest a list of 5 possible categories
for that document to go into.

Right now I'm using the Naive Bayes classifier in Mahout and sorting the
results by score. My problem is that the suggestions are sometimes not very
accurate, and I'd like to know:

Is there any way to find out a confidence level for a classification?
Ideally then I could set a threshold and not display recommendations if the
classifier is not confident.
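
To make the thresholding idea concrete, here is a minimal sketch in plain
Python (not Mahout code). It assumes the classifier gives one log-score per
category where higher is better, which may not match what Mahout actually
returns. The function name top_k_with_confidence and the min_confidence
parameter are hypothetical. The scores are normalized with a softmax so
they sum to 1, and nothing is shown if the best category falls below the
threshold. Naive Bayes scores tend to be poorly calibrated, so the
threshold would have to be tuned on held-out data rather than read as a
true probability.

```python
import math


def top_k_with_confidence(scores, k=5, min_confidence=0.5):
    """Return up to k (category, confidence) pairs, or [] if not confident.

    scores: dict mapping category -> log-score (assumed: higher is better).
    min_confidence: hypothetical cutoff on the normalized top score.
    """
    if not scores:
        return []

    # Subtract the max before exponentiating to avoid overflow/underflow.
    max_score = max(scores.values())
    exp_scores = {c: math.exp(s - max_score) for c, s in scores.items()}
    total = sum(exp_scores.values())

    # Rank categories by normalized score, best first, and keep the top k.
    ranked = sorted(exp_scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    top = [(category, value / total) for category, value in ranked]

    # Suppress all suggestions when the best one is below the threshold.
    if top[0][1] < min_confidence:
        return []
    return top
```

With scores like {"a": -1.0, "b": -5.0, "c": -6.0} the top category
dominates and the suggestions are returned; with three equal scores each
category gets about a third of the mass, so a 0.5 threshold returns
nothing. Thresholding on the gap between the first and second scores would
be another option.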

Also, would it be better to consider another algorithm to achieve my goals?
I chose Naive Bayes because my dataset is pure text and very large. Any
thoughts would be greatly appreciated.


