spark-user mailing list archives

From jatinpreet <>
Subject Naive Bayes classification confidence
Date Thu, 20 Nov 2014 05:42:04 GMT
I have been trying the Naive Bayes implementation in Spark's MLlib. During
the testing phase, I wish to discard predictions made with low confidence.
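
To make "confidence" concrete, here is a minimal sketch of how a posterior
probability could be derived for a multinomial Naive Bayes model. It assumes
the model's log class priors and log conditional probabilities are available
as plain lists; the names log_prior, log_theta and predict_with_confidence
are illustrative and not part of MLlib's API.

```python
import math

def posterior(log_prior, log_theta, features):
    """Normalized class posteriors for a multinomial Naive Bayes model.

    log_prior[c]    : log P(class c)            (assumed input)
    log_theta[c][j] : log P(term j | class c)   (assumed input)
    features[j]     : count of term j in the document
    """
    # Unnormalized log score per class: log prior + sum of count * log likelihood.
    scores = [lp + sum(x * t for x, t in zip(features, row))
              for lp, row in zip(log_prior, log_theta)]
    # Subtract the max before exponentiating to avoid underflow (log-sum-exp trick).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_confidence(log_prior, log_theta, features):
    """Return (index of the most probable class, its posterior probability)."""
    probs = posterior(log_prior, log_theta, features)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]
```

Naive Bayes posteriors are often pushed towards 0 or 1 when the independence
assumption is violated, so the returned value is better read as a ranking
score than as a calibrated probability.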

My data set primarily consists of form based documents like reports and
application forms. They contain key-value pair type text and hence I assume
the independence condition holds better than with natural language.

As for the quality of the priors, I am not doing anything special. I am
training on a roughly equal number of samples for each class and have left
the heavy lifting to MLlib.

Given these facts, does it make sense to define a confidence threshold for
each category, above which I can expect consistently correct results?
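
One way to frame per-category thresholds is sketched below: given a held-out
validation set of (predicted label, confidence, true label) triples, pick for
each predicted class the lowest cutoff at which the kept predictions still
reach a target precision. The function name pick_thresholds, the record
layout and the default target of 0.95 are assumptions made for illustration.

```python
def pick_thresholds(records, target_precision=0.95):
    """Choose a confidence cutoff per predicted class from validation data.

    records: iterable of (predicted_label, confidence, true_label).
    Returns {label: cutoff}, where keeping predictions of that label with
    confidence >= cutoff yields precision >= target_precision on the
    validation data. Labels that never reach the target are omitted.
    """
    by_label = {}
    for pred, conf, true in records:
        by_label.setdefault(pred, []).append((conf, pred == true))

    thresholds = {}
    for label, items in by_label.items():
        # Walk from the most to the least confident prediction.
        items.sort(key=lambda it: it[0], reverse=True)
        correct = 0
        i, n = 0, len(items)
        while i < n:
            # Consume every prediction tied at this confidence value.
            j = i
            while j < n and items[j][0] == items[i][0]:
                correct += items[j][1]
                j += 1
            # Precision over all predictions at or above this confidence.
            if correct / j >= target_precision:
                thresholds[label] = items[i][0]  # later (lower) cutoffs overwrite
            i = j
    return thresholds
```

Whether such cutoffs hold up on new data depends on the validation set being
representative, so they would need to be re-checked on a separate test split.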


Novice Big Data Programmer