spark-user mailing list archives

From: Sean Owen
Subject: Re: Naive Baye's classification confidence
Date: Thu, 20 Nov 2014 08:58:37 GMT
I assume that all examples do actually fall into exactly one of the classes.

If you always have to make a prediction then you always take the most
probable class.

If you can decline to classify when confidence is low, then yes, you
want to pick a per-class threshold and take the most likely class from
among those that exceed their threshold.
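
A minimal sketch of that selection rule in plain Python, not MLlib code.
It assumes you already have a posterior probability per class for each
document; the function name, class labels and threshold values are
hypothetical, and "exceeds" is treated as >=.

```python
from typing import Dict, Optional


def classify_with_reject(posteriors: Dict[str, float],
                         thresholds: Dict[str, float]) -> Optional[str]:
    """Return the most probable class among those whose posterior meets
    that class's threshold, or None when no class qualifies."""
    # A class with no configured threshold defaults to 1.0, so it is
    # only accepted when the model is completely certain.
    candidates = {c: p for c, p in posteriors.items()
                  if p >= thresholds.get(c, 1.0)}
    if not candidates:
        return None  # no classification made
    return max(candidates, key=candidates.get)


# Hypothetical thresholds for two document classes.
thresholds = {"report": 0.6, "form": 0.9}
print(classify_with_reject({"report": 0.7, "form": 0.3}, thresholds))
print(classify_with_reject({"report": 0.5, "form": 0.5}, thresholds))
```

Note that with per-class thresholds the overall most probable class can
be rejected while a less probable class with a lower threshold is
accepted.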

You would have to quantify the cost of making no classification versus the
cost of making the wrong one, for each class, and pick for each class the
threshold at which the expected costs of the two options are equal.
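
One way to make "equalizes them" concrete, as a sketch under stated
assumptions: a correct prediction costs nothing, and the posterior p is
a usable estimate of the probability of being right (Naive Bayes
posteriors are often pushed toward 0 or 1, so this may not hold without
calibration). Predicting then has expected cost (1 - p) * cost_wrong,
which equals cost_no_decision at p = 1 - cost_no_decision / cost_wrong.
The function name and the cost figures are hypothetical.

```python
def reject_threshold(cost_wrong: float, cost_no_decision: float) -> float:
    """Posterior p at which (1 - p) * cost_wrong == cost_no_decision."""
    if cost_wrong <= 0:
        raise ValueError("cost_wrong must be positive")
    t = 1.0 - cost_no_decision / cost_wrong
    # Clamp to [0, 1]: if declining costs more than being wrong,
    # it never pays to decline, so the threshold is 0.
    return min(1.0, max(0.0, t))


# Hypothetical per-class costs: (cost of a wrong label, cost of no label).
costs = {"report": (10.0, 1.0), "form": (4.0, 1.0)}
thresholds = {c: reject_threshold(w, n) for c, (w, n) in costs.items()}
print(thresholds)
```

The more expensive a wrong label is relative to no label, the closer
that class's threshold moves to 1.
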
On Nov 20, 2014 6:43 AM, "jatinpreet" wrote:

> I have been trying the Naive Bayes implementation in Spark's MLlib. During
> the testing phase, I wish to eliminate data with low confidence of prediction.
> My data set primarily consists of form-based documents like reports and
> application forms. They contain key-value pair type text and hence I assume
> the independence condition holds better than with natural language.
> About the quality of priors, I am not doing anything special. I am training
> on a more or less equal number of samples for each class and have left the
> heavy lifting to be done by MLlib.
> Given these facts, does it make sense to have confidence thresholds defined
> for each category above which I will get correct results consistently?
> Thanks
> Jatin
> -----
> Novice Big Data Programmer
