spark-user mailing list archives

From jatinpreet <>
Subject Re: Naive Baye's classification confidence
Date Thu, 20 Nov 2014 12:04:10 GMT

My last sentence didn't come out right. Let me try to explain my question again.

For instance, I have two categories, C1 and C2. I have trained the model on
100 samples of C1 and 10 samples of C2.

Now I predict two samples, S1 and S2, which belong to C1 and C2
respectively. I get the following prediction results:

S1=> Category: C1, Probability: 0.7
S2=> Category: C2, Probability: 0.04

Both predictions are correct, but their probabilities are far apart.
Can I improve the prediction probability for C2 by taking the 10 samples I
have of it and replicating each of them 10 times, so that its total count
becomes 100, the same as C1?

Can I expect this to increase the probability of sample S2 after training
on the new set? Is this a viable approach?
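
To make the question concrete, here is a minimal sketch of the replication
idea in plain Python. It is not Spark MLlib code: it assumes a multinomial
Naive Bayes with Laplace smoothing, and the token features ("a", "b", "c"),
the function names and the toy data are all made up for illustration.

```python
import math
from collections import Counter

def train(samples, alpha=1.0):
    """Fit a toy multinomial Naive Bayes; samples is a list of (label, tokens)."""
    class_counts = Counter()
    token_counts = {}
    vocab = set()
    for label, tokens in samples:
        class_counts[label] += 1
        token_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab, alpha

def posterior(model, tokens):
    """Return the normalized class probabilities for one sample."""
    class_counts, token_counts, vocab, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        counts = token_counts[label]
        denom = sum(counts.values()) + alpha * len(vocab)
        score = math.log(n / total)  # class prior
        for t in tokens:
            if t in vocab:  # ignore tokens never seen in training
                score += math.log((counts[t] + alpha) / denom)
        log_scores[label] = score
    # Normalize in log space to avoid underflow.
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}

# Hypothetical toy data: 100 samples of C1, 10 samples of C2.
c1 = [("C1", ["a", "b"])] * 100
c2 = [("C2", ["c", "b"])] * 10
s2 = ["c", "b"]  # a sample that belongs to C2

original = posterior(train(c1 + c2), s2)
balanced = posterior(train(c1 + c2 * 10), s2)  # each C2 sample replicated 10 times

print("original:", original)
print("balanced:", balanced)
```

In this toy data the probability of C2 for S2 rises after replication, because
the class prior moves from 10/110 to 100/200 and the smoothing term weighs
less against the larger counts. Replication adds no new information about C2,
though, so whether the same happens on real data would have to be checked on a
held-out set.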


Novice Big Data Programmer