spark-user mailing list archives

From Liquan Pei <liquan...@gmail.com>
Subject Re: return probability \ confidence instead of actual class
Date Mon, 22 Sep 2014 06:50:49 GMT
Hi Adamantios,

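For your first question: after you train the SVM, you get a model with a
vector of weights w and an intercept b. Points x with w.dot(x) + b = 0 lie on
the decision boundary, while points with w.dot(x) + b = 1 or w.dot(x) + b = -1
lie on the margin boundaries on either side of it. The quantity w.dot(x) + b
for a point x is therefore a confidence measure for its classification: the
farther it is from 0, the more confident the prediction. Note that it is a raw
margin, not a calibrated probability.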
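To make the geometry concrete, here is a minimal plain-Scala sketch (no Spark needed) that computes the raw score w.dot(x) + b for a hypothetical weight vector and intercept. The helpers `rawScore` and `distanceToBoundary` are illustrative names, not MLlib API. The sign of the score gives the predicted side, and dividing its magnitude by the norm of w gives the Euclidean distance from the decision boundary.

```scala
object MarginSketch {
  // Raw score w.dot(x) + b of a linear model (illustrative helper, not MLlib API).
  def rawScore(w: Array[Double], b: Double, x: Array[Double]): Double = {
    require(w.length == x.length, "w and x must have the same length")
    w.zip(x).map { case (wi, xi) => wi * xi }.sum + b
  }

  // Euclidean distance from x to the decision boundary w.dot(x) + b = 0.
  def distanceToBoundary(w: Array[Double], b: Double, x: Array[Double]): Double = {
    val norm = math.sqrt(w.map(v => v * v).sum)
    math.abs(rawScore(w, b, x)) / norm
  }

  def main(args: Array[String]): Unit = {
    val w = Array(2.0, -1.0) // hypothetical trained weights
    val b = 0.5              // hypothetical trained intercept
    // One point on the boundary, one on the +1 margin, one far from the boundary.
    for (x <- Seq(Array(0.25, 1.0), Array(0.25, 0.0), Array(3.0, 1.0))) {
      val s = rawScore(w, b, x)
      val side = if (s > 0) "positive" else if (s < 0) "negative" else "on the boundary"
      val shown = x.mkString("(", ", ", ")")
      println(f"x=$shown score=$s%.2f side=$side distance=${distanceToBoundary(w, b, x)}%.3f")
    }
  }
}
```

With these values the first point scores 0 (on the boundary), the second scores 1 (on the positive margin), and the third scores 5.5, so it is the one classified with the most confidence.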

Code-wise, suppose you trained your model via

val model = SVMWithSGD.train(...)

Then you can call

model.setThreshold(your threshold here)

to set the threshold that separates positive predictions from negative
predictions.
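If what you want is the score itself rather than a 0/1 label, `SVMModel.clearThreshold()` makes `predict` return the raw score w.dot(x) + b instead. The plain-Scala sketch below shows one way to use such scores to keep only confidently classified points. The `(id, score)` pairs, the `minMargin` cutoff, and the helper names are hypothetical; the Spark calls appear only as comments, so the block runs without Spark.

```scala
object ConfidenceFilter {
  // Sign-based label with a cutoff at 0.0 (hypothetical helper; 1.0 = positive, 0.0 = negative).
  def label(score: Double): Double = if (score > 0.0) 1.0 else 0.0

  // Keep only points whose raw score is at least minMargin away from 0,
  // i.e. the points the model is comparatively certain about.
  def confident(scored: Seq[(Long, Double)], minMargin: Double): Seq[(Long, Double, Double)] =
    scored.collect {
      case (id, score) if math.abs(score) >= minMargin => (id, score, label(score))
    }

  def main(args: Array[String]): Unit = {
    // With Spark MLlib, raw scores could be obtained along these lines:
    //   model.clearThreshold()
    //   val scores = data.map(p => model.predict(p.features))
    // Here we use hypothetical precomputed (id, score) pairs instead.
    val scored = Seq(1L -> 2.3, 2L -> -0.2, 3L -> 0.9, 4L -> -1.7)
    confident(scored, minMargin = 1.0).foreach(println)
  }
}
```

Raising `minMargin` keeps fewer points but only those farther from the boundary; a suitable value depends on your data and is best chosen on held-out examples.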

For more info, please take a look at
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.classification.SVMModel

For your second question, SVMWithSGD only supports binary classification.
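A common workaround for several categories, not something SVMWithSGD does itself, is one-vs-rest: train one binary model per class (that class against all others) and predict the class whose model gives the largest raw score. The plain-Scala sketch below covers only the prediction step, over hypothetical per-class weights and intercepts; `LinearScorer` and `predict` are illustrative names, and training each binary model on relabeled data is not shown. Scores from separately trained models are not calibrated against each other, so the gap to the runner-up is only a rough confidence signal.

```scala
object OneVsRestSketch {
  // One linear binary scorer per class: raw score w.dot(x) + b (hypothetical type).
  final case class LinearScorer(weights: Array[Double], intercept: Double) {
    def score(x: Array[Double]): Double =
      weights.zip(x).map { case (w, xi) => w * xi }.sum + intercept
  }

  // Returns (predicted class index, winning score, gap to the runner-up score).
  // A small gap suggests the point is ambiguous between two classes.
  def predict(models: IndexedSeq[LinearScorer], x: Array[Double]): (Int, Double, Double) = {
    require(models.size >= 2, "need at least two classes")
    val ranked = models.map(_.score(x)).zipWithIndex.sortBy { case (s, _) => -s }
    val (best, bestIdx) = ranked(0)
    val (second, _) = ranked(1)
    (bestIdx, best, best - second)
  }

  def main(args: Array[String]): Unit = {
    val models = IndexedSeq(
      LinearScorer(Array(1.0, 0.0), -0.5),  // class 0 vs rest
      LinearScorer(Array(0.0, 1.0), -0.5),  // class 1 vs rest
      LinearScorer(Array(-1.0, -1.0), 0.5)  // class 2 vs rest
    )
    println(predict(models, Array(2.0, 0.0)))
  }
}
```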

Hope this helps,

Liquan

On Sun, Sep 21, 2014 at 11:22 PM, Adamantios Corais <
adamantios.corais@gmail.com> wrote:

> Nobody?
>
> If that's not supported already, could you please at least give me a few
> hints on how to implement it?
>
> Thanks!
>
>
> On Fri, Sep 19, 2014 at 7:43 PM, Adamantios Corais <
> adamantios.corais@gmail.com> wrote:
>
>> Hi,
>>
>> I am working with the SVMWithSGD classification algorithm on Spark. It
>> works fine for me; however, I would like to distinguish the instances that
>> are classified with high confidence from those classified with low
>> confidence. How do we define the threshold here? Ultimately, I want to keep
>> only those for which the algorithm is very *very* certain about its
>> decision! How can I do that? Is this feature already supported by any MLlib
>> algorithm? What if I had multiple categories?
>>
>> Any input is highly appreciated!
>>
>
>


-- 
Liquan Pei
Department of Physics
University of Massachusetts Amherst
