spark-user mailing list archives

From Sunny Khatri <sunny.k...@gmail.com>
Subject Re: return probability \ confidence instead of actual class
Date Wed, 24 Sep 2014 23:25:37 GMT
For multi-class you can use the same SVMWithSGD (a binary classifier) with a
one-vs-all approach: for each class i, construct a training corpus with
class i as the positive samples and all the other classes as the negative
ones, train one model per corpus, and then use the method described by Aris
as a measure of how far a point is from the decision boundary of class i.
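
The one-vs-all step above can be sketched roughly as follows. This is a minimal illustration in plain Scala, not MLlib code: LinearModel and OneVsAll are hypothetical names, and it assumes you have already trained one binary model per class and copied out its weights and intercept. The class whose model gives the largest raw score wins.

```scala
// Hypothetical container for one binary linear model per class (not an MLlib type).
// With MLlib, the weights and intercept would be taken from each trained model.
final case class LinearModel(weights: Array[Double], intercept: Double) {
  // Raw score w.dot(x) + b: an unbounded margin, not a probability.
  def score(x: Array[Double]): Double = {
    require(x.length == weights.length, "feature dimension mismatch")
    weights.zip(x).map { case (w, v) => w * v }.sum + intercept
  }
}

object OneVsAll {
  // Scores x against every per-class model and returns the label with the
  // largest raw score, together with that score.
  def predict(models: Map[String, LinearModel], x: Array[Double]): (String, Double) = {
    require(models.nonEmpty, "need at least one model")
    models.toSeq.map { case (label, m) => (label, m.score(x)) }.maxBy(_._2)
  }
}
```

The returned score can then be compared against a cutoff of your choosing to drop low-confidence predictions. Scores from separately trained models are not calibrated against each other, so treat the comparison as a heuristic.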

On Wed, Sep 24, 2014 at 4:06 PM, Aris <arisofalaska@gmail.com> wrote:

> Greetings, Adamantios Korais... if that is indeed your name..
>
> Just to follow up on Liquan, you might be interested in removing the
> threshold and working with the raw prediction scores directly. SVM with
> the linear kernel is a straightforward linear classifier, so with
> model.clearThreshold() you can just get the raw predicted scores,
> removing the threshold that simply translates a score into a
> positive/negative class. Note that these scores are unbounded margins
> rather than probabilities in 0..1; the larger the absolute value, the
> more confident the prediction.
>
> API is here
> http://yhuai.github.io/site/api/scala/index.html#org.apache.spark.mllib.classification.SVMModel
>
> Enjoy!
> Aris
>
> On Sun, Sep 21, 2014 at 11:50 PM, Liquan Pei <liquanpei@gmail.com> wrote:
>
>> Hi Adamantios,
>>
>> For your first question, after you train the SVM, you get a model with a
>> vector of weights w and an intercept b. Points x such that w.dot(x) + b = 1
>> or w.dot(x) + b = -1 lie on the margin boundaries, and the decision
>> boundary itself is where w.dot(x) + b = 0. The quantity w.dot(x) + b for a
>> point x is a confidence measure of the classification.
>>
>> Code-wise, suppose you trained your model via
>> val model = SVMWithSGD.train(...)
>>
>> You can then call
>>
>> model.setThreshold(your threshold here)
>>
>> to set the threshold that separates positive predictions from negative
>> predictions.
>>
>> For more info, please take a look at
>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.classification.SVMModel
>>
>> For your second question, SVMWithSGD only supports binary classification.
>>
>> Hope this helps,
>>
>> Liquan
>>
>> On Sun, Sep 21, 2014 at 11:22 PM, Adamantios Corais <
>> adamantios.corais@gmail.com> wrote:
>>
>>> Nobody?
>>>
>>> If that's not supported already, can someone please, at least, give me a
>>> few hints on how to implement it?
>>>
>>> Thanks!
>>>
>>>
>>> On Fri, Sep 19, 2014 at 7:43 PM, Adamantios Corais <
>>> adamantios.corais@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am working with the SVMWithSGD classification algorithm on Spark. It
>>>> works fine for me; however, I would like to distinguish the instances that
>>>> are classified with high confidence from those with low confidence. How do
>>>> we define the threshold here? Ultimately, I want to keep only those for
>>>> which the algorithm is very *very* certain about its decision! How can I do
>>>> that? Is this feature already supported by any MLlib algorithm? What if I
>>>> had multiple categories?
>>>>
>>>> Any input is highly appreciated!
>>>>
>>>
>>>
>>
>>
>> --
>> Liquan Pei
>> Department of Physics
>> University of Massachusetts Amherst
>>
>
>
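
The confidence measure discussed in the quoted messages, w.dot(x) + b, can be sketched in plain Scala as below. This is only an illustration under stated assumptions: Confidence, rawScore, confident and the minMargin cutoff are hypothetical names, not part of MLlib. With MLlib itself, the thread suggests calling model.clearThreshold() so that predict returns this raw score instead of a class label.

```scala
object Confidence {
  // Raw SVM score w.dot(x) + b for a single point (unbounded, not a probability).
  def rawScore(w: Array[Double], b: Double, x: Array[Double]): Double = {
    require(w.length == x.length, "feature dimension mismatch")
    w.zip(x).map { case (wi, xi) => wi * xi }.sum + b
  }

  // Keeps only the points whose absolute score is at least minMargin
  // (a hypothetical, user-chosen cutoff) and returns each kept point
  // with its predicted label (1.0 or 0.0) and its raw score.
  def confident(
      w: Array[Double],
      b: Double,
      points: Seq[Array[Double]],
      minMargin: Double): Seq[(Array[Double], Double, Double)] =
    points
      .map(x => (x, rawScore(w, b, x)))
      .filter { case (_, s) => math.abs(s) >= minMargin }
      .map { case (x, s) => (x, if (s > 0.0) 1.0 else 0.0, s) }
}
```

A reasonable cutoff depends on the scale of the features and on the regularization used in training, so it is best chosen by inspecting the score distribution on held-out data.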
