spark-user mailing list archives

From Adamantios Corais <adamantios.cor...@gmail.com>
Subject Re: return probability \ confidence instead of actual class
Date Wed, 08 Oct 2014 09:17:20 GMT
OK, let me rephrase my question once again. Python-wise, I prefer
.predict_proba(X) to .decision_function(X) because its results are easier
for me to interpret. As far as I can see, the latter is already
implemented in Spark (well, in version 0.9.2, for example, I have to
compute the dot product myself, otherwise I get 0 or 1), but the former
is not implemented (yet!). What should I do, i.e., how can I implement it
in Spark as well? What are the required inputs, and what does the formula
look like?
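A common way to get predict_proba-style output from an SVM is Platt scaling, which is what scikit-learn's SVC uses when probability=True. It fits a sigmoid P(y = 1 | f) = 1 / (1 + exp(A*f + B)) to the decision values f, learning A and B from labeled data. The required inputs are therefore the decision values and the true 0/1 labels. Below is a minimal pure-Python sketch under those assumptions. The function names are hypothetical, and it fits A and B with Newton's method plus a backtracking line search, which is a simplification rather than libsvm's exact routine.

```python
import math


def _softplus(z):
    # numerically stable log(1 + exp(z))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def platt_probability(f, a, b):
    """P(y = 1 | f) = 1 / (1 + exp(a * f + b)), evaluated without overflow."""
    z = a * f + b
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_platt(decision_values, labels, max_iter=100, tol=1e-12):
    """Fit (a, b) by Newton's method with backtracking. Labels must be 0 or 1."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    # Platt's smoothed targets replace hard 0/1 labels to reduce overfitting
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    data = [(f, t_pos if y == 1 else t_neg) for f, y in zip(decision_values, labels)]

    def nll(a, b):
        # negative log-likelihood: sum over points of softplus(z) - (1 - t) * z
        return sum(_softplus(a * f + b) - (1.0 - t) * (a * f + b) for f, t in data)

    a, b = 0.0, math.log((n_neg + 1.0) / (n_pos + 1.0))
    loss = nll(a, b)
    for _ in range(max_iter):
        # gradient g and Hessian H of the loss with respect to (a, b)
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for f, t in data:
            p = platt_probability(f, a, b)
            w = p * (1.0 - p)
            g_a += (t - p) * f
            g_b += t - p
            h_aa += w * f * f
            h_ab += w * f
            h_bb += w
        h_aa += 1e-12  # tiny ridge keeps the 2x2 Hessian invertible
        h_bb += 1e-12
        det = h_aa * h_bb - h_ab * h_ab
        # Newton direction d = -H^{-1} g, written out for the 2x2 case
        d_a = -(h_bb * g_a - h_ab * g_b) / det
        d_b = -(h_aa * g_b - h_ab * g_a) / det
        step = 1.0
        while step >= 1e-10:  # halve the step until the loss does not increase
            new_a, new_b = a + step * d_a, b + step * d_b
            new_loss = nll(new_a, new_b)
            if new_loss <= loss:
                break
            step *= 0.5
        else:
            break  # no acceptable step found: treat as converged
        improvement = loss - new_loss
        a, b, loss = new_a, new_b, new_loss
        if improvement < tol:
            break
    return a, b
```

In Spark, the decision values would be the per-point dot product plus intercept (as in the Scala snippet quoted below), paired with each point's label. Ideally A and B are fitted on data not used to train the SVM itself, which is why scikit-learn runs an internal cross-validation for this step.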

On Tue, Oct 7, 2014 at 10:04 PM, Sean Owen <sowen@cloudera.com> wrote:

> It looks like you are directly computing the SVM decision function in
> both cases:
>
> val predictions2 = m_users_double.map{point=>
>   point.zip(weights).map(a=> a._1 * a._2).sum + intercept
> }.cache()
>
> clf.decision_function(T)
>
> This does not give you +1/-1 in SVMs (well... not for most points,
> which will be outside the margin around the separating hyperplane).
>
> You can use the predict() function in SVMModel -- which will give you
> 0 or 1 (rather than +/- 1 but that's just differing convention)
> depending on the sign of the decision function. I don't know if this
> was in 0.9.
>
> At the moment I assume you saw small values of the decision function
> in scikit because of the radial basis function.
>
> On Tue, Oct 7, 2014 at 7:45 PM, Sunny Khatri <sunny.kh03@gmail.com> wrote:
> > I'm not familiar with the scikit-learn SVM implementation (and I assume
> > you are using LinearSVC). To find an optimal decision boundary based on
> > the scores obtained, you can use an ROC curve, varying your threshold.
> >
>
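The decision function, the sign-based 0/1 prediction and the ROC threshold sweep discussed above can be sketched in plain Python. This is a hedged illustration that assumes 0/1 labels and dense feature lists. The helper names are hypothetical, not MLlib or scikit-learn APIs, and maximizing TPR minus FPR (Youden's J) is just one possible way to pick a threshold from the curve.

```python
def decision_value(x, weights, intercept):
    # linear SVM score w . x + b, mirroring the quoted Scala snippet
    return sum(xi * wi for xi, wi in zip(x, weights)) + intercept


def predict(score, threshold=0.0):
    # 0/1 label from the score; threshold 0.0 is the plain sign rule
    return 1 if score > threshold else 0


def roc_points(scores, labels):
    """(threshold, fpr, tpr) for the rule "predict 1 if score >= threshold".

    Sweeps the threshold over every distinct score, highest first.
    Assumes labels are 0/1 and both classes are present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = []
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # emit one point per distinct score, after the last of any ties
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((score, fp / n_neg, tp / n_pos))
    return points


def best_threshold(points):
    # one possible criterion (Youden's J statistic): maximize TPR - FPR
    return max(points, key=lambda p: p[2] - p[1])[0]
```

For example, scores [0.9, 0.8, 0.3, 0.1] with labels [1, 0, 1, 0] give four ROC points, and the J criterion picks 0.9 as the threshold. Raw decision values work here without any probability calibration, because the ROC curve depends only on the ranking of the scores.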
