Thank you, Sean. I'll try to do it externally as you suggested. However, could
you please give me some hints on how to do that? Also, where can I find
the 1.2 implementation you just mentioned? Thanks!
On Wed, Oct 8, 2014 at 12:58 PM, Sean Owen <sowen@cloudera.com> wrote:
> Plain old SVMs don't produce an estimate of class probabilities;
> predict_proba() does some additional work to estimate class
> probabilities from the SVM output. Spark does not implement this right
> now.
>
> Spark implements the equivalent of decision_function (the w^T x + b bit)
> but does not expose it, and instead gives you predict(), which gives 0
> or 1 depending on whether the decision function exceeds the specified
> threshold.
>
> Yes you can roll your own just like you did to calculate the decision
> function from weights and intercept. I suppose it would be nice to
> expose it (do I hear a PR?) but it's not hard to do externally. You'll
> have to do this anyway if you're on anything earlier than 1.2.
>
> On Wed, Oct 8, 2014 at 10:17 AM, Adamantios Corais
> <adamantios.corais@gmail.com> wrote:
> > OK, let me rephrase my question once again. In Python I prefer
> > .predict_proba(X) to .decision_function(X), since its results are easier
> > for me to interpret. As far as I can see, the latter is already
> > implemented in Spark (well, in version 0.9.2, for example, I have to
> > compute the dot product on my own, otherwise I get 0 or 1), but the
> > former is not implemented (yet!). What should I do, and how can I
> > implement it in Spark as well? What are the required inputs, and what
> > does the formula look like?
> >
> > On Tue, Oct 7, 2014 at 10:04 PM, Sean Owen <sowen@cloudera.com> wrote:
> >>
> >> It looks like you are directly computing the SVM decision function in
> >> both cases:
> >>
> >> val predictions2 = m_users_double.map { point =>
> >>   point.zip(weights).map(a => a._1 * a._2).sum + intercept
> >> }.cache()
> >>
> >> clf.decision_function(T)
> >>
> >> This does not give you +1/-1 in SVMs (well... not for most points,
> >> which will be outside the margin around the separating hyperplane).
> >>
> >> You can use the predict() function in SVMModel, which will give you
> >> 0 or 1 (rather than +/-1, but that's just differing convention)
> >> depending on the sign of the decision function. I don't know if this
> >> was in 0.9.
> >>
> >> At the moment I assume you saw small values of the decision function
> >> in scikit because of the radial basis function.
>
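
For reference, here is a minimal sketch of the "roll your own" route discussed
in this thread, written in plain Python rather than Spark so that it runs
standalone. It computes the decision function w^T x + b from weights and an
intercept, thresholds it the way predict() is described above, and then fits a
sigmoid to the decision values to get probability-like outputs. The sigmoid
step is a simplified form of Platt scaling, one common way to turn SVM scores
into probabilities. It is an assumption here, not something Spark provides, and
it uses plain gradient descent without the target smoothing or held-out
calibration data a careful implementation would use. All function names and the
toy data are hypothetical.

```python
import math


def decision_function(weights, intercept, x):
    """Raw SVM score w^T x + b for a single feature vector."""
    return sum(w * xi for w, xi in zip(weights, x)) + intercept


def predict(score, threshold=0.0):
    """Return 1 if the score exceeds the threshold, otherwise 0."""
    return 1 if score > threshold else 0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, iterations=2000):
    """Fit P(y=1 | f) = sigmoid(a * f + b) by gradient descent on log loss.

    scores: decision-function values; labels: 0/1 class labels.
    Simplified sketch: no target smoothing, no held-out calibration set.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(iterations):
        grad_a = 0.0
        grad_b = 0.0
        for f, y in zip(scores, labels):
            err = sigmoid(a * f + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * f
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def predict_proba(score, a, b):
    """Estimated probability of class 1 for a decision-function value."""
    return sigmoid(a * score + b)


# Toy usage: hypothetical decision values with their true 0/1 labels.
toy_scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0, -0.2, 0.3]
toy_labels = [0, 0, 0, 1, 1, 1, 1, 0]
a, b = fit_platt(toy_scores, toy_labels)

score = decision_function([0.5, -1.0], 0.25, [2.0, 1.0])
print(score, predict(score), predict_proba(score, a, b))
```

In Spark the scores would come from the same dot product computed over the RDD,
and the two sigmoid parameters would ideally be fitted on data the SVM was not
trained on, so that the probabilities are not overconfident.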
