spark-user mailing list archives

From Nick Pomfret <nick-nab...@snowmonkey.co.uk>
Subject Re: Using SVMWithSGD model to predict
Date Sun, 19 Oct 2014 21:58:32 GMT
Thanks for the info.

On 19 October 2014 20:46, Sean Owen <sowen@cloudera.com> wrote:

> Ah right. It is important to use clearThreshold() in that example in
> order to generate margins, because the AUC metric needs the
> classifications to be ranked by some relative strength, rather than
> just 0/1. These outputs are not probabilities, and that is not what
> SVMs give you in general. There are techniques for estimating
> probabilities from SVM output but these aren't present here.
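One of those techniques is Platt scaling: fit a sigmoid on top of the SVM margins using held-out labels. A minimal sketch in plain Java, assuming you already have margins and 0/1 labels for a held-out set; the numbers, class name, and plain-gradient-descent fit are all invented for illustration (real implementations such as LIBSVM's use a more careful Newton-style optimizer):

```java
// Sketch of Platt scaling: fit sigma(a*margin + b) to 0/1 labels by
// gradient descent on log-loss, then read probabilities off the sigmoid.
// The margins and labels below are made-up illustrative data.
public class PlattSketch {

    // Fit parameters (a, b) of sigma(a*m + b) to the labels.
    static double[] fit(double[] margins, int[] labels) {
        double a = 0.0, b = 0.0, lr = 0.1;
        for (int iter = 0; iter < 5000; iter++) {
            double ga = 0.0, gb = 0.0;
            for (int i = 0; i < margins.length; i++) {
                double p = 1.0 / (1.0 + Math.exp(-(a * margins[i] + b)));
                double err = p - labels[i];   // gradient of log-loss w.r.t. the linear score
                ga += err * margins[i];
                gb += err;
            }
            a -= lr * ga / margins.length;
            b -= lr * gb / margins.length;
        }
        return new double[] { a, b };
    }

    // Probability estimate for a new margin under the fitted sigmoid.
    static double prob(double margin, double[] ab) {
        return 1.0 / (1.0 + Math.exp(-(ab[0] * margin + ab[1])));
    }

    public static void main(String[] args) {
        double[] margins = { -2.0, -1.5, -0.5, 0.5, 1.5, 2.0 };
        int[] labels     = {    0,    0,    0,   1,   1,   1 };
        double[] ab = fit(margins, labels);
        System.out.println("P(y=1 | margin=1.0) = " + prob(1.0, ab));
    }
}
```

The point is just that the calibration is a separate model fitted after the SVM; MLlib's SVMModel does none of this for you.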
>
> If you just want 0/1, you do not want to call clearThreshold().
>
> Linear regression is not a classifier so probabilities don't enter
> into it. Logistic regression however does give you a probability if
> you compute the logistic function of the input directly.
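A minimal illustration of that last point in plain Java, with invented weights, intercept, and feature values rather than a fitted LogisticRegressionModel:

```java
// Sketch: turning a logistic-regression linear score w.x + b into a
// probability via the logistic (sigmoid) function. All numbers here are
// made up; a real model's weights and intercept would come from training.
public class LogisticSketch {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    static double probability(double[] weights, double intercept, double[] features) {
        double margin = intercept;
        for (int i = 0; i < weights.length; i++) {
            margin += weights[i] * features[i];   // w . x + b
        }
        return sigmoid(margin);                    // strictly between 0 and 1
    }

    public static void main(String[] args) {
        double[] w = { 0.8, -1.2 };                // hypothetical weights
        double b = 0.1;                            // hypothetical intercept
        double p = probability(w, b, new double[] { 1.0, 0.5 });
        System.out.println("P(y=1) = " + p);       // some value in (0, 1)
    }
}
```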
>
> On Sun, Oct 19, 2014 at 3:00 PM, Nick Pomfret
> <nick-nabble@snowmonkey.co.uk> wrote:
> > Thanks.
> >
> > The example I used is here:
> > https://spark.apache.org/docs/latest/mllib-linear-methods.html (see
> > SVMClassifier).
> >
> > So there's no way to get a probability-based output? What about from
> > linear regression, or logistic regression?
> >
> > On 19 October 2014 19:52, Sean Owen <sowen@cloudera.com> wrote:
> >>
> >> The problem is that you called clearThreshold(). The result becomes the
> >> SVM margin, not a 0/1 class prediction. There is no probability output.
> >>
> >> There was a very similar question last week. Is there an example out
> >> there suggesting clearThreshold()? I also wonder if it is good to
> >> overload the meaning of the output indirectly this way.
> >>
> >> On Oct 19, 2014 6:53 PM, "npomfret" <nick-nabble@snowmonkey.co.uk> wrote:
> >>>
> >>> Hi, I'm new to Spark and just trying to make sense of the SVMWithSGD
> >>> example. I ran my dataset through it and built a model. When I call
> >>> predict() on the testing data (after clearThreshold()) I was expecting
> >>> to get answers in the range of 0 to 1. But they aren't; all predictions
> >>> seem to be negative numbers between -0 and -2. I guess my question is:
> >>> what do these predictions mean? How are they of use? The outcome I
> >>> need is a probability rather than a binary. Here's my Java code:
> >>>
> >>>     SparkConf conf = new SparkConf()
> >>>         .setAppName("name")
> >>>         .set("spark.cores.max", "1");
> >>>     JavaSparkContext sc = new JavaSparkContext(conf);
> >>>     JavaRDD<LabeledPoint> points = sc.textFile(path).map(new ParsePoint()).cache();
> >>>     JavaRDD<LabeledPoint> training = points.sample(false, 0.8, 0L).cache();
> >>>     JavaRDD<LabeledPoint> testing = points.subtract(training);
> >>>     SVMModel model = SVMWithSGD.train(training.rdd(), 100);
> >>>     model.clearThreshold();
> >>>     for (LabeledPoint point : testing.toArray()) {
> >>>         Double score = model.predict(point.features());
> >>>         System.out.println("score = " + score); // <- all negative, seemingly between 0 and -2
> >>>     }
> >
> >
>
