spark-user mailing list archives

From Yuhao Yang <hhb...@gmail.com>
Subject Re: scikit-learn and mllib difference in predictions python
Date Sun, 25 Dec 2016 20:29:24 GMT
Hi ioanna,

I'd like to help look into it. Is there a way to access your training data?

2016-12-20 17:21 GMT-08:00 ioanna <gianna--@hotmail.com>:

> I have an issue with an SVM model trained for binary classification using
> Spark 2.0.0.
> I followed the same logic in scikit-learn and MLlib, using the exact same
> dataset.
> For scikit-learn I have the following code:
>
>     svc_model = SVC()
>     svc_model.fit(X_train, y_train)
>
>     print "supposed to be 1"
>     print svc_model.predict([15, 15, 0, 15, 15, 4, 12, 8, 0, 7])
>     print svc_model.predict([15.0, 15.0, 15.0, 7.0, 7.0, 15.0, 15.0, 0.0, 12.0, 15.0])
>     print svc_model.predict([15.0, 15.0, 7.0, 0.0, 7.0, 0.0, 15.0, 15.0, 15.0, 15.0])
>     print svc_model.predict([7.0, 0.0, 15.0, 15.0, 15.0, 15.0, 7.0, 7.0, 15.0, 15.0])
>
>     print "supposed to be 0"
>     print svc_model.predict([18.0, 15.0, 7.0, 7.0, 15.0, 0.0, 15.0, 15.0, 15.0, 15.0])
>     print svc_model.predict([11.0, 13.0, 7.0, 10.0, 7.0, 13.0, 7.0, 19.0, 7.0, 7.0])
>     print svc_model.predict([15.0, 15.0, 18.0, 7.0, 15.0, 15.0, 15.0, 18.0, 7.0, 15.0])
>     print svc_model.predict([15.0, 15.0, 8.0, 0.0, 0.0, 8.0, 15.0, 15.0, 15.0, 7.0])
>
>
> and it returns:
>
>     supposed to be 1
>     [0]
>     [1]
>     [1]
>     [1]
>     supposed to be 0
>     [0]
>     [0]
>     [0]
>     [0]
>
> For Spark I am doing:
>
>     model_svm = SVMWithSGD.train(trainingData, iterations=100)
>
>     model_svm.clearThreshold()
>
>     print "supposed to be 1"
>     print model_svm.predict(Vectors.dense(15.0, 15.0, 0.0, 15.0, 15.0, 4.0, 12.0, 8.0, 0.0, 7.0))
>     print model_svm.predict(Vectors.dense(15.0, 15.0, 15.0, 7.0, 7.0, 15.0, 15.0, 0.0, 12.0, 15.0))
>     print model_svm.predict(Vectors.dense(15.0, 15.0, 7.0, 0.0, 7.0, 0.0, 15.0, 15.0, 15.0, 15.0))
>     print model_svm.predict(Vectors.dense(7.0, 0.0, 15.0, 15.0, 15.0, 15.0, 7.0, 7.0, 15.0, 15.0))
>
>     print "supposed to be 0"
>     print model_svm.predict(Vectors.dense(18.0, 15.0, 7.0, 7.0, 15.0, 0.0, 15.0, 15.0, 15.0, 15.0))
>     print model_svm.predict(Vectors.dense(11.0, 13.0, 7.0, 10.0, 7.0, 13.0, 7.0, 19.0, 7.0, 7.0))
>     print model_svm.predict(Vectors.dense(15.0, 15.0, 18.0, 7.0, 15.0, 15.0, 15.0, 18.0, 7.0, 15.0))
>     print model_svm.predict(Vectors.dense(15.0, 15.0, 8.0, 0.0, 0.0, 8.0, 15.0, 15.0, 15.0, 7.0))
>
> which returns:
>
>     supposed to be 1
>     12.8250120159
>     16.0786937313
>     14.2139435305
>     16.5115589658
>     supposed to be 0
>     17.1311777004
>     14.075461697
>     20.8883372052
>     12.9132580999
>
> When I set the threshold, I get either all zeros or all ones.
>
> Does anyone know how to approach this problem?
>
> As I said I have checked multiple times that my dataset and feature
> extraction logic are exactly the same in both cases.
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/scikit-learn-and-mllib-difference-in-predictions-python-tp28240.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>
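
One quick check that does not need the training data: with the threshold cleared, the MLlib model returns raw scores, and setting a threshold labels a point 1 when its score is above the threshold and 0 otherwise. The sketch below is plain Python, not Spark code. It applies that rule to the eight scores quoted above to see whether any single cut point recovers the intended labels. The helper names (apply_threshold, accuracy) are made up for illustration and are not part of either library.

```python
# Raw scores reported in the quoted message after clearThreshold().
scores_expected_1 = [12.8250120159, 16.0786937313, 14.2139435305, 16.5115589658]
scores_expected_0 = [17.1311777004, 14.075461697, 20.8883372052, 12.9132580999]


def apply_threshold(scores, threshold):
    """Hypothetical helper: label a raw score 1 if it exceeds the threshold, else 0."""
    return [1 if s > threshold else 0 for s in scores]


def accuracy(threshold):
    """Fraction of the eight quoted examples that get their intended label."""
    preds_1 = apply_threshold(scores_expected_1, threshold)
    preds_0 = apply_threshold(scores_expected_0, threshold)
    correct = sum(preds_1) + (len(preds_0) - sum(preds_0))
    return correct / float(len(preds_1) + len(preds_0))


all_scores = scores_expected_1 + scores_expected_0

# Every score is positive, so a threshold of 0.0 labels everything 1,
# and any threshold above the largest score labels everything 0.
print(apply_threshold(all_scores, 0.0))
print(apply_threshold(all_scores, 21.0))

# Try 0.0 and each observed score as a cut point and keep the best one.
candidates = [0.0] + sorted(all_scores)
best = max(candidates, key=accuracy)
print("best threshold %.4f, accuracy %.3f" % (best, accuracy(best)))
```

On these eight points the two groups of scores overlap, so no threshold separates them cleanly. That suggests the threshold is not the real issue and the two models are simply different. Two things that may be worth checking: SVC() uses an RBF kernel by default while SVMWithSGD fits a linear model, and SVMWithSGD.train defaults to intercept=False.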
