flink-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] azagrebin commented on a change in pull request #6425: [FLINK-9664][Doc] fixing documentation in ML quick start
Date Wed, 08 Aug 2018 13:22:26 GMT
azagrebin commented on a change in pull request #6425: [FLINK-9664][Doc] fixing documentation in ML quick start
URL: https://github.com/apache/flink/pull/6425#discussion_r208576999

 File path: docs/dev/libs/ml/quickstart.md
 @@ -129,6 +129,10 @@ and the [test set here](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/b
 This is an astroparticle binary classification dataset, used by Hsu et al. [[3]](#hsu) in
 practical Support Vector Machine (SVM) guide. It contains 4 numerical features, and the class
+Before importing the traning and test dataset, Flink SVM only supports threshold binary values
+`+1.0` and `-1.0`. Thus a conversion is needed upon downloading the svmguide1 dataset since it is 
+labelled using `1`s and `0`s.
 Review comment:
   I think this paragraph belongs at the beginning of the next section, `Classification`, because this section is about the LibSVM format.
   A code example for the conversion could also be provided, to make the example fully copy-pasteable.
   A small thing: there is also a typo, `traning` -> `training`.
   I would suggest modifying the code example in this `LibSVM files` section like this:
   val astroTrainLibSVM: DataSet[LabeledVector] = MLUtils.readLibSVM(env, "/path/to/svmguide1")
   val astroTestLibSVM: DataSet[LabeledVector] = MLUtils.readLibSVM(env, "/path/to/svmguide1.t")
   so that this section contains nothing specific to SVM training, and add something like this to the beginning of the `Classification` section:
   ... After importing the training and test datasets, the data needs to be prepared for classification, because Flink SVM only supports ... conversion is needed after downloading
   And then the code example:
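   import org.apache.flink.ml.math.Vector // Flink ML's Vector, not Scala's collection Vector
   // Flink SVM expects labels of +1.0 and -1.0, but svmguide1 is labelled with 1 and 0
   def svmNormaliser : LabeledVector => LabeledVector =
       lv => LabeledVector(if (lv.label > 0.0) 1.0 else -1.0, lv.vector)
   val astroTrain: DataSet[LabeledVector] = astroTrainLibSVM.map(svmNormaliser)
   val astroTest: DataSet[(Vector, Double)] = astroTestLibSVM.map(svmNormaliser).map(x => (x.vector, x.label))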
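As a quick, Flink-free way to sanity-check the proposed split, here is a minimal plain-Scala sketch: step 1 only parses LibSVM lines (`<label> <index>:<value> ...`, with 1-based indices), and step 2 applies the SVM-specific label mapping. The `Labeled` case class and the `parseLibSVMLine` helper are hypothetical stand-ins for Flink ML's `LabeledVector` and `MLUtils.readLibSVM`, not their actual implementations, and the two sample lines are made up.

```scala
// Hypothetical stand-in for Flink ML's LabeledVector: a label plus sparse
// features stored as (0-based index -> value).
final case class Labeled(label: Double, features: Map[Int, Double])

// Step 1 (format only): parse one LibSVM line "<label> <index>:<value> ...".
// LibSVM indices are 1-based; this sketch shifts them to 0-based.
def parseLibSVMLine(line: String): Labeled = {
  val tokens = line.trim.split("\\s+")
  val features = tokens.tail.map { tok =>
    val sep = tok.indexOf(':')
    (tok.substring(0, sep).toInt - 1) -> tok.substring(sep + 1).toDouble
  }.toMap
  Labeled(tokens.head.toDouble, features)
}

// Step 2 (SVM-specific): map 0/1 labels to the -1.0/+1.0 labels Flink SVM
// expects, using the same rule as the svmNormaliser in the review comment.
def svmNormaliser(lv: Labeled): Labeled =
  lv.copy(label = if (lv.label > 0.0) 1.0 else -1.0)

// Made-up sample lines with 1/0 labels and 4 features, like svmguide1.
val sample = Seq(
  "1 1:0.5 2:1.0 3:1.5 4:2.0",
  "0 1:0.25 2:0.75 3:1.25 4:1.75"
)

val normalised = sample.map(parseLibSVMLine).map(svmNormaliser)
normalised.foreach(lv => println(s"label=${lv.label}, features=${lv.features}"))
```

Keeping step 2 out of the parser is what lets the `LibSVM files` section stay format-only; the label mapping then sits where the classifier's input requirements are explained.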

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

With regards,
Apache Git Services
