[ https://issues.apache.org/jira/browse/SPARK-15497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298837#comment-15298837 ]
Bryan Cutler commented on SPARK-15497:
--------------------------------------
Writable support for DecisionTreeClassificationModel was added in SPARK-11888 and will ship in Spark 2.0.
> DecisionTreeClassificationModel can't be saved within in Pipeline caused by not implement Writable
> ----------------------------------------------------------------------------------------------------
>
> Key: SPARK-15497
> URL: https://issues.apache.org/jira/browse/SPARK-15497
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 1.6.1
> Reporter: lichenglin
> Fix For: 2.0.0
>
>
> Here is my code
> {code}
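> import org.apache.spark.ml.Pipeline;
> import org.apache.spark.ml.PipelineModel;
> import org.apache.spark.ml.PipelineStage;
> import org.apache.spark.ml.classification.DecisionTreeClassifier;
> import org.apache.spark.ml.feature.IndexToString;
> import org.apache.spark.ml.feature.StringIndexer;
> import org.apache.spark.ml.feature.StringIndexerModel;
> import org.apache.spark.ml.feature.VectorIndexer;
> import org.apache.spark.ml.feature.VectorIndexerModel;
> import org.apache.spark.sql.DataFrame;
> import org.apache.spark.sql.SQLContext;
>
> SQLContext sqlContext = getSQLContext();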
> DataFrame data = sqlContext.read().format("libsvm").load("file:///E:/workspace-mars/bigdata/sparkjob/data/mllib/sample_libsvm_data.txt");
> // Index labels, adding metadata to the label column.
> // Fit on whole dataset to include all labels in index.
> StringIndexerModel labelIndexer = new StringIndexer()
> .setInputCol("label")
> .setOutputCol("indexedLabel")
> .fit(data);
> // Automatically identify categorical features, and index them.
> VectorIndexerModel featureIndexer = new VectorIndexer()
> .setInputCol("features")
> .setOutputCol("indexedFeatures")
> .setMaxCategories(4) // features with > 4 distinct values are treated as continuous
> .fit(data);
> // Split the data into training and test sets (30% held out for testing)
> DataFrame[] splits = data.randomSplit(new double[]{0.7, 0.3});
> DataFrame trainingData = splits[0];
> DataFrame testData = splits[1];
> // Train a DecisionTree model.
> DecisionTreeClassifier dt = new DecisionTreeClassifier()
> .setLabelCol("indexedLabel")
> .setFeaturesCol("indexedFeatures");
> // Convert indexed labels back to original labels.
> IndexToString labelConverter = new IndexToString()
> .setInputCol("prediction")
> .setOutputCol("predictedLabel")
> .setLabels(labelIndexer.labels());
> // Chain indexers and tree in a Pipeline
> Pipeline pipeline = new Pipeline()
> .setStages(new PipelineStage[]{labelIndexer, featureIndexer, dt, labelConverter});
> // Train model. This also runs the indexers.
> PipelineModel model = pipeline.fit(trainingData);
> model.save("file:///e:/tmpmodel");
> {code}
> and here is the exception
> {code}
> Exception in thread "main" java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable. Non-Writable stage: dtc_7bdeae1c4fb8 of type class org.apache.spark.ml.classification.DecisionTreeClassificationModel
> at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:218)
> at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:215)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
> at org.apache.spark.ml.Pipeline$SharedReadWrite$.validateStages(Pipeline.scala:215)
> at org.apache.spark.ml.PipelineModel$PipelineModelWriter.<init>(Pipeline.scala:325)
> at org.apache.spark.ml.PipelineModel.write(Pipeline.scala:309)
> at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:131)
> at org.apache.spark.ml.PipelineModel.save(Pipeline.scala:280)
> at com.bjdv.spark.job.Testjob.main(Testjob.java:142)
> {code}
> sample_libsvm_data.txt is included in the Spark 1.6.1 release tarball.
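The stack trace shows that the failure happens before anything is written: the `PipelineModelWriter` constructor calls `validateStages`, which walks the pipeline's stages and throws for a stage that is not writable. The following is a minimal, Spark-free Java sketch of that check, assuming a simplified model. `Stage`, `Writable`, `WritableStage`, `PlainStage` and the stage uids are hypothetical stand-ins, not Spark classes; only the exception message mirrors the one in the report.

```java
import java.util.List;

// Simplified, Spark-free model of the pre-write check seen in the stack trace.
// All types here are hypothetical stand-ins, not Spark classes.
public class PipelineWriteCheck {

    /** Hypothetical stand-in for a pipeline stage: every stage has a uid. */
    interface Stage {
        String uid();
    }

    /** Hypothetical marker for stages that can persist themselves. */
    interface Writable {}

    /** A stage that can be saved, e.g. an indexer model. */
    static final class WritableStage implements Stage, Writable {
        private final String uid;
        WritableStage(String uid) { this.uid = uid; }
        public String uid() { return uid; }
    }

    /** A stage with no save support, like DecisionTreeClassificationModel in 1.6.1. */
    static final class PlainStage implements Stage {
        private final String uid;
        PlainStage(String uid) { this.uid = uid; }
        public String uid() { return uid; }
    }

    /** Checks every stage up front and throws on the first one that is not Writable. */
    static void validateStages(List<? extends Stage> stages) {
        for (Stage stage : stages) {
            if (!(stage instanceof Writable)) {
                throw new UnsupportedOperationException(
                    "Pipeline write will fail on this Pipeline because it contains a stage"
                        + " which does not implement Writable. Non-Writable stage: "
                        + stage.uid() + " of type " + stage.getClass());
            }
        }
    }

    public static void main(String[] args) {
        // Same shape as the reported pipeline: indexer, indexer, tree, converter.
        List<Stage> stages = List.of(
            new WritableStage("strIdx_1"),
            new WritableStage("vecIdx_1"),
            new PlainStage("dtc_1"),
            new WritableStage("idxToStr_1"));
        try {
            validateStages(stages);
            System.out.println("all stages writable");
        } catch (UnsupportedOperationException e) {
            // In this sketch, as in the stack trace, validation runs before any write.
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, DecisionTreeClassificationModel in 1.6.1 behaves like `PlainStage`, so `model.save(...)` fails during validation; once SPARK-11888 makes the tree model writable (Spark 2.0), the same pipeline passes the check.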
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)