spark-issues mailing list archives

From "Bryan Cutler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-15497) DecisionTreeClassificationModel can't be saved within in Pipeline caused by not implement Writable
Date Tue, 24 May 2016 20:33:12 GMT

    [ https://issues.apache.org/jira/browse/SPARK-15497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298837#comment-15298837 ]

Bryan Cutler commented on SPARK-15497:
--------------------------------------

Writable support for DecisionTreeClassificationModel was added in SPARK-11888 and will be in Spark 2.0.

> DecisionTreeClassificationModel can't be saved within in Pipeline caused by not implement Writable
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-15497
>                 URL: https://issues.apache.org/jira/browse/SPARK-15497
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.6.1
>            Reporter: lichenglin
>             Fix For: 2.0.0
>
>
> Here is my code
> {code}
> 		SQLContext sqlContext = getSQLContext();
> 		DataFrame data = sqlContext.read().format("libsvm").load("file:///E:/workspace-mars/bigdata/sparkjob/data/mllib/sample_libsvm_data.txt");
> 		// Index labels, adding metadata to the label column.
> 		// Fit on whole dataset to include all labels in index.
> 		StringIndexerModel labelIndexer = new StringIndexer()
> 		  .setInputCol("label")
> 		  .setOutputCol("indexedLabel")
> 		  .fit(data);
> 		// Automatically identify categorical features, and index them.
> 		VectorIndexerModel featureIndexer = new VectorIndexer()
> 		  .setInputCol("features")
> 		  .setOutputCol("indexedFeatures")
> 		  .setMaxCategories(4) // features with > 4 distinct values are treated as continuous
> 		  .fit(data);
> 		// Split the data into training and test sets (30% held out for testing)
> 		DataFrame[] splits = data.randomSplit(new double[]{0.7, 0.3});
> 		DataFrame trainingData = splits[0];
> 		DataFrame testData = splits[1];
> 		// Train a DecisionTree model.
> 		DecisionTreeClassifier dt = new DecisionTreeClassifier()
> 		  .setLabelCol("indexedLabel")
> 		  .setFeaturesCol("indexedFeatures");
> 		// Convert indexed labels back to original labels.
> 		IndexToString labelConverter = new IndexToString()
> 		  .setInputCol("prediction")
> 		  .setOutputCol("predictedLabel")
> 		  .setLabels(labelIndexer.labels());
> 		// Chain indexers and tree in a Pipeline
> 		Pipeline pipeline = new Pipeline()
> 		  .setStages(new PipelineStage[]{labelIndexer, featureIndexer, dt, labelConverter});
> 		// Train model.  This also runs the indexers.
> 		PipelineModel model = pipeline.fit(trainingData);
> 		model.save("file:///e:/tmpmodel");
> {code}
> and here is the exception
> {code}
> Exception in thread "main" java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable. Non-Writable stage: dtc_7bdeae1c4fb8 of type class org.apache.spark.ml.classification.DecisionTreeClassificationModel
> 	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:218)
> 	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:215)
> 	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> 	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
> 	at org.apache.spark.ml.Pipeline$SharedReadWrite$.validateStages(Pipeline.scala:215)
> 	at org.apache.spark.ml.PipelineModel$PipelineModelWriter.<init>(Pipeline.scala:325)
> 	at org.apache.spark.ml.PipelineModel.write(Pipeline.scala:309)
> 	at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:131)
> 	at org.apache.spark.ml.PipelineModel.save(Pipeline.scala:280)
> 	at com.bjdv.spark.job.Testjob.main(Testjob.java:142)
> {code}
> sample_libsvm_data.txt is included in the 1.6.1 release tar
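
The stack trace above shows the save failing in a validateStages step before anything is written. A minimal, self-contained sketch of that kind of check follows. It is not Spark's code: the Stage and WritableStage interfaces, the IndexerModel and TreeModel classes, and the trySave helper are hypothetical stand-ins, and only the shape of the check and the wording of the message are taken from the exception in the report.

```java
import java.util.Arrays;
import java.util.List;

public class WritableCheckSketch {

    /** Hypothetical stand-in for a pipeline stage; only a uid is needed here. */
    interface Stage {
        String uid();
    }

    /** Hypothetical marker for stages that know how to persist themselves. */
    interface WritableStage extends Stage {
        void save(String path);
    }

    /** A stage that supports saving (stand-in for an indexer model). */
    static final class IndexerModel implements WritableStage {
        public String uid() { return "strIdx_demo"; }
        public void save(String path) { /* persistence omitted in this sketch */ }
    }

    /** A stage without save support (stand-in for the 1.6.1 tree model). */
    static final class TreeModel implements Stage {
        public String uid() { return "dtc_demo"; }
    }

    /** Rejects the whole pipeline as soon as one stage cannot be written. */
    static void validateStages(List<Stage> stages) {
        for (Stage stage : stages) {
            if (!(stage instanceof WritableStage)) {
                throw new UnsupportedOperationException(
                    "Pipeline write will fail on this Pipeline because it contains a stage"
                    + " which does not implement Writable. Non-Writable stage: "
                    + stage.uid() + " of type " + stage.getClass());
            }
        }
    }

    /** Returns "ok" when every stage is writable, otherwise the rejection message. */
    static String trySave(List<Stage> stages) {
        try {
            validateStages(stages);
            return "ok";
        } catch (UnsupportedOperationException e) {
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // All stages writable: validation passes.
        System.out.println(trySave(Arrays.<Stage>asList(new IndexerModel())));
        // One non-writable stage: the whole save is refused up front.
        System.out.println(trySave(Arrays.<Stage>asList(new IndexerModel(), new TreeModel())));
    }
}
```

Under this reading, one stage without save support is enough to block saving the whole PipelineModel, which matches the report: the indexer stages are accepted and only the decision tree model is named in the error.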


