spark-user mailing list archives

From Minudika Malshan <minudika...@gmail.com>
Subject Re: How to save spark-ML model in Java?
Date Thu, 19 Jan 2017 18:31:45 GMT
Hi,

Thanks Rezaul and Asher Krim.

The method suggested by Rezaul works fine for NaiveBayes, but saving still
fails for RandomForest and the multilayer perceptron classifier.
Everything is saved properly up to this stage:

CrossValidator cv = new CrossValidator()
        .setEstimator(pipeline)
        .setEvaluator(evaluator)
        .setEstimatorParamMaps(paramGrid)
        .setNumFolds(folds);

Any idea on how to resolve this?
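
For reference, the exception quoted further down in this thread names a stage that "does not implement Writable". A minimal sketch of that kind of per-stage check is shown here. The types `Stage`, `Writable`, and `WritableStage` and the uids are hypothetical stand-ins so the example runs without Spark on the classpath; in a real Spark job the analogous test would presumably be `stage instanceof MLWritable` (from `org.apache.spark.ml.util`) over `pipeline.getStages()`, which is an assumption to verify against the Spark version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WritableStageCheck {
    // Hypothetical stand-in for a marker interface such as Spark's MLWritable.
    interface Writable {}

    // Hypothetical stand-in for a pipeline stage; uid mimics ids like "rfc_...".
    static class Stage {
        final String uid;
        Stage(String uid) { this.uid = uid; }
    }

    // A stage that supports saving.
    static class WritableStage extends Stage implements Writable {
        WritableStage(String uid) { super(uid); }
    }

    // Returns the uids of all stages that do not implement Writable.
    static List<String> nonWritableStages(List<Stage> stages) {
        List<String> result = new ArrayList<>();
        for (Stage stage : stages) {
            if (!(stage instanceof Writable)) {
                result.add(stage.uid);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative stages only: two writable, one not.
        List<Stage> stages = Arrays.asList(
                new WritableStage("tok_1"),
                new WritableStage("hashingTF_1"),
                new Stage("rfc_1"));
        System.out.println("Non-writable stages: " + nonWritableStages(stages));
    }
}
```

Listing the offending stages before calling save makes it easier to tell whether the failure comes from a single classifier or from the surrounding pipeline.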





On Thu, Jan 12, 2017 at 9:13 PM, Asher Krim <akrim@hubspot.com> wrote:

> What version of Spark are you on?
> Although it's cut off, I think your error is with RandomForestClassifier,
> is that correct? If so, you should upgrade to Spark 2 since I think this
> class only became writeable/readable in Spark 2 (
> https://github.com/apache/spark/pull/12118)
>
> On Thu, Jan 12, 2017 at 8:43 AM, Md. Rezaul Karim <
> rezaul.karim@insight-centre.org> wrote:
>
>> Hi Malshan,
>>
>> The error says that one (or more) of the estimators/stages is not
>> writable, i.e. it does not support the model write/overwrite operation.
>>
>> Suppose you want to configure an ML pipeline consisting of three stages
>> (two transformers and an estimator): tokenizer, hashingTF, and nb:
>>     val nb = new NaiveBayes().setSmoothing(0.00001)
>>     val tokenizer = new Tokenizer().setInputCol("label").setOutputCol("label")
>>     val hashingTF = new HashingTF().setInputCol(tokenizer.getOutputCol).setOutputCol("features")
>>     val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, nb))
>>
>>
>> Now check whether all the stages are writable. To make this easier, try
>> saving the stages individually, e.g.:
>>     tokenizer.write.save("path")
>>     hashingTF.write.save("path")
>>
>> After that suppose you want to perform a 10-fold cross-validation as
>> follows:
>>     val cv = new CrossValidator()
>>               .setEstimator(pipeline)
>>               .setEvaluator(new BinaryClassificationEvaluator)
>>               .setEstimatorParamMaps(paramGrid)
>>               .setNumFolds(10)
>>
>> Where:
>>     val paramGrid = new ParamGridBuilder()
>>                             .addGrid(hashingTF.numFeatures, Array(10, 100, 1000))
>>                             .addGrid(nb.smoothing, Array(0.001, 0.0001))
>>                             .build()
>>
>> Now the model that you trained using the training set should be writable
>> if all of the stages are okay:
>>     val model = cv.fit(trainingData)
>>     model.write.overwrite().save("output/NBModel")
>>
>>
>>
>> Hope that helps.
>>
>>
>>
>>
>>
>>
>>
>> Regards,
>> _________________________________
>> *Md. Rezaul Karim*, BSc, MSc
>> PhD Researcher, INSIGHT Centre for Data Analytics
>> National University of Ireland, Galway
>> IDA Business Park, Dangan, Galway, Ireland
>> Web: http://www.reza-analytics.eu/index.html
>> <http://139.59.184.114/index.html>
>>
>> On 12 January 2017 at 09:09, Minudika Malshan <minudika001@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> When I try to save a pipeline model using Spark ML (Java), the
>>> following exception is thrown.
>>>
>>>
>>> java.lang.UnsupportedOperationException: Pipeline write will fail on
>>> this Pipeline because it contains a stage which does not implement
>>> Writable. Non-Writable stage: rfc_98f8c9e0bd04 of type class
>>> org.apache.spark.ml.classification.Rand
>>>
>>>
>>> Here is my code segment.
>>>
>>>
>>> model.write().overwrite().save("mypath");
>>>
>>>
>>> How to resolve this?
>>>
>>> Thanks and regards!
>>>
>>> Minudika
>>>
>>>
>>
>
>
> --
> Asher Krim
> Senior Software Engineer
>



-- 
*Minudika Malshan*
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka.
<https://lk.linkedin.com/pub/minudika-malshan/100/656/a80>
