spark-dev mailing list archives

From Cyanny LIANG <lgrcya...@gmail.com>
Subject Spark ML Pipeline Model Persistent Support Save Schema Info
Date Sat, 23 Dec 2017 02:02:10 GMT
Hi all,
I have a project that converts models to PMML; it needs to convert
models of different types into PMML files.
JPMML (https://github.com/jpmml) provides tools for this, such as
jpmml-sklearn, jpmml-xgboost, etc. Our conversion API must take as few
parameters as possible, so that it stays concise and simple.

I ran into an issue: sklearn, TensorFlow, and LightGBM each produce a
single model file that includes both the schema info and the model data.
A Spark PipelineModel, however, exports only model files in Parquet, with
no schema info in them. Yet the JPMML-SPARK converter needs two
arguments: the data schema and the PipelineModel.

*Can a Spark PipelineModel include the input data schema as metadata when
it is exported?*
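
To make the request concrete, here is a minimal sketch of the kind of workaround it would replace: writing the input schema as a JSON sidecar file next to the saved model, so that the pair travels together. It uses only the Python standard library and a local filesystem. The file name input_schema.json and the helpers save_schema and load_schema are hypothetical, not part of Spark or JPMML. It also assumes that an extra file in the model directory does not interfere with loading the model; if in doubt, a sibling directory would do.

```python
import json
import os
import tempfile

# Hypothetical sidecar file name; this is not a Spark convention.
SCHEMA_FILE = "input_schema.json"


def save_schema(model_dir, schema_json):
    """Write a schema JSON string into the directory holding the saved model."""
    os.makedirs(model_dir, exist_ok=True)
    # Parse first so that an invalid string fails before anything is written.
    parsed = json.loads(schema_json)
    with open(os.path.join(model_dir, SCHEMA_FILE), "w", encoding="utf-8") as f:
        json.dump(parsed, f, indent=2)


def load_schema(model_dir):
    """Read the schema back as a plain dict."""
    with open(os.path.join(model_dir, SCHEMA_FILE), encoding="utf-8") as f:
        return json.load(f)


# Made-up example schema, written in the JSON layout Spark uses for a StructType.
example_schema = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "age", "type": "double", "nullable": True, "metadata": {}},
        {"name": "label", "type": "string", "nullable": True, "metadata": {}},
    ],
})

# Stand-in for the directory a saved PipelineModel would occupy.
model_dir = os.path.join(tempfile.mkdtemp(), "pipeline_model")
save_schema(model_dir, example_schema)
restored = load_schema(model_dir)
print([field["name"] for field in restored["fields"]])
```

With PySpark, the string would come from df.schema.json(), and the restored dict could be turned back into a schema with StructType.fromJson(restored) before handing it to the converter.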

The attached image summarizes how each machine learning library maps to
JPMML; only XGBoost and Spark cannot include schema info in the
exported model file.

[image: Inline image 1]

-- 
Best & Regards
Cyanny LIANG
email: lgrcyanny@gmail.com
