spark-user mailing list archives

From Nick Pentreath <nick.pentre...@gmail.com>
Subject Re: Model Persistence
Date Thu, 18 Aug 2016 17:29:10 GMT
Model metadata (mostly parameter values) is usually tiny. The Parquet data
is most often the model coefficients. So this depends on the size of your
model, i.e. your feature dimension.
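
To make that concrete, here is a minimal save/load sketch with the
DataFrame-based ML API (the pipeline stages, the `training` DataFrame, and
the path are hypothetical placeholders):

    import org.apache.spark.ml.{Pipeline, PipelineModel}
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

    // assume `training` is a DataFrame of (text, label) rows
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

    val model = pipeline.fit(training)

    // save() writes the param metadata as JSON and the model data (e.g. the
    // logistic regression coefficients) as Parquet under this directory
    model.write.overwrite().save("/tmp/lr-pipeline-model")

    // load() reads it back into an equivalent PipelineModel
    val sameModel = PipelineModel.load("/tmp/lr-pipeline-model")
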

A high-dimensional linear model can be quite large, but still typically
easy to fit into main memory on a single node. A high-dimensional
multi-layer perceptron with many layers could be quite a lot larger. An ALS
model with millions of users and items could be huge.
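
As a rough back-of-envelope for the ALS case (all counts hypothetical; the
actual on-disk size also depends on Parquet encoding and compression):

    // ALS stores a rank-length Array[Float] per user and per item
    val numUsers = 10000000L  // hypothetical: 10M users
    val numItems = 1000000L   // hypothetical: 1M items
    val rank = 100
    val approxBytes = (numUsers + numItems) * rank * 4L  // 4 bytes per Float
    println(f"~${approxBytes / 1e9}%.1f GB of raw factors")  // ~4.4 GB
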

On Thu, 18 Aug 2016 at 18:00, Rich Tarro <richtarro@gmail.com> wrote:

> The following Databricks blog on Model Persistence states "Internally, we
> save the model metadata and parameters as JSON and the data as Parquet."
>
>
> https://databricks.com/blog/2016/05/31/apache-spark-2-0-preview-machine-learning-model-persistence.html
>
>
> What data associated with a model or Pipeline is actually saved (in
> Parquet format)?
>
> What factors determine how large the saved model or pipeline will be?
>
> Thanks.
> Rich
>
