spark-issues mailing list archives

From "Anne Holler (JIRA)" <>
Subject [jira] [Commented] (SPARK-26247) SPIP - ML Model Extension for no-Spark MLLib Online Serving
Date Mon, 03 Dec 2018 22:31:00 GMT


Anne Holler commented on SPARK-26247:

Hi, [~skonto],

My basic take on model representation is that any format other than the one the Spark MLlib
code produces in training and consumes for serving introduces additional toil and a potential
risk of model serving mismatch.  In that sense, the Spark MLlib format is a de facto standard.

Unless PMML were to completely replace the Spark MLlib representation as the first-class
representation in Spark (a switchover that doesn't seem to have a clear ROI), the team I am
on would not choose to move to it, because we do not want to take the risk that a model
trained and evaluated with respect to the Spark MLlib native representation behaves
differently when served in batch or online mode from a PMML representation.

Best regards, Anne

> SPIP - ML Model Extension for no-Spark MLLib Online Serving
> -----------------------------------------------------------
>                 Key: SPARK-26247
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 2.1.0
>            Reporter: Anne Holler
>            Priority: Major
>              Labels: SPIP
>         Attachments: SPIPMlModelExtensionForOnlineServing.pdf
> This ticket tracks an SPIP to improve model load time and model serving interfaces for
> online serving of Spark MLlib models.  The SPIP is here
> []
> The improvement opportunity exists in all versions of Spark.  We developed our set of
> changes against version 2.1.0 and can port them forward to other versions (e.g., we have
> ported them forward to 2.3.2).

This message was sent by Atlassian JIRA
