spark-user mailing list archives

From Nick Pentreath <nick.pentre...@gmail.com>
Subject Re: Deploying ML Pipeline Model
Date Fri, 01 Jul 2016 17:47:35 GMT
Generally there are two ways to use a trained pipeline model: (offline)
batch scoring, and real-time online scoring.

For batch (or even "mini-batch", e.g. on Spark Streaming data), loading the
model back into Spark and feeding new data through the pipeline for
prediction works just fine - this is essentially what is supported in 1.6
(with more or less full coverage in 2.0). For large batch cases this can be
quite efficient.
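
A minimal sketch of that batch-scoring path (the path, the newData DataFrame
and its column names below are just placeholders):

import org.apache.spark.ml.PipelineModel

// Load the previously saved pipeline model
val model = PipelineModel.load("model_v1")

// newData is a DataFrame with the same input columns the pipeline was fit on
val predictions = model.transform(newData)
predictions.select("prediction").show()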

However, for real-time use cases the required latency is usually fairly
low - on the order of a few ms to a few hundred ms per request (examples
include recommendations, ad serving, fraud detection, etc.).

In these cases, using Spark has two issues: (1) latency for prediction on
the pipeline, which is based on DataFrames and therefore distributed
execution, is usually fairly high "per request"; (2) it requires pulling in
all of Spark for your real-time serving layer (or running a full Spark
cluster), which is usually overkill - all you really need for serving is a
bit of linear algebra and some basic transformations.
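
To illustrate, a hand-rolled serving sketch for a simple linear model (this
is not an existing API - the exported weights, intercept and feature values
here are made up):

// Plain Scala, no Spark dependency: score a single request using the
// coefficients that your own export step wrote out.
case class LinearModel(weights: Array[Double], intercept: Double) {
  def predict(features: Array[Double]): Double =
    features.zip(weights).map { case (x, w) => x * w }.sum + intercept
}

val model = LinearModel(weights = Array(0.4, -1.2, 3.0), intercept = 0.1)
val score = model.predict(Array(1.0, 0.0, 2.5))  // one low-latency request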

So for now, unfortunately, there are not many options for exporting your
pipelines and serving them outside of Spark - the JPMML-based project
mentioned on this thread is one option; the other, at this point, is to
write your own export functionality and your own serving layer.

There is (very initial) movement towards improving the local serving
possibilities (see https://issues.apache.org/jira/browse/SPARK-13944, which
was the "first step" in this process).

On Fri, 1 Jul 2016 at 19:24 Jacek Laskowski <jacek@japila.pl> wrote:

> Hi Rishabh,
>
> I've just today had a similar conversation about how to do an ML Pipeline
> deployment and couldn't really answer this question (and more), because I
> don't really understand the use case.
>
> What would you expect from ML Pipeline model deployment? You can save
> your model with model.write.overwrite.save("model_v1"), which produces
> the directory layout below:
>
> model_v1
> |-- metadata
> |   |-- _SUCCESS
> |   `-- part-00000
> `-- stages
>     |-- 0_regexTok_b4265099cc1c
>     |   `-- metadata
>     |       |-- _SUCCESS
>     |       `-- part-00000
>     |-- 1_hashingTF_8de997cf54ba
>     |   `-- metadata
>     |       |-- _SUCCESS
>     |       `-- part-00000
>     `-- 2_linReg_3942a71d2c0e
>         |-- data
>         |   |-- _SUCCESS
>         |   |-- _common_metadata
>         |   |-- _metadata
>         |   `-- part-r-00000-2096c55a-d654-42b2-90d3-5a310101cba5.gz.parquet
>         `-- metadata
>             |-- _SUCCESS
>             `-- part-00000
>
> 9 directories, 12 files
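>
> For reference, a pipeline along these lines (the column names and the
> trainingDF DataFrame are placeholders) would produce the stages listed
> above:
>
> import org.apache.spark.ml.Pipeline
> import org.apache.spark.ml.feature.{HashingTF, RegexTokenizer}
> import org.apache.spark.ml.regression.LinearRegression
>
> // trainingDF is assumed to have "text" and "label" columns
> val regexTok = new RegexTokenizer().setInputCol("text").setOutputCol("words")
> val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
> val linReg = new LinearRegression()
>
> val pipeline = new Pipeline().setStages(Array(regexTok, hashingTF, linReg))
> val model = pipeline.fit(trainingDF)
> model.write.overwrite.save("model_v1")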
>
> What would you like to have outside SparkContext? What's wrong with
> using Spark? Just curious, hoping to understand the use case better.
> Thanks.
>
> Regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Jul 1, 2016 at 12:54 PM, Rishabh Bhardwaj <rbnext29@gmail.com>
> wrote:
> > Hi All,
> >
> > I am looking for ways to deploy an ML Pipeline model in production.
> > Spark has already proved to be one of the best frameworks for model
> > training and creation, but once the ML pipeline model is ready, how can I
> > deploy it outside the Spark context?
> > MLlib models have a toPMML method, but today a Pipeline model cannot be
> > saved to PMML. There are some frameworks like MLeap which try to abstract
> > the Pipeline model and provide ML Pipeline model deployment outside the
> > Spark context, but currently they don't support most of the ML
> > transformers and estimators.
> > I am looking for related work going on in this area.
> > Any pointers will be helpful.
> >
> > Thanks,
> > Rishabh.
>
