spark-user mailing list archives

From Saurabh Sardeshpande <saurabh...@gmail.com>
Subject Re: Deploying ML Pipeline Model
Date Fri, 01 Jul 2016 19:59:10 GMT
Hi Nick,

Thanks for the answer. Do you think an implementation like the one in this
article would be infeasible in production for, say, hundreds of queries per
minute?
https://www.codementor.io/spark/tutorial/building-a-web-service-with-apache-spark-flask-example-app-part2
The article uses Flask to define routes and Spark to evaluate requests.
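
For comparison, here is a minimal sketch of what scoring a request without
Spark might look like in plain Python, following the regexTok, hashingTF and
linReg stages of the pipeline quoted below. All names, the feature size, the
coefficients, and the use of zlib.crc32 as the hash are assumptions for
illustration. The hash in particular does not reproduce Spark's HashingTF, so
a real export would have to match Spark's hashing and load the trained
coefficients from the saved model.

```python
import re
import zlib

NUM_FEATURES = 16  # hypothetical size, chosen small for readability


def tokenize(text):
    """Lowercase the text and split on non-word characters, dropping empties."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Map tokens to a sparse {index: count} vector via the hashing trick.

    zlib.crc32 is a stand-in hash (an assumption); it will not produce the
    same indices as a Spark-trained HashingTF stage.
    """
    vec = {}
    for tok in tokens:
        idx = zlib.crc32(tok.encode("utf-8")) % num_features
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


def predict(text, weights, intercept):
    """Linear regression score: intercept + dot(weights, features)."""
    features = hashing_tf(tokenize(text), len(weights))
    return intercept + sum(weights[i] * v for i, v in features.items())


# Hypothetical coefficients; a real deployment would export these from the
# trained linear regression stage.
WEIGHTS = [0.1 * i for i in range(NUM_FEATURES)]
INTERCEPT = 0.5

if __name__ == "__main__":
    print(predict("Deploying ML Pipeline Model", WEIGHTS, INTERCEPT))
```

A Flask route could call predict() once per request, with no SparkContext in
the serving process.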

Regards,
Saurabh

On Fri, Jul 1, 2016 at 10:47 AM, Nick Pentreath <nick.pentreath@gmail.com>
wrote:

> Generally there are 2 ways to use a trained pipeline model - (offline)
> batch scoring, and real-time online scoring.
>
> For batch (or even "mini-batch", e.g. on Spark Streaming data), yes,
> loading the model back into Spark and feeding new data through the
> pipeline for prediction works just fine, and this is essentially what is
> supported in 1.6 (with more or less full coverage in 2.0). For large batch
> cases this can be quite efficient.
>
> However, real-time use cases usually require fairly low latency, on the
> order of a few ms to a few hundred ms per request (examples include
> recommendations, ad serving, fraud detection, etc.).
>
> In these cases, using Spark has 2 issues: (1) prediction latency on
> the pipeline, which is based on DataFrames and therefore distributed
> execution, is usually fairly high per request; (2) it requires pulling
> in all of Spark for your real-time serving layer (or running a full Spark
> cluster), which is usually overkill, since all you really need for
> serving is a bit of linear algebra and some basic transformations.
>
> So for now, unfortunately there is not much in the way of options for
> exporting your pipelines and serving them outside of Spark - the
> JPMML-based project mentioned on this thread is one option. The other
> option at this point is to write your own export functionality and your own
> serving layer.
>
> There is (very initial) movement towards improving the local serving
> possibilities (see https://issues.apache.org/jira/browse/SPARK-13944 which
> was the "first step" in this process).
>
> On Fri, 1 Jul 2016 at 19:24 Jacek Laskowski <jacek@japila.pl> wrote:
>
>> Hi Rishabh,
>>
>> Just today I had a similar conversation about how to deploy an ML
>> Pipeline, and I couldn't really answer the question, mostly because I
>> don't really understand the use case.
>>
>> What would you expect from ML Pipeline model deployment? You can save
>> your model to a directory with model.write.overwrite.save("model_v1"):
>>
>> model_v1
>> |-- metadata
>> |   |-- _SUCCESS
>> |   `-- part-00000
>> `-- stages
>>     |-- 0_regexTok_b4265099cc1c
>>     |   `-- metadata
>>     |       |-- _SUCCESS
>>     |       `-- part-00000
>>     |-- 1_hashingTF_8de997cf54ba
>>     |   `-- metadata
>>     |       |-- _SUCCESS
>>     |       `-- part-00000
>>     `-- 2_linReg_3942a71d2c0e
>>         |-- data
>>         |   |-- _SUCCESS
>>         |   |-- _common_metadata
>>         |   |-- _metadata
>>         |   `-- part-r-00000-2096c55a-d654-42b2-90d3-5a310101cba5.gz.parquet
>>         `-- metadata
>>             |-- _SUCCESS
>>             `-- part-00000
>>
>> 9 directories, 12 files
>>
>> What would you like to have outside SparkContext? What's wrong with
>> using Spark? Just curious, hoping to understand the use case better.
>> Thanks.
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Fri, Jul 1, 2016 at 12:54 PM, Rishabh Bhardwaj <rbnext29@gmail.com>
>> wrote:
>> > Hi All,
>> >
>> > I am looking for ways to deploy an ML Pipeline model in production.
>> > Spark has already proved to be one of the best frameworks for model
>> > training and creation, but once the ML Pipeline model is ready, how can I
>> > deploy it outside a Spark context?
>> > MLlib models have a toPMML method, but today a Pipeline model cannot be
>> > saved to PMML. There are some frameworks like MLeap which are trying to
>> > abstract the Pipeline model and provide ML Pipeline model deployment
>> > outside a Spark context, but currently they don't support most of the ML
>> > transformers and estimators.
>> > I am looking for related work going on in this area.
>> > Any pointers will be helpful.
>> >
>> > Thanks,
>> > Rishabh.
>>
