spark-dev mailing list archives

From Chris Fregly <ch...@fregly.com>
Subject Re: Serving Spark ML models via a regular Python web app
Date Thu, 11 Aug 2016 16:42:12 GMT
And here's a recent slide deck on pipeline.io that summarizes what we're working on (all open source):

https://www.slideshare.net/mobile/cfregly/advanced-spark-and-tensorflow-meetup-08042016-one-click-spark-ml-pipeline-deploy-to-production

MLeap is heading in the wrong direction and reinventing the wheel. I'm not quite sure where that project will go; in my opinion, it doesn't seem like it will have a long shelf life.

Check out pipeline.io; there's some cool stuff in there.

> On Aug 11, 2016, at 9:35 AM, Chris Fregly <chris@fregly.com> wrote:
> 
> This is exactly what my http://pipeline.io project is addressing. Check it out and send me feedback or create issues at that GitHub location.
> 
>> On Aug 11, 2016, at 7:42 AM, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:
>> 
>> Thanks Michael for the reference, and thanks Nick for the comprehensive overview of existing JIRA discussions about this. I've added myself as a watcher on the various tasks.
>> 
>>> On Thu, Aug 11, 2016 at 3:02 AM Nick Pentreath <nick.pentreath@gmail.com> wrote:
>>> Currently there is no direct way in Spark to serve models without bringing in all of Spark as a dependency.
>>> 
>>> For Spark ML, there is actually no way to do it independently of DataFrames either (which for single-instance prediction makes things sub-optimal). That is covered here: https://issues.apache.org/jira/browse/SPARK-10413
>>> 
>>> So, your options are (in Scala) things like MLeap, PredictionIO, or "roll your own". Or you can try to export to some other format such as PMML or PFA. Some MLlib models support PMML export, but for ML it is still missing (see https://issues.apache.org/jira/browse/SPARK-11171).
>>> 
>>> There is an external project for PMML too (note licensing) - https://github.com/jpmml/jpmml-sparkml - which is by now actually quite comprehensive. It shows that PMML can represent a pretty large subset of typical ML pipeline functionality.
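
To make the PMML route a bit more concrete, here is a minimal Python sketch that scores a PMML RegressionModel using only the standard library. It assumes a single linear RegressionTable made of NumericPredictor terms and the PMML 4.2 namespace; the embedded document, field names, and coefficients are made up for illustration. Real exports (for example from jpmml-sparkml) usually include transformations this sketch does not handle, so a complete PMML evaluator is the safer choice in practice.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML document (not produced by Spark or jpmml-sparkml).
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-1.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# Assumed namespace; other PMML versions use a different URI (e.g. .../PMML-4_3).
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def load_linear_regression(pmml_text):
    """Return (intercept, {field: coefficient}) from the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    intercept = float(table.get("intercept", "0"))
    coefficients = {}
    for predictor in table.findall("pmml:NumericPredictor", NS):
        # PMML allows an 'exponent' attribute (default 1); only linear terms are handled here.
        if predictor.get("exponent", "1") != "1":
            raise NotImplementedError("non-linear terms are not supported")
        coefficients[predictor.get("name")] = float(predictor.get("coefficient"))
    return intercept, coefficients


def predict(model, features):
    """Score one record given as {field_name: value}."""
    intercept, coefficients = model
    return intercept + sum(c * features[name] for name, c in coefficients.items())


model = load_linear_regression(PMML_DOC)
print(predict(model, {"x1": 3.0, "x2": 4.0}))  # 0.5 + 2.0*3.0 - 1.0*4.0 = 2.5
```

The point is that once a model is in an interchange format, the serving process only needs an XML parser and some arithmetic, not Spark.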
>>> 
>>> On the Python side sadly there is even less - I would say your options are pretty much "roll your own" currently, or export in PMML or PFA.
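
As an illustration of "roll your own" on the Python side, here is a sketch assuming a binary logistic regression whose parameters the Spark job wrote out as JSON (a fitted pyspark.ml LogisticRegressionModel exposes `coefficients` and `intercept`, which is where such values would come from). The JSON layout, the class name `LocalLogisticRegression`, and the example numbers are hypothetical; the prediction rule (label 1 when the probability exceeds the threshold) is meant to mirror Spark's, but it is worth checking against Spark's own predictions on a sample before relying on it.

```python
import json
import math

# Hypothetical export written by the Spark training job (not shown).
EXPORTED = '{"coefficients": [0.5, -1.0, 2.0], "intercept": -0.5}'


class LocalLogisticRegression:
    """Binary logistic regression scorer with no Spark dependency (sketch)."""

    def __init__(self, coefficients, intercept, threshold=0.5):
        self.coefficients = [float(w) for w in coefficients]
        self.intercept = float(intercept)
        self.threshold = threshold

    @classmethod
    def from_json(cls, text):
        params = json.loads(text)
        return cls(params["coefficients"], params["intercept"])

    def probability(self, features):
        """P(label = 1 | features), computed in a numerically stable way."""
        if len(features) != len(self.coefficients):
            raise ValueError("expected %d features, got %d"
                             % (len(self.coefficients), len(features)))
        margin = self.intercept + sum(w * x for w, x in zip(self.coefficients, features))
        if margin >= 0:
            return 1.0 / (1.0 + math.exp(-margin))
        z = math.exp(margin)  # avoids overflow for large negative margins
        return z / (1.0 + z)

    def predict(self, features):
        """Label 1.0 if the probability exceeds the threshold, else 0.0."""
        return 1.0 if self.probability(features) > self.threshold else 0.0


model = LocalLogisticRegression.from_json(EXPORTED)
print(model.predict([2.0, 0.0, 0.0]))  # margin = 0.5 -> probability ~0.62 -> 1.0
```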
>>> 
>>> Finally, part of the "mllib-local" idea was around enabling this local model-serving (for some initial discussion about the future see https://issues.apache.org/jira/browse/SPARK-16365).
>>> 
>>> N
>>> 
>>> 
>>>> On Thu, 11 Aug 2016 at 06:28 Michael Allman <michael@videoamp.com> wrote:
>>>> Nick,
>>>> 
>>>> Check out MLeap: https://github.com/TrueCar/mleap. It's not Python, but we use it in production to serve a random forest model trained by a Spark ML pipeline.
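
Outside of MLeap, a random forest can also be evaluated without Spark once its trees have been exported. The sketch below assumes a hypothetical nested-dict export (one dict per node, not Spark's or MLeap's actual format) and continuous splits that send a value to the left child when it is <= the threshold. It averages per-tree class probabilities and returns the most probable class index; how the trees get exported is not shown, and the aggregation should be checked against Spark's own output for the model in question.

```python
# Hypothetical tree export format (not Spark's or MLeap's actual format):
#   internal node: {"feature": index, "threshold": value, "left": node, "right": node}
#   leaf node:     {"probs": [p_class0, p_class1, ...]}


def leaf_probabilities(node, features):
    """Walk one tree from the root to a leaf and return that leaf's class probabilities."""
    while "probs" not in node:
        # Assumed continuous-split rule: values <= threshold go left.
        if features[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["probs"]


def predict_forest(trees, features):
    """Average per-tree class probabilities; return (predicted class index, averages)."""
    if not trees:
        raise ValueError("empty forest")
    per_tree = [leaf_probabilities(tree, features) for tree in trees]
    averages = [sum(column) / len(per_tree) for column in zip(*per_tree)]
    predicted = max(range(len(averages)), key=lambda i: averages[i])
    return predicted, averages


forest = [
    {"feature": 0, "threshold": 0.5,
     "left": {"probs": [0.75, 0.25]}, "right": {"probs": [0.25, 0.75]}},
    {"feature": 1, "threshold": 10.0,
     "left": {"probs": [0.5, 0.5]}, "right": {"probs": [0.0, 1.0]}},
]
print(predict_forest(forest, [1.0, 3.0]))  # (1, [0.375, 0.625])
```

The returned class is an index; mapping it back to an original label (e.g. one produced by a StringIndexer in the pipeline) is left to the serving code.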
>>>> 
>>>> Thanks,
>>>> 
>>>> Michael
>>>> 
>>>>> On Aug 10, 2016, at 7:50 PM, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:
>>>>> 
>>>>> Are there any existing JIRAs covering the possibility of serving up Spark ML models via, for example, a regular Python web app?
>>>>> 
>>>>> The story goes like this: You train your model with Spark on several TB of data, and now you want to use it in a prediction service that you’re building, say with Flask. In principle, you don’t need Spark anymore since you’re just passing individual data points to your model and looking for it to spit some prediction back.
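
For the scenario described here, the serving side reduces to parsing a request, calling a Spark-free scoring function, and returning JSON. The sketch below uses the standard library's `http.server` instead of Flask so it has no dependencies; the `/predict` path, the `{"features": [...]}` request shape, and the hard-coded `WEIGHTS`/`BIAS` are placeholders for whatever model is actually exported from Spark.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder model: stands in for whatever Spark-free scorer is loaded at
# startup. These weights are made up for illustration.
WEIGHTS = [0.5, -1.0, 2.0]
BIAS = -0.5


def score(features):
    if len(features) != len(WEIGHTS):
        raise ValueError("expected %d features, got %d" % (len(WEIGHTS), len(features)))
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def handle_request(body):
    """Map a raw JSON request body to (HTTP status, response dict)."""
    try:
        payload = json.loads(body)
        features = [float(x) for x in payload["features"]]
        return 200, {"prediction": score(features)}
    except (ValueError, KeyError, TypeError) as exc:
        # Bad JSON, a missing "features" key, non-numeric values, or a wrong length.
        return 400, {"error": str(exc)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, result = handle_request(self.rfile.read(length))
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host="127.0.0.1", port=8000):
    """Block and serve POST /predict until interrupted."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server, after which a POST to `/predict` with body `{"features": [1, 1, 0.25]}` should return `{"prediction": -0.5}`. The Python documentation recommends against `http.server` for production use, so in practice the same `handle_request` logic would sit behind Flask or another WSGI server.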
>>>>> 
>>>>> I assume this is something people do today, right? I presume Spark needs to run in their web service to serve up the model. (Sorry, I’m new to the ML side of Spark. 😅)
>>>>> 
>>>>> Are there any JIRAs discussing potential improvements to this story? I did a search, but I’m not sure what exactly to look for. SPARK-4587 (model import/export) looks relevant, but doesn’t address the story directly.
>>>>> 
>>>>> Nick
