spark-dev mailing list archives

From Chris Fregly <ch...@fregly.com>
Subject Re: Serving Spark ML models via a regular Python web app
Date Thu, 11 Aug 2016 16:35:55 GMT
This is exactly what my http://pipeline.io project is addressing. Check it out and send me feedback, or create issues at that GitHub location.

> On Aug 11, 2016, at 7:42 AM, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:
> 
> Thanks Michael for the reference, and thanks Nick for the comprehensive overview of existing JIRA discussions about this. I've added myself as a watcher on the various tasks.
> 
>> On Thu, Aug 11, 2016 at 3:02 AM Nick Pentreath <nick.pentreath@gmail.com> wrote:
>> Currently there is no direct way in Spark to serve models without bringing in all of Spark as a dependency.
>> 
>> For Spark ML, there is actually no way to do it independently of DataFrames either (which for single-instance prediction makes things sub-optimal). That is covered here: https://issues.apache.org/jira/browse/SPARK-10413
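>>
>> To make the single-instance overhead concrete, scoring one data point with a Spark ML PipelineModel today looks roughly like this in PySpark (the model path and column names below are made up for illustration):
>>
>>   from pyspark.sql import SparkSession
>>   from pyspark.ml import PipelineModel
>>
>>   # a full SparkSession is needed even though we only want to score one record
>>   spark = SparkSession.builder.getOrCreate()
>>   model = PipelineModel.load("/models/my_pipeline")   # hypothetical saved pipeline
>>
>>   # the single data point must be wrapped in a one-row DataFrame before transform()
>>   row = spark.createDataFrame([(25.0, 3.1)], ["age", "income"])
>>   prediction = model.transform(row).select("prediction").collect()[0][0]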
>> 
>> So, your options are (in Scala) things like MLeap, PredictionIO, or "roll your own". Or you can try to export to some other format such as PMML or PFA. Some MLlib models support PMML export, but for ML it is still missing (see https://issues.apache.org/jira/browse/SPARK-11171).
>> 
>> There is an external project for PMML too (note licensing) - https://github.com/jpmml/jpmml-sparkml - which is by now actually quite comprehensive. It shows that PMML can represent a pretty large subset of typical ML pipeline functionality.
>> 
>> On the Python side sadly there is even less - I would say your options are pretty much "roll your own" currently, or export in PMML or PFA.
>> 
>> Finally, part of the "mllib-local" idea was around enabling this local model-serving (for some initial discussion about the future see https://issues.apache.org/jira/browse/SPARK-16365).
>> 
>> N
>> 
>> 
>>> On Thu, 11 Aug 2016 at 06:28 Michael Allman <michael@videoamp.com> wrote:
>>> Nick,
>>> 
>>> Check out MLeap: https://github.com/TrueCar/mleap. It's not Python, but we use it in production to serve a random forest model trained by a Spark ML pipeline.
>>> 
>>> Thanks,
>>> 
>>> Michael
>>> 
>>>> On Aug 10, 2016, at 7:50 PM, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:
>>>> 
>>>> Are there any existing JIRAs covering the possibility of serving up Spark ML models via, for example, a regular Python web app?
>>>> 
>>>> The story goes like this: You train your model with Spark on several TB of data, and now you want to use it in a prediction service that you’re building, say with Flask. In principle, you don’t need Spark anymore since you’re just passing individual data points to your model and looking for it to spit some prediction back.
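>>>>
>>>> Ideally the serving side would be no more than something like this, where score() stands in for whatever model representation you managed to export from Spark (the function and payload shape are hypothetical):
>>>>
>>>>   from flask import Flask, request, jsonify
>>>>
>>>>   app = Flask(__name__)
>>>>
>>>>   @app.route("/predict", methods=["POST"])
>>>>   def predict():
>>>>       features = request.get_json()["features"]   # e.g. [25.0, 3.1]
>>>>       return jsonify(prediction=score(features))  # score() is the exported model, not Spark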
>>>> 
>>>> I assume this is something people do today, right? I presume Spark needs to run in their web service to serve up the model. (Sorry, I’m new to the ML side of Spark. 😅)
>>>> 
>>>> Are there any JIRAs discussing potential improvements to this story? I did a search, but I’m not sure what exactly to look for. SPARK-4587 (model import/export) looks relevant, but doesn’t address the story directly.
>>>> 
>>>> Nick
