spark-user mailing list archives

From Asher Krim <ak...@hubspot.com>
Subject Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext
Date Fri, 03 Feb 2017 18:48:01 GMT
I have a bunch of questions for you, Hollin:

How easy is it to add support for custom pipelines/models?
Are Spark mllib models supported?
We currently run Spark in local mode in an API service. It's not super
terrible, but performance is a constant struggle. Have you benchmarked any
performance differences between MLeap and vanilla Spark?
What does TensorFlow support look like? I would love to serve models from a
Java stack while being agnostic to what framework was used to train them.

Thanks,
Asher Krim
Senior Software Engineer

On Fri, Feb 3, 2017 at 11:53 AM, Hollin Wilkins <hollin@combust.ml> wrote:

> Hey Aseem,
>
> We have built pipelines that execute several string indexers, one-hot
> encoders, scaling, and a random forest or linear regression at the end.
> Execution time for the linear regression was on the order of 11
> microseconds, and a bit longer for random forest. If your pipeline is
> simple, this can be optimized further to around 2-3 microseconds by using
> row-based transformations. The pipeline operated on roughly 12 input
> features, and by the time all the processing was done, around 1000
> features were going into the linear regression after one-hot encoding and
> everything else.
>
> Hope this helps,
> Hollin
>
> On Fri, Feb 3, 2017 at 4:05 AM, Aseem Bansal <asmbansal2@gmail.com> wrote:
>
>> Does this support Java 7?
>>
>> On Fri, Feb 3, 2017 at 5:30 PM, Aseem Bansal <asmbansal2@gmail.com>
>> wrote:
>>
>>> Is computational time for predictions on the order of a few milliseconds
>>> (< 10 ms), like the old mllib library?
>>>
>>> On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins <hollin@combust.ml>
>>> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>>
>>>> Some of you may have seen Mikhail and me talk at Spark/Hadoop Summits
>>>> about MLeap and how you can use it to build production services from your
>>>> Spark-trained ML pipelines. MLeap is an open-source technology that allows
>>>> Data Scientists and Engineers to deploy Spark-trained ML Pipelines and
>>>> Models to a scoring engine instantly. The MLeap execution engine has no
>>>> dependencies on a Spark context and the serialization format is entirely
>>>> based on Protobuf 3 and JSON.
>>>>
>>>>
>>>> The recent 0.5.0 release provides serialization and inference support
>>>> for close to 100% of Spark transformers (we don’t yet support ALS and LDA).
>>>>
>>>>
>>>> MLeap is open source; take a look at our GitHub page:
>>>>
>>>> https://github.com/combust/mleap
>>>>
>>>>
>>>> Or join the conversation on Gitter:
>>>>
>>>> https://gitter.im/combust/mleap
>>>>
>>>>
>>>> We have a set of documentation to help get you started here:
>>>>
>>>> http://mleap-docs.combust.ml/
>>>>
>>>>
>>>> We even have a set of demos for training ML Pipelines and linear,
>>>> logistic, and random forest models:
>>>>
>>>> https://github.com/combust/mleap-demo
>>>>
>>>>
>>>> Check out our latest MLeap-serving Docker image, which allows you to
>>>> expose a REST interface to your Spark ML pipeline models:
>>>>
>>>> http://mleap-docs.combust.ml/mleap-serving/
>>>>
>>>>
>>>> Several companies are using MLeap in production and even more are
>>>> currently evaluating it. Take a look and tell us what you think! We hope to
>>>> talk with you soon and welcome feedback/suggestions!
>>>>
>>>>
>>>> Sincerely,
>>>>
>>>> Hollin and Mikhail
>>>>
>>>
>>>
>>
>
