spark-user mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext
Date Sat, 04 Feb 2017 17:37:20 GMT
If we expose an API to access the raw models out of PipelineModel, can't we
call predict directly on them from an API? Is there a task open to expose
the model out of PipelineModel so that predict can be called on it? The ML
model has no dependency on the Spark context.
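
To make the question concrete, here is a minimal sketch in plain Python of what "calling predict directly" could look like. It is an illustration only: `LinearModel`, `FittedPipeline`, `raw_model` and `scale_by_ten` are hypothetical stand-ins invented for this example, not Spark or MLeap classes, and the weights are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class LinearModel:
    """Hypothetical stand-in for a fitted regression model (not a Spark class)."""
    weights: Sequence[float]
    intercept: float

    def predict(self, features: Sequence[float]) -> float:
        # Dot product plus intercept; no cluster or context object is involved.
        return sum(w * x for w, x in zip(self.weights, features)) + self.intercept


@dataclass
class FittedPipeline:
    """Hypothetical stand-in for a fitted pipeline: feature stages, then a model."""
    feature_stages: List[Callable[[Sequence[float]], List[float]]]
    model: LinearModel

    def raw_model(self) -> LinearModel:
        # The kind of accessor the question asks about: return the underlying model.
        return self.model


def scale_by_ten(features: Sequence[float]) -> List[float]:
    # Made-up feature stage, present only so the pipeline has something to run.
    return [x / 10.0 for x in features]


pipeline = FittedPipeline(
    feature_stages=[scale_by_ten],
    model=LinearModel(weights=[2.0, -1.0], intercept=0.5),
)

# Score one incoming feature vector without building any table-like structure.
features = [30.0, 10.0]
for stage in pipeline.feature_stages:
    features = stage(features)
model = pipeline.raw_model()
score = model.predict(features)
print(score)  # prints 5.5
```

The point of the sketch is only that, once the model object is reachable, scoring one record is an ordinary function call on a feature vector.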
On Feb 4, 2017 9:11 AM, "Aseem Bansal" <asmbansal2@gmail.com> wrote:

>
>    - In Spark 2.0 there is a class called PipelineModel. I know that the
>    title says pipeline, but it is actually talking about a PipelineModel
>    trained using a Pipeline.
>    - Why PipelineModel instead of pipeline? Because ML usually involves a
>    series of steps that must run in a fixed order. Read the new Spark ML
>    docs or one of the Databricks blogs on Spark pipelines. If you have used
>    Python's sklearn library, the concept is inspired by it.
>    - "once model is deserialized as ml model from the store of choice
>    within ms" - The time to load the model was not what I was
>    referring to when I was talking about timing.
>    - "it can be used on incoming features to score through spark.ml.Model
>    predict API". The predict API is in the old mllib package, not the new ml
>    package.
>    - "why r we using dataframe and not the ML model directly from API" -
>    Because as of now the new ml package does not have the direct API.
>
>
> On Sat, Feb 4, 2017 at 10:24 PM, Debasish Das <debasish.das83@gmail.com>
> wrote:
>
>> I am not sure why I would use a pipeline to do scoring...the idea is to build a
>> model, use the model ser/deser feature to put it in the row or column store of
>> choice and provide API access to the model...we support these primitives
>> in github.com/Verizon/trapezium...the API has access to the Spark context in
>> local or distributed mode...once model is deserialized as ml model from the
>> store of choice within ms, it can be used on incoming features to score
>> through spark.ml.Model predict API...I am not clear on 2200x speedup...why
>> r we using dataframe and not the ML model directly from API ?
>> On Feb 4, 2017 7:52 AM, "Aseem Bansal" <asmbansal2@gmail.com> wrote:
>>
>>> Does this support Java 7?
>>> What is your timezone in case someone wanted to talk?
>>>
>>> On Fri, Feb 3, 2017 at 10:23 PM, Hollin Wilkins <hollin@combust.ml>
>>> wrote:
>>>
>>>> Hey Aseem,
>>>>
>>>> We have built pipelines that execute several string indexers, one hot
>>>> encoders, scaling, and a random forest or linear regression at the end.
>>>> Execution time for the linear regression was on the order of 11
>>>> microseconds, a bit longer for random forest. If your pipeline is simple,
>>>> this can be optimized further, to around 2-3 microseconds, by using
>>>> row-based transformations. The pipeline operated on roughly 12 input
>>>> features, and by the time all the processing was done, we had around 1000
>>>> features going into the linear regression after one hot encoding and
>>>> everything else.
>>>>
>>>> Hope this helps,
>>>> Hollin
>>>>
>>>> On Fri, Feb 3, 2017 at 4:05 AM, Aseem Bansal <asmbansal2@gmail.com>
>>>> wrote:
>>>>
>>>>> Does this support Java 7?
>>>>>
>>>>> On Fri, Feb 3, 2017 at 5:30 PM, Aseem Bansal <asmbansal2@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Is the computational time for predictions on the order of a few
>>>>>> milliseconds (< 10 ms), like the old mllib library?
>>>>>>
>>>>>> On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins <hollin@combust.ml>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey everyone,
>>>>>>>
>>>>>>>
>>>>>>> Some of you may have seen Mikhail and I talk at Spark/Hadoop Summits
>>>>>>> about MLeap and how you can use it to build production services from your
>>>>>>> Spark-trained ML pipelines. MLeap is an open-source technology that allows
>>>>>>> Data Scientists and Engineers to deploy Spark-trained ML Pipelines and
>>>>>>> Models to a scoring engine instantly. The MLeap execution engine has no
>>>>>>> dependencies on a Spark context and the serialization format is entirely
>>>>>>> based on Protobuf 3 and JSON.
>>>>>>>
>>>>>>>
>>>>>>> The recent 0.5.0 release provides serialization and inference
>>>>>>> support for close to 100% of Spark transformers (we don’t yet support ALS
>>>>>>> and LDA).
>>>>>>>
>>>>>>>
>>>>>>> MLeap is open source; take a look at our GitHub page:
>>>>>>>
>>>>>>> https://github.com/combust/mleap
>>>>>>>
>>>>>>>
>>>>>>> Or join the conversation on Gitter:
>>>>>>>
>>>>>>> https://gitter.im/combust/mleap
>>>>>>>
>>>>>>>
>>>>>>> We have a set of documentation to help get you started here:
>>>>>>>
>>>>>>> http://mleap-docs.combust.ml/
>>>>>>>
>>>>>>>
>>>>>>> We even have a set of demos, for training ML Pipelines and linear,
>>>>>>> logistic and random forest models:
>>>>>>>
>>>>>>> https://github.com/combust/mleap-demo
>>>>>>>
>>>>>>>
>>>>>>> Check out our latest MLeap-serving Docker image, which allows you to
>>>>>>> expose a REST interface to your Spark ML pipeline models:
>>>>>>>
>>>>>>> http://mleap-docs.combust.ml/mleap-serving/
>>>>>>>
>>>>>>>
>>>>>>> Several companies are using MLeap in production and even more are
>>>>>>> currently evaluating it. Take a look and tell us what you think! We hope to
>>>>>>> talk with you soon and welcome feedback/suggestions!
>>>>>>>
>>>>>>>
>>>>>>> Sincerely,
>>>>>>>
>>>>>>> Hollin and Mikhail
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
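
The kind of pipeline described earlier in the thread (string indexing, one hot encoding, scaling, then a linear regression) can be sketched as a row-based transformation in plain Python. This is a rough illustration under stated assumptions, not MLeap's or Spark's implementation: every function name, the category list, the scaling statistics and the weights below are hypothetical values chosen for the example.

```python
from typing import Dict, List, Sequence


def string_index(value: str, labels: Sequence[str]) -> int:
    """Map a category string to its integer position (hypothetical helper)."""
    return list(labels).index(value)


def one_hot(index: int, size: int) -> List[float]:
    """Expand an integer index into a 0/1 vector of the given size."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def standard_scale(x: float, mean: float, std: float) -> float:
    """Center and scale one numeric value with previously fitted statistics."""
    return (x - mean) / std


def linear_predict(features: Sequence[float], weights: Sequence[float],
                   intercept: float) -> float:
    """Dot product plus intercept."""
    return sum(w * x for w, x in zip(weights, features)) + intercept


# Made-up "fitted" parameters; a real service would load these from storage.
COLORS = ["red", "green", "blue"]
AGE_MEAN, AGE_STD = 40.0, 10.0
WEIGHTS = [1.0, 2.0, 3.0, 0.5]  # three one-hot slots followed by scaled age
INTERCEPT = 0.25


def score_row(row: Dict[str, object]) -> float:
    # Each step works on a single record, so nothing distributed is required.
    encoded = one_hot(string_index(str(row["color"]), COLORS), len(COLORS))
    scaled_age = standard_scale(float(row["age"]), AGE_MEAN, AGE_STD)
    return linear_predict(encoded + [scaled_age], WEIGHTS, INTERCEPT)


result = score_row({"color": "green", "age": 50.0})
print(result)  # prints 2.75
```

A real pipeline with around 1000 post-encoding features would follow the same shape, only with wider vectors; the timings quoted in the thread are the authors' measurements and are not reproduced by this sketch.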
