spark-user mailing list archives

From Donald Szeto <don...@prediction.io>
Subject Re: deploying a model built in mllib
Date Fri, 07 Nov 2014 21:33:07 GMT
Hi Chirag,

Could you please provide more information on your Java server environment?

Regards,
Donald

On Fri, Nov 7, 2014 at 9:57 AM, chirag lakhani <chirag.lakhani@gmail.com>
wrote:

> Thanks for letting me know about this; it looks pretty interesting.  From
> reading the documentation, it seems that the server must be built on a Spark
> cluster. Is that correct?  Is it possible to deploy it on a Java
> server?  That is how we are currently running our web app.
>
>
>
> On Tue, Nov 4, 2014 at 7:57 PM, Simon Chan <simonchan@gmail.com> wrote:
>
>> The latest version of PredictionIO, which is now under the Apache 2 license,
>> supports the deployment of MLlib models in production.
>>
>> The "engine" you build will include a few components, such as:
>> - Data - includes Data Source and Data Preparator
>> - Algorithm(s)
>> - Serving
>> I believe that you can do the feature vector creation inside the Data
>> Preparator component.
>>
>> Currently, the package comes with two templates: 1)  Collaborative
>> Filtering Engine Template - with MLlib ALS; 2) Classification Engine
>> Template - with MLlib Naive Bayes. The latter one may be useful to you. And
>> you can customize the Algorithm component, too.
>>
>> I have just created a doc: http://docs.prediction.io/0.8.1/templates/
>> Love to hear your feedback!
>>
>> Regards,
>> Simon
>>
>>
>>
>> On Mon, Oct 27, 2014 at 11:03 AM, chirag lakhani <
>> chirag.lakhani@gmail.com> wrote:
>>
>>> Would pipelining include model export?  I didn't see that in the
>>> documentation.
>>>
>>> Are there ways that this is being done currently?
>>>
>>>
>>>
>>> On Mon, Oct 27, 2014 at 12:39 PM, Xiangrui Meng <mengxr@gmail.com>
>>> wrote:
>>>
>>>> We are working on the pipeline features, which would make this
>>>> procedure much easier in MLlib. This is still a WIP and the main JIRA
>>>> is at:
>>>>
>>>> https://issues.apache.org/jira/browse/SPARK-1856
>>>>
>>>> Best,
>>>> Xiangrui
>>>>
>>>> On Mon, Oct 27, 2014 at 8:56 AM, chirag lakhani
>>>> <chirag.lakhani@gmail.com> wrote:
>>>> > Hello,
>>>> >
>>>> > I have been prototyping a text classification model that my company
>>>> > would like to eventually put into production.  Our technology stack is
>>>> > currently Java-based, but we would like to be able to build our models
>>>> > in Spark/MLlib and then export something like a PMML file which can be
>>>> > used for model scoring in real time.
>>>> >
>>>> > I have been using scikit-learn, where I am able to take the training
>>>> > data, convert the text data into a sparse data format, and then take
>>>> > the other features and use the dictionary vectorizer to do one-hot
>>>> > encoding for the other categorical variables.  All of those things seem
>>>> > to be possible in MLlib, but I am still puzzled about how that can be
>>>> > packaged in such a way that the incoming data can first be made into
>>>> > feature vectors and then evaluated as well.
>>>> >
>>>> > Are there any best practices for this type of thing in Spark?  I hope
>>>> > this is clear, but if anything is confusing then please let me know.
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Chirag
>>>>
>>>
>>>
>>
>
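
To make the packaging question from the original post concrete, here is a
minimal sketch of one way to keep feature creation and scoring together so
that a serving process only needs a single exported artifact. It is plain
Python with no Spark, MLlib, PMML, or PredictionIO calls. The class name
ModelBundle, the JSON layout, and the toy weights are all hypothetical and
chosen only for illustration. A Java server would need an equivalent reader
written in Java.

```python
import json
from collections import Counter


class ModelBundle:
    """Hypothetical bundle: feature dictionaries plus a linear scorer."""

    def __init__(self, vocab, onehot, classes, biases, weights):
        self.vocab = vocab      # token -> column index
        self.onehot = onehot    # "field=value" -> column index
        self.classes = classes  # class labels, one per weight row
        self.biases = biases    # one bias per class
        self.weights = weights  # one dense weight row per class

    def featurize(self, text, fields):
        """Turn raw input into a sparse {column index: value} vector."""
        features = {}
        # Term counts for known tokens; unknown tokens are ignored.
        for token, count in Counter(text.lower().split()).items():
            if token in self.vocab:
                features[self.vocab[token]] = float(count)
        # One-hot encoding for categorical fields seen at training time.
        for name, value in fields.items():
            key = f"{name}={value}"
            if key in self.onehot:
                features[self.onehot[key]] = 1.0
        return features

    def predict(self, text, fields):
        """Score every class with a dot product and return the best label."""
        features = self.featurize(text, fields)
        scores = [
            bias + sum(row[i] * v for i, v in features.items())
            for bias, row in zip(self.biases, self.weights)
        ]
        return self.classes[scores.index(max(scores))]

    def to_json(self):
        """Export the dictionaries and weights as one JSON document."""
        return json.dumps(self.__dict__)

    @classmethod
    def from_json(cls, payload):
        """Rebuild the bundle on the serving side."""
        return cls(**json.loads(payload))


# Toy values standing in for whatever a training job would produce.
bundle = ModelBundle(
    vocab={"refund": 0, "great": 1, "broken": 2},
    onehot={"channel=email": 3, "channel=chat": 4},
    classes=["complaint", "praise"],
    biases=[0.0, 0.0],
    weights=[
        [1.0, -1.0, 1.5, 0.2, 0.0],
        [-1.0, 2.0, -1.5, 0.0, 0.2],
    ],
)

# Export once after training, then load and score in the serving process.
restored = ModelBundle.from_json(bundle.to_json())
label = restored.predict(
    "Great product but arrived broken broken", {"channel": "email"}
)
print(label)
```

The point of the sketch is that the vocabulary and one-hot dictionaries are
exported alongside the weights, so incoming data is turned into feature
vectors by the same mapping that was used at training time.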


-- 
Donald Szeto
PredictionIO
