spark-user mailing list archives

From Manish Amde <manish...@gmail.com>
Subject Re: Status of MLLib exporting models to PMML
Date Tue, 18 Nov 2014 04:34:49 GMT
Hi Charles,

I am not aware of other storage formats. Perhaps Sean or Sandy can
elaborate more given their experience with Oryx.

There is work by Smola et al at Google that talks about large scale model
update and deployment.
https://www.usenix.org/conference/osdi14/technical-sessions/presentation/li_mu

-Manish

On Sunday, November 16, 2014, Charles Earl <charles.cearl@gmail.com> wrote:

> Manish and others,
> A follow up question on my mind is whether there are protobuf (or other
> binary format) frameworks in the vein of PMML. Perhaps scientific data
> storage frameworks like netcdf, root are possible also.
> I like the comprehensiveness of PMML but as you mention the complexity of
> management for large models is a concern.
> Cheers
>
> On Fri, Nov 14, 2014 at 1:35 AM, Manish Amde <manish9ue@gmail.com> wrote:
>
>> @Aris, we are closely following the PMML work that is going on and as
>> Xiangrui mentioned, it might be easier to migrate models such as logistic
>> regression and then migrate trees. Some of the models get fairly large (as
>> pointed out by Sung Chung) with deep trees as building blocks and we might
>> have to consider a distributed storage and prediction strategy.
>>
>>
>> On Tuesday, November 11, 2014, Xiangrui Meng <mengxr@gmail.com> wrote:
>>
>>> Vincenzo sent a PR and included k-means as an example. Sean is helping
>>> review it. The PMML standard is quite large, so we may start with simple
>>> model export, like linear methods, then move on to tree-based models.
>>> -Xiangrui
>>>
>>> On Mon, Nov 10, 2014 at 11:27 AM, Aris <arisofalaska@gmail.com> wrote:
>>> > Hello Spark and MLLib folks,
>>> >
>>> > So a common problem in the real world of using machine learning is that
>>> > some data analysts use tools like R, but the more "data engineers" out
>>> > there will use more advanced systems like Spark MLLib or even Python
>>> > Scikit Learn.
>>> >
>>> > In the real world, I want to have "a system" where multiple different
>>> > modeling environments can learn from data / build models, represent the
>>> > models in a common language, and then have a layer which just takes the
>>> > model and runs model.predict() all day long -- scores the models, in
>>> > other words.
>>> >
>>> > It looks like the projects openscoring.io and jpmml-evaluator are some
>>> > amazing systems for this, but they fundamentally use PMML as the model
>>> > representation here.
>>> >
>>> > I have read in some JIRA tickets that Xiangrui Meng is interested in
>>> > getting PMML export implemented for MLLib models. Is that happening?
>>> > Further, would something like Manish Amde's boosted ensemble tree
>>> > methods be representable in PMML?
>>> >
>>> > Thank you!!
>>> > Aris
>>>
>
>
> --
> - Charles
>
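
As a rough illustration of the "start with linear methods" idea discussed in this thread, here is a minimal sketch of writing a binary logistic regression model out as PMML-style XML and scoring from that XML alone. The helper names export_linear_model and score are hypothetical and are not the API of the PR mentioned above. The XML is deliberately simplified (no Header, DataDictionary or MiningSchema) and has not been validated against the PMML schema.

```python
import math
import xml.etree.ElementTree as ET


def export_linear_model(weights, intercept, feature_names):
    """Serialize a binary logistic regression model as a PMML-style XML string.

    Hypothetical helper: simplified output, not schema-validated PMML.
    """
    pmml = ET.Element("PMML", version="4.2")
    model = ET.SubElement(
        pmml,
        "RegressionModel",
        functionName="classification",
        normalizationMethod="logit",
    )
    table = ET.SubElement(
        model,
        "RegressionTable",
        intercept=repr(float(intercept)),
        targetCategory="1",
    )
    # One NumericPredictor element per (feature name, coefficient) pair.
    for name, weight in zip(feature_names, weights):
        ET.SubElement(
            table,
            "NumericPredictor",
            name=name,
            coefficient=repr(float(weight)),
        )
    return ET.tostring(pmml, encoding="unicode")


def score(pmml_xml, features):
    """Return P(class = 1) for a dict of feature values, using only the XML."""
    table = ET.fromstring(pmml_xml).find("./RegressionModel/RegressionTable")
    margin = float(table.get("intercept"))
    for predictor in table.findall("NumericPredictor"):
        name = predictor.get("name")
        margin += float(predictor.get("coefficient")) * features[name]
    # Logistic link, matching normalizationMethod="logit" above.
    return 1.0 / (1.0 + math.exp(-margin))


xml_text = export_linear_model([0.5, -0.25], 0.1, ["x1", "x2"])
probability = score(xml_text, {"x1": 2.0, "x2": 4.0})
```

The scoring side never touches the training code, which is the separation Aris asks for. A document like this grows by one element per coefficient, so the size concern raised for deep tree ensembles applies much less to linear models.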
