spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: SPARK MLLib - How to tie back Model.predict output to original data?
Date Wed, 17 Aug 2016 03:39:28 GMT
Hi

Thank you for your reply. Yes, I can get the prediction and the original
features together. My question is how to tie them back to the other parts of
the data that were not in the LabeledPoint (LP).

For example, I have a bunch of other dimensions which are not part of
features or label.
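
To make this concrete, here is a minimal sketch of what I am after, written in
plain Python rather than Spark so it runs anywhere. The idea is to carry an
identifying key next to each feature vector, so the prediction can be joined
back without relying on record order. Everything except the amount, date_read
and meter_number fields is hypothetical: the "temp" feature, the helper names,
and the predict function, which is only a stand-in for a trained model.

```python
# Plain-Python sketch (no Spark): keep a key next to each feature vector.
# All helper names are hypothetical; `predict` stands in for a trained model.

records = [
    {"meter_number": "M1", "date_read": "2016-08-01", "amount": 10.0, "temp": 20.0},
    {"meter_number": "M2", "date_read": "2016-08-01", "amount": 12.5, "temp": 25.0},
]

def key_of(rec):
    # Columns that are neither features nor label but must reach the output.
    return (rec["meter_number"], rec["date_read"])

def features_of(rec):
    # Hypothetical feature vector built from the record.
    return [rec["temp"]]

def predict(features):
    # Stand-in for a model's prediction on a single feature vector.
    return 0.5 * features[0]

# Analogue of mapping each record to a (key, (label, features)) pair.
keyed = [(key_of(r), (r["amount"], features_of(r))) for r in records]

# The key travels with each prediction, so record order does not matter.
predictions = {k: predict(f) for k, (_label, f) in keyed}

# Analogue of a join on the key: attach predictions to the full records.
result = [dict(r, prediction=predictions[key_of(r)]) for r in records]
```

In Spark I assume the equivalent would be a pair RDD keyed by
(meter_number, date_read), with the predictions joined back on that key, but I
am not sure this is the recommended way.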

Sorry if this is a stupid question.

On Wed, Aug 17, 2016 at 12:57 PM, Yanbo Liang <ybliang8@gmail.com> wrote:

> MLlib keeps the original dataset during transformation; it just appends
> new columns to the existing DataFrame. That is, you can get both the
> prediction value and the original features from the output DataFrame of
> model.transform.
>
> Thanks
> Yanbo
>
> 2016-08-16 17:48 GMT-07:00 ayan guha <guha.ayan@gmail.com>:
>
>> Hi
>>
>> I have a dataset as follows:
>>
>> DF:
>> amount:float
>> date_read:date
>> meter_number:string
>>
>> I am trying to predict the future amount based on the past 3 weeks'
>> consumption (and heaps of weather data related to the date).
>>
>> My LabeledPoint looks like:
>>
>> label (populated from DF.amount)
>> features (populated from a bunch of other stuff)
>>
>> Model.predict output:
>> label
>> prediction
>>
>> Now, I am trying to tie this prediction value back to meter_number and
>> date_read from the original DF.
>>
>> One way is to assume that the order of records in DF and in the
>> Model.predict output will be exactly the same, and zip the two RDDs. Is
>> there any other (possibly better) solution?
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>


-- 
Best Regards,
Ayan Guha
