spark-user mailing list archives

From Yanbo Liang <yblia...@gmail.com>
Subject Re: SPARK MLLib - How to tie back Model.predict output to original data?
Date Wed, 17 Aug 2016 02:57:35 GMT
With the DataFrame-based API (spark.ml), MLlib keeps the original dataset
during transformation; it just appends new columns to the existing DataFrame.
That is, you can get both the prediction value and the original columns
(including the features) from the output DataFrame of model.transform.

Thanks
Yanbo

2016-08-16 17:48 GMT-07:00 ayan guha <guha.ayan@gmail.com>:

> Hi
>
> I have a dataset as follows:
>
> DF:
> amount:float
> date_read:date
> meter_number:string
>
> I am trying to predict the future amount based on the past 3 weeks of
> consumption (and a heap of weather data related to the date).
>
> My LabeledPoint looks like:
>
> label (populated from DF.amount)
> features (populated from a bunch of other stuff)
>
> Model.predict output:
> label
> prediction
>
> Now I am trying to tie this prediction value back to the meter number and
> date_read from the original DF.
>
> One way is to assume that the order of records in DF and in the
> Model.predict output is exactly the same and zip the two RDDs. Is there any
> other (possibly better) solution?
>
> --
> Best Regards,
> Ayan Guha
>
