spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From obaidul karim <obaidc...@gmail.com>
Subject Re: Spark Streaming: Combine MLlib Prediction and Features on Dstreams
Date Mon, 30 May 2016 07:17:14 GMT
Hi,

Anybody has any idea on below?

-Obaid

On Friday, 27 May 2016, obaidul karim <obaidcuet@gmail.com> wrote:

> Hi Guys,
>
> This is my first mail to spark users mailing list.
>
> I need help on Dstream operation.
>
> In fact, I am using a MLlib randomforest model to predict using spark
> streaming. In the end, I want to combine the feature Dstream & prediction
> Dstream together for further downstream processing.
>
> I am predicting using below piece of code:
>
> predictions = texts.map( lambda x : getFeatures(x) ).map(lambda x :
> x.split(',')).map( lambda parts : [float(i) for i in parts]
> ).transform(lambda rdd: rf_model.predict(rdd))
>
> Here texts is dstream having single line of text as records
> getFeatures generates a comma separated features extracted from each record
>
>
> I want the output as below tuple:
> ("predicted value", "original text")
>
> How can I achieve that ?
> or
> at least can I perform .zip like normal RDD operation on two Dstreams,
> like below:
> output = texts.zip(predictions)
>
>
> Thanks in advance.
>
> -Obaid
>

Mime
View raw message