spark-user mailing list archives

From nguyen duc tuan <newvalu...@gmail.com>
Subject Re: Spark Streaming: Combine MLlib Prediction and Features on Dstreams
Date Mon, 30 May 2016 11:58:59 GMT
How about this?

def extract_feature(rf_model, x):
    text = getFeatures(x).split(',')
    fea = [float(i) for i in text]
    prediction = rf_model.predict(fea)
    return (prediction, x)

output = texts.map(lambda x: extract_feature(rf_model, x))
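
(Not from the thread itself, just a hedged sketch using the same names texts, getFeatures and rf_model as above.) If calling a pyspark.mllib model's predict() on one record at a time inside map() turns out to be a problem, because the Python model wraps a JVM object, prediction can also be done per batch inside transform() and zipped back to the input records, which gives the (prediction, original text) pairs asked for below:

def predict_batch(rdd):
    # build feature vectors from the raw text records
    features = rdd.map(lambda x: [float(i) for i in getFeatures(x).split(',')])
    # predict on the whole RDD at once; the Python model wrapper stays on
    # the driver and is never captured in a task closure
    predictions = rf_model.predict(features)
    # zip is safe here because predictions is derived from rdd by map-like
    # steps, so both RDDs keep the same partitioning and element counts
    return predictions.zip(rdd)

output = texts.transform(predict_batch)
output.pprint()  # inspect a few (prediction, text) pairs per batch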

2016-05-30 14:17 GMT+07:00 obaidul karim <obaidcuet@gmail.com>:

> Hi,
>
> Does anybody have any idea on the below?
>
> -Obaid
>
>
> On Friday, 27 May 2016, obaidul karim <obaidcuet@gmail.com> wrote:
>
>> Hi Guys,
>>
>> This is my first mail to the Spark users mailing list.
>>
>> I need help with a DStream operation.
>>
>> I am using an MLlib random forest model to predict with Spark Streaming.
>> In the end, I want to combine the feature DStream and the prediction
>> DStream for further downstream processing.
>>
>> I am predicting using the piece of code below:
>>
>> predictions = texts.map(lambda x: getFeatures(x)) \
>>     .map(lambda x: x.split(',')) \
>>     .map(lambda parts: [float(i) for i in parts]) \
>>     .transform(lambda rdd: rf_model.predict(rdd))
>>
>> Here, texts is a DStream whose records are single lines of text, and
>> getFeatures generates comma-separated features extracted from each
>> record.
>>
>>
>> I want the output as a tuple like the one below:
>> ("predicted value", "original text")
>>
>> How can I achieve that?
>> Or,
>> at least, can I perform a .zip, like a normal RDD operation, on two
>> DStreams, as below:
>> output = texts.zip(predictions)
>>
>>
>> Thanks in advance.
>>
>> -Obaid
>>
>
