spark-user mailing list archives

From obaidul karim <obaidc...@gmail.com>
Subject Spark Streaming: Combine MLlib Prediction and Features on Dstreams
Date Fri, 27 May 2016 04:33:27 GMT
Hi Guys,

This is my first mail to the Spark users mailing list.

I need help with a DStream operation.

I am using an MLlib random forest model to make predictions with Spark
Streaming. In the end, I want to combine the feature DStream and the
prediction DStream for further downstream processing.

I am predicting with the code below:

predictions = (texts
    .map(lambda x: getFeatures(x))
    .map(lambda x: x.split(','))
    .map(lambda parts: [float(i) for i in parts])
    .transform(lambda rdd: rf_model.predict(rdd)))

Here, texts is a DStream whose records are single lines of text, and
getFeatures returns a comma-separated string of features extracted from
each record.
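
To make the per-record steps concrete, here is a minimal plain-Python sketch
that needs no Spark. The body of getFeatures below is a hypothetical stand-in
(the real extractor is not shown in this mail), and to_vector is just an
illustrative name for the three map() steps applied to a single record.

```python
def getFeatures(text):
    # Hypothetical stand-in: the real feature extractor is not shown here.
    # It emits two toy features: character count and word count.
    return "{},{}".format(len(text), len(text.split()))


def to_vector(text):
    # Mirrors the three map() steps, applied to one record.
    csv = getFeatures(text)            # comma-separated feature string
    parts = csv.split(',')             # list of strings
    return [float(i) for i in parts]   # list of floats passed to the model


sample_vector = to_vector("hello spark streaming")
```

With the real getFeatures, the only thing that changes is the content and
length of the resulting float list.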


I want the output as tuples of the form:
("predicted value", "original text")

How can I achieve that?
Or, at the very least, can I perform a .zip on two DStreams, as with normal
RDDs, like below:
output = texts.zip(predictions)
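
To pin down the result I am after, here is a plain-Python sketch of the
pairing for a single batch. It only illustrates the intended semantics and is
not Spark code: pair_predictions is a hypothetical helper, the prediction
values are made up, and it assumes predictions come back in the same order
and number as the input texts.

```python
def pair_predictions(batch_texts, batch_predictions):
    # Assumption: one prediction per text, in the same order.
    if len(batch_texts) != len(batch_predictions):
        raise ValueError("texts and predictions must have the same length")
    # Produce ("predicted value", "original text") tuples.
    return list(zip(batch_predictions, batch_texts))


texts_batch = ["first line", "second line"]
predictions_batch = [1.0, 0.0]  # made-up values for illustration
output_batch = pair_predictions(texts_batch, predictions_batch)
```

The question is how to get the same positional pairing per batch when both
sides are DStreams.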


Thanks in advance.

-Obaid
