spark-user mailing list archives

From Henggang Cui <>
Subject Merging all Spark Streaming RDDs to one RDD
Date Tue, 10 Jun 2014 01:00:00 GMT

I'm wondering whether it's possible to continuously merge the RDDs coming
from a stream into a single RDD efficiently.

One thought is to use the union() method. But with union(), I get a new
RDD each time I merge. I don't know how I should name these RDDs,
because I remember that Spark does not encourage users to create an array of
RDDs.
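
For illustration, a minimal sketch of the union() idea in Scala. It assumes a
single driver-side variable is reassigned on every batch, so no array of RDDs
and no per-batch names are needed. The Record type, the parse helper, the
socket source on localhost:9999 and the 1-second batch interval are
hypothetical and not part of the original question.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical stand-in for the multi-field struct mentioned in the question.
case class Record(id: Long, name: String, value: Double)

object MergeStreamByUnion {
  // Hypothetical parser for input lines of the form "id,name,value".
  def parse(line: String): Record = {
    val Array(id, name, value) = line.split(",")
    Record(id.toLong, name, value.toDouble)
  }

  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the socket receiver, one for processing.
    val conf = new SparkConf().setAppName("MergeStreamByUnion").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed input source: text lines arriving on a local socket.
    val records = ssc.socketTextStream("localhost", 9999).map(parse)

    // One reference on the driver, reassigned after every batch.
    var merged: RDD[Record] = ssc.sparkContext.emptyRDD[Record]

    // The body of foreachRDD runs on the driver once per batch interval.
    records.foreachRDD { rdd =>
      val previous = merged
      merged = previous.union(rdd).cache()
      val total = merged.count() // materializes the newly cached RDD
      previous.unpersist()       // drops the cached copy of the older RDD
      println(s"records merged so far: $total")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Each union() adds the partitions of the new batch to the merged RDD and
lengthens its lineage, so a long-running job would probably need to coalesce
or checkpoint the merged RDD from time to time.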

Another possible solution is to follow the "StatefulNetworkWordCount"
example, which uses the updateStateByKey() method. But the elements of my
RDDs are not key-value pairs (each one is a struct with multiple fields). Is
there a workaround?
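
One possible workaround, sketched in Scala under stated assumptions: map each
struct to a (key, struct) pair so that updateStateByKey() becomes applicable,
and keep the records seen so far as the per-key state. The Record type, its id
field used as the key, the parse and appendRecords helpers, the socket source
and the "checkpoint" directory are all hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
// Brings the pair-DStream implicits into scope on older Spark releases.
import org.apache.spark.streaming.StreamingContext._

// Hypothetical stand-in for the multi-field struct mentioned in the question.
case class Record(id: Long, name: String, value: Double)

object MergeStreamByState {
  // Hypothetical parser for input lines of the form "id,name,value".
  def parse(line: String): Record = {
    val Array(id, name, value) = line.split(",")
    Record(id.toLong, name, value.toDouble)
  }

  // Appends the records of the current batch to those already stored for a key.
  def appendRecords(batch: Seq[Record], state: Option[Seq[Record]]): Option[Seq[Record]] =
    Some(state.getOrElse(Seq.empty[Record]) ++ batch)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MergeStreamByState").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    // updateStateByKey() requires a checkpoint directory to be set.
    ssc.checkpoint("checkpoint")

    // Turn every struct into a (key, value) pair; the key here is the id field.
    val keyed = ssc.socketTextStream("localhost", 9999)
      .map(parse)
      .map(record => (record.id, record))

    // In each batch, the state stream holds one RDD with all records per key.
    val merged = keyed.updateStateByKey[Seq[Record]](appendRecords _)
    merged.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The state grows without bound in this sketch, and a constant key would put
every record into a single state entry, so keying by a real field of the
struct is likely the safer choice.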

