spark-user mailing list archives

From Adrian Mocanu <amoc...@verticalscope.com>
Subject RE: function state lost when next RDD is processed
Date Fri, 28 Mar 2014 14:12:29 GMT
I'd like to resurrect this thread since I don't have an answer yet.

From: Adrian Mocanu [mailto:amocanu@verticalscope.com]
Sent: March-27-14 10:04 AM
To: user@spark.incubator.apache.org
Subject: function state lost when next RDD is processed

Is there a way to pass a custom function to Spark so that it runs over the entire stream? For example,
say I have a function that sums the values in each RDD and then keeps a running total across RDDs.

I've tried map, transform, and reduce. They all apply my sum function to one RDD at a time: when the
next RDD arrives, the function starts again from 0, so the sum of the previous RDD is lost.

Does Spark support a way of passing a custom function so that its state is preserved across
RDDs and not only within a single RDD?
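To make the difference concrete, here is a minimal plain-Scala sketch that does not use Spark at all: it models the stream as a sequence of batches standing in for the RDDs of a DStream. The per-batch version behaves like the map/transform/reduce attempts described above, while the running version carries state from one batch to the next. The update function deliberately has the same shape, (Seq[V], Option[S]) => Option[S], as the one Spark Streaming's `updateStateByKey` takes on key-value DStreams (which also requires a checkpoint directory). The object name `RunningSumSketch` and its helpers are hypothetical, chosen only for this illustration.

```scala
// Hypothetical illustration only: batches stand in for the RDDs of a DStream.
object RunningSumSketch {
  // Same shape as the function given to updateStateByKey: this batch's values
  // plus the previous state (None before the first batch) -> the new state.
  def updateSum(newValues: Seq[Int], state: Option[Int]): Option[Int] =
    Some(state.getOrElse(0) + newValues.sum)

  // Per-batch sums: each batch starts again from 0, so earlier totals are lost.
  def perBatchSums(batches: Seq[Seq[Int]]): Seq[Int] =
    batches.map(_.sum)

  // Running sums: the state produced by one batch is fed into the next.
  def runningSums(batches: Seq[Seq[Int]]): Seq[Int] =
    batches
      .scanLeft(Option.empty[Int])((state, batch) => updateSum(batch, state))
      .flatten // drops the initial None, keeping one total per batch

  def main(args: Array[String]): Unit = {
    val batches = Seq(Seq(1, 2, 3), Seq(4, 5), Seq(6))
    println(perBatchSums(batches)) // per-batch totals: 6, 9, 6
    println(runningSums(batches))  // running totals: 6, 15, 21
  }
}
```

In a real streaming job the same idea would apply per key; a single global total could be kept by mapping every value to one constant key before updating state, though that is an assumption about how one might adapt it rather than something established in this thread.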

Thanks
-Adrian

