spark-user mailing list archives

From Adrian Mocanu <amocanu@verticalscope.com>
Subject RE: function state lost when next RDD is processed
Date Fri, 28 Mar 2014 14:47:16 GMT
Thanks!

Ya, that's what I'm doing so far, but I wanted to see if it's possible to keep the tuples inside
Spark for fault-tolerance purposes.
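
For anyone reading this later: Spark Streaming's updateStateByKey keeps running state inside
Spark with checkpoint-based fault tolerance, which sounds like what I'm after. A minimal
sketch; the stream name, key/value types, and checkpoint path are my own assumptions:

    import org.apache.spark.streaming.StreamingContext
    import org.apache.spark.streaming.StreamingContext._ // pair-DStream ops (Spark 0.9/1.x)
    import org.apache.spark.streaming.dstream.DStream

    // Assumes an existing StreamingContext `ssc` and a DStream of
    // (key, value) pairs named `pairs`; both names are illustrative.
    def runningSums(ssc: StreamingContext,
                    pairs: DStream[(String, Double)]): DStream[(String, Double)] = {
      ssc.checkpoint("hdfs:///tmp/checkpoints") // stateful DStreams need checkpointing

      // For each key, add this batch's values to the sum carried over from
      // all earlier RDDs; Spark persists the state between batches.
      pairs.updateStateByKey { (newValues: Seq[Double], state: Option[Double]) =>
        Some(newValues.sum + state.getOrElse(0.0))
      }
    }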

-A
From: Mark Hamstra [mailto:mark@clearstorydata.com]
Sent: March-28-14 10:45 AM
To: user@spark.apache.org
Subject: Re: function state lost when next RDD is processed

As long as the amount of state being passed is relatively small, it's probably easiest to
send it back to the driver and to introduce it into RDD transformations as the zero value
of a fold.
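
A minimal sketch of that pattern, assuming a DStream[Double] named `stream` (all names here
are illustrative, not from this thread). One caveat: RDD.fold applies its zero value once per
partition and again when merging partition results, so on a multi-partition RDD it is safer
to fold each batch with an identity zero and combine the carried total on the driver:

    import org.apache.spark.streaming.dstream.DStream

    def sumAcrossRDDs(stream: DStream[Double]): Unit = {
      // Driver-side state; fine as long as it stays relatively small.
      var runningTotal = 0.0

      stream.foreachRDD { rdd =>
        val batchSum = rdd.fold(0.0)(_ + _) // sum this batch on the cluster
        runningTotal += batchSum            // carry the result across RDDs
        println(s"Sum across RDDs so far: $runningTotal")
      }
    }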

On Fri, Mar 28, 2014 at 7:12 AM, Adrian Mocanu <amocanu@verticalscope.com> wrote:
I'd like to resurrect this thread since I don't have an answer yet.

From: Adrian Mocanu [mailto:amocanu@verticalscope.com]
Sent: March-27-14 10:04 AM
To: user@spark.incubator.apache.org
Subject: function state lost when next RDD is processed

Is there a way to pass a custom function to Spark to run on the entire stream? For example,
say I have a function which sums values within each RDD and then across RDDs.

I've tried map, transform, and reduce; they all apply my sum function to a single RDD. When
the next RDD arrives, the function starts from 0 again, so the sum from the previous RDD is lost.
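
For illustration, this is roughly what those attempts reduce to (a sketch, assuming a
DStream[Double] named `stream`):

    import org.apache.spark.streaming.dstream.DStream

    // Each micro-batch is its own RDD and reduce runs per RDD, so every
    // batch emits a fresh sum with no carry-over from earlier batches.
    def perBatchSums(stream: DStream[Double]): DStream[Double] =
      stream.reduce(_ + _)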

Does Spark support a way of passing a custom function so that its state is preserved across
RDDs, not only within a single RDD?

Thanks
-Adrian


