spark-user mailing list archives

From Shannon Quinn <squ...@gatech.edu>
Subject Iterative changes to RDD and broadcast variables
Date Mon, 17 Nov 2014 02:32:28 GMT
Hi all,

I'm iterating over an RDD (representing a distributed matrix; I have to 
roll my own in Python) and making changes to different submatrices at 
each iteration. The loop structure looks something like:

for i in range(x):
   VAR = sc.broadcast(i)
   rdd.map(func1).reduceByKey(func2)
M = rdd.collect()

where "func1" and "func2" use the current value of VAR for that iteration.

Because there aren't any "actions" in the main loop, nothing actually 
happens until the "collect" method is called. I'm running into problems 
I can't diagnose (*extremely* long execution time for no particular 
reason, among others); is this code even valid? If not, how should I make 
in-place iterative edits to different portions of a matrix, where each 
subsequent edit is dependent on the edits from the previous iteration?

Thanks in advance!
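
For reference, here is a minimal sketch of one pattern that may fit the question above. The loop as posted discards the result of map/reduceByKey, so each iteration starts again from the original rdd. The sketch reassigns rdd on every pass and forces an action per iteration. Everything inside func1 and func2 is a hypothetical stand-in (the original functions are not shown), as are the names make_func1 and run_iterations and the assumed ((row, col), value) record layout.

```python
def make_func1(var):
    """Build a hypothetical func1 bound to this iteration's broadcast variable."""
    def func1(entry):
        (row, col), value = entry
        # Stand-in edit: only touch the row selected by the broadcast value.
        if row == var.value:
            value = value + 1.0
        return ((row, col), value)
    return func1


def func2(a, b):
    # Hypothetical combiner: sum values that share a (row, col) key.
    return a + b


def run_iterations(sc, rdd, x):
    """Apply x dependent edits.

    sc is assumed to be a live SparkContext and rdd an RDD of
    ((row, col), value) pairs; neither is created here.
    """
    for i in range(x):
        var = sc.broadcast(i)
        # Reassign so the next iteration builds on this iteration's result.
        rdd = rdd.map(make_func1(var)).reduceByKey(func2).cache()
        # An action per iteration materializes the cached result now,
        # instead of deferring all x steps to the final collect().
        rdd.count()
    return rdd.collect()
```

Passing the broadcast variable into a factory function, rather than reading a global VAR, means each iteration's closure keeps its own value even though evaluation is lazy. Caching alone does not shorten the lineage; if it still grows too long over many iterations, calling rdd.checkpoint() periodically (after sc.setCheckpointDir) is one option worth trying.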