spark-user mailing list archives

From zh8788 <>
Subject How to keep a local variable in each cluster?
Date Mon, 24 Nov 2014 01:41:37 GMT

I am new to Spark, and this is my first post here. I am currently trying to
implement ADMM optimization algorithms for Lasso/SVM, and I have run into a
problem:

Since the training data (label, feature) is large, I created an RDD and
cached the training data in memory. ADMM also needs to keep local parameters
(u, v), which are different for each partition. In each iteration, I need to
use the training data (only on that partition) together with u and v to
calculate the new values of u and v.


One way is to zip (training data, u, v) into an RDD and update it in each
iteration. However, the training data is large and never changes; only u and
v (which are small) change in each iteration. If I zip these three, I cannot
cache that RDD (since it changes in every iteration). But if I do not cache
it, how can I reuse the training data in every iteration?
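
One possible way to handle this, sketched below in PySpark under several assumptions: keep the training RDD cached and unchanged, hold the small (u, v) pairs on the driver in a dict keyed by partition index, and broadcast that dict in each iteration. The names local_update and run_iterations are hypothetical, features are assumed to be scalars, and the arithmetic in local_update is only a placeholder, not the actual ADMM update.

```python
def local_update(rows, u, v):
    """Placeholder for the per-partition step (NOT the real ADMM rule).

    rows: iterable of (label, feature) pairs with scalar features (assumed).
    """
    total, n = 0.0, 0
    for label, feature in rows:
        total += label - v * feature
        n += 1
    mean_resid = total / n if n else 0.0
    return u + mean_resid, v + 0.5 * mean_resid


def run_iterations(sc, data, num_iters):
    """sc: SparkContext; data: RDD of (label, feature) pairs.

    Returns {partition_index: (u, v)} after num_iters rounds.
    """
    data.cache()  # the large RDD is marked for caching once and never rebuilt
    state = {i: (0.0, 0.0) for i in range(data.getNumPartitions())}
    for _ in range(num_iters):
        shared = sc.broadcast(state)  # only the small state is shipped

        def step(index, rows):
            # Look up this partition's own (u, v) and update it locally.
            u, v = shared.value[index]
            yield index, local_update(rows, u, v)

        # collect() is an action, so this round runs before `shared` is rebound.
        state = dict(data.mapPartitionsWithIndex(step).collect())
    return state
```

With a live SparkContext sc, this might be called as run_iterations(sc, sc.parallelize(pairs, 4), 10), where pairs is a list of (label, feature) tuples. Only (u, v) travels between the driver and the executors, so the cached training partitions are reused in place.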


Related to Question 1: the online documentation says that if we don't cache
an RDD, it will not be kept in memory. RDD operations are also evaluated
lazily, so I am confused about when a previous RDD is actually in memory.


B =
B.collect()    # Does this force B to be calculated? After that, does the node
               # just release B, since it is not cached?
D = 

B =
D =   


B =
C =
D = 
In which of these cases is B in memory on each node when I calculate
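
A small experiment can make the caching behaviour visible. The sketch below, assuming a live SparkContext passed in as sc, uses an accumulator to count how often the function behind B actually runs across two actions. The names double and count_evaluations are made up for illustration.

```python
def double(x):
    return x * 2


def count_evaluations(sc, use_cache):
    """Return how many times the map function behind B ran over two actions."""
    calls = sc.accumulator(0)

    def traced(x):
        calls.add(1)  # incremented on the executors, read back on the driver
        return double(x)

    A = sc.parallelize(range(100), 4)
    B = A.map(traced)           # lazy: nothing is computed yet
    if use_cache:
        B.cache()               # only marks B; it is stored when first computed
    B.count()                   # first action: B is computed here
    D = B.map(lambda x: x + 1)  # lazy again
    D.count()                   # second action: reads cached B, or recomputes it
    return calls.value
```

Without cache(), the count would be expected to be roughly twice as large as with it, because B is recomputed from its lineage for the second action. With cache(), B's partitions stay in executor memory after the first action (space permitting) until unpersist() is called.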


Can I use a function that operates on two RDDs?

E.g. a function newfun(rdd1, rdd2)
# rdd1 is large and does not change the whole time (training data), so I can
# cache it
# rdd2 is small and changes in each iteration (u, v)
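
RDD.zip is one way to write such a function: it pairs two RDDs element by element, provided both have the same number of partitions and the same number of elements in each partition. A minimal sketch, assuming the training RDD is first collapsed with glom() so that each partition holds a single list of rows; update_pair and iterate are hypothetical names, and the arithmetic is a placeholder rather than the ADMM step.

```python
def update_pair(rows, uv):
    """Placeholder update (NOT real ADMM): rows is a list of (label, feature)."""
    u, v = uv
    return u + sum(label for label, _ in rows), v + len(rows)


def newfun(rdd1, rdd2):
    """rdd1: cached RDD with one list of rows per partition.
    rdd2: RDD with one (u, v) pair per partition, aligned with rdd1.
    """
    return rdd1.zip(rdd2).map(lambda pair: update_pair(pair[0], pair[1]))


def iterate(data, num_iters):
    """data: RDD of (label, feature) pairs; returns the final list of (u, v)."""
    rdd1 = data.glom().cache()                # large, cached once
    rdd2 = rdd1.map(lambda rows: (0.0, 0.0))  # small, derived so partitions align
    for _ in range(num_iters):
        rdd2 = newfun(rdd1, rdd2).cache()     # only the small state is replaced
    return rdd2.collect()
```

The zipped RDD itself is never cached; only rdd1 and the small rdd2 are. Over many iterations the lineage of rdd2 keeps growing, so checkpointing it periodically may be worth considering.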


Or are there other ways to solve this kind of problem? I think this is a
common problem, but I could not find any good solutions.

Thanks a lot

