spark-user mailing list archives

From Daedalus <tushar.nagara...@gmail.com>
Subject Repeated Broadcasts
Date Fri, 20 Jun 2014 05:54:03 GMT
I'm trying to use Spark (Java) for an optimization algorithm that needs
repeated server-node exchanges of information (the ADMM algorithm, for
anyone familiar with it). In each iteration, I need to update a set of
values on the nodes and collect them on the server, which updates its own
set of values and passes the result back to ALL nodes.

Say each node optimizes a variable X={x1, x2, x3...}
While the server optimizes a variable Z={z1, z2, z3...}

I am currently using an Accumulable object to collect the updated X's from
each node into an array maintained on the server.
Each node requires a copy of Z to optimize X, and Z changes on every
iteration during optimization.

So, is there any computational advantage to broadcasting Z at each
iteration over simply passing it as a parameter to each node? (Remember, Z
changes on each iteration.)

That is, which of the following snippets should I be implementing:

for (int i = 0; i < iters; i++) {

    // re-broadcast the updated Z so every node sees the new value
    broadVar = sc.broadcast(Z);
    dataRDD.foreach(new VoidFunction<Data>() {
        public void call(Data d) {
            X = d.optimize(broadVar.value());
            accum.add(X);
        }
    });

    Z = optimize_Z(accum);
}

*OR*

for (int i = 0; i < iters; i++) {

    dataRDD.foreach(new VoidFunction<Data>() {
        public void call(Data d) {
            // Z is captured by the closure and serialized with each task
            X = d.optimize(Z);
            accum.add(X);
        }
    });

    Z = optimize_Z(accum);
}
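To make the cost of the second option concrete, here is a plain-Java sketch
(no Spark required; `ClosureSizeDemo` and `SerTask` are made-up names, with
`SerTask` standing in for Spark's `VoidFunction`). It shows that a variable
captured by a serializable function is embedded in that function's serialized
form, which is roughly what happens to Z when it is referenced directly in
the closure passed to foreach():

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClosureSizeDemo {
    // Serializable functional interface, standing in for Spark's VoidFunction:
    // Spark Java-serializes the function object (its "closure") for each task.
    interface SerTask extends Serializable {
        double run();
    }

    // Measure how many bytes plain Java serialization produces for an object.
    static int serializedSize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        final double[] Z = new double[1_000_000]; // ~8 MB of server-side state

        SerTask withoutZ = () -> 0.0;  // captures nothing: tiny when serialized
        SerTask withZ = () -> Z[0];    // captures Z: Z rides along in the bytes

        System.out.println("task without Z: " + serializedSize(withoutZ) + " bytes");
        System.out.println("task with Z:    " + serializedSize(withZ) + " bytes");
    }
}
```

If my understanding is right, the captured copy of Z travels inside every
serialized task, while a Broadcast variable ships Z to each executor once per
broadcast call, so for a large Z the first snippet should transfer far less
data per iteration.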




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Repeated-Broadcasts-tp7977.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
