spark-user mailing list archives

From Vipul Pandey <vipan...@gmail.com>
Subject Re: Asynchronous Broadcast from driver to workers, is it possible?
Date Tue, 21 Oct 2014 23:07:34 GMT
Any word on this one? I would like to get this done as well.
My real use case, though, is to do something on each executor right at the beginning,
and I was trying to hack it with broadcasts by broadcasting an object of my own and doing
whatever I want in its readObject method.

Is there any other way to do this?
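
For reference, one way the "do something once per executor" idea could be sketched without broadcasts is a lazily initialized singleton. This is a minimal sketch in plain Scala, not a Spark API: the names ExecutorSetup, ensureInitialized and runs are hypothetical, and the threads only simulate several tasks landing in the same executor JVM.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical per-JVM setup hook; the name ExecutorSetup is illustrative only.
object ExecutorSetup {
  // Counts how many times the setup body actually ran in this JVM.
  val runs = new AtomicInteger(0)

  // A lazy val is initialized at most once per JVM, in a thread-safe way.
  private lazy val initialized: Boolean = {
    runs.incrementAndGet() // stand-in for the real setup work
    true
  }

  // Safe to call from every task; only the first call does the work.
  def ensureInitialized(): Boolean = initialized
}

object ExecutorSetupDemo {
  def main(args: Array[String]): Unit = {
    // Simulate several tasks running concurrently in one executor JVM.
    val threads = (1 to 8).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = ExecutorSetup.ensureInitialized()
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    println(ExecutorSetup.runs.get) // prints 1
  }
}
```

In a Spark job, the assumption is that ensureInitialized() would be called at the top of a task closure (for example inside foreachPartition). That runs the setup before first use in each JVM that executes such a task, but it does not by itself guarantee that every executor runs it up front.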


On Oct 4, 2014, at 7:36 PM, Peng Cheng <pc175@uow.edu.au> wrote:

> While Spark already supports asynchronous reduce (collecting data from
> workers without interrupting execution of a parallel transformation)
> through accumulators, I have made little progress on making this process
> reciprocal, namely, broadcasting data from the driver to the workers to be used by
> all executors in the middle of a transformation. This is primarily intended to
> be used in downpour SGD/AdaGrad, a non-blocking concurrent machine learning
> optimizer that performs better than the existing synchronous GD in MLlib and
> has broad application in the training of many models.
> 
> My attempt so far is to stick to the out-of-the-box, immutable broadcast and open a
> new thread on the driver, in which I broadcast a thin data wrapper that, when
> deserialized, inserts its value into a mutable singleton that is already
> replicated to all workers in the fat jar. This customized deserialization is
> not hard; just define readObject like this:
> 
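> import java.io.ObjectInputStream
> 
> class AutoInsert(var value: Int) extends Serializable {
> 
>   // Constructor body: runs only in the JVM that creates the object (the driver).
>   WorkerReplica.last = value
> 
>   // Serialization hook invoked by the JVM when an instance is deserialized.
>   private def readObject(in: ObjectInputStream): Unit = {
>     in.defaultReadObject()          // restore `value` first
>     WorkerReplica.last = this.value // then push it into the worker-side singleton
>   }
> }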
> 
> Unfortunately, it looks like deserialization happens lazily and does
> nothing until a worker actually uses the Broadcast[AutoInsert]. That cannot happen
> without waiting for the workers' stage to finish and then broadcasting again. I'm
> wondering whether I can 'hack' this into working, or whether I'll have to write a
> serious extension to the broadcast component to enable changing the value.
> 
> I hope I can find like-minded people on this forum, because ML is a selling point of
> Spark.
> 
> 
> 
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Asynchronous-Broadcast-from-driver-to-workers-is-it-possible-tp15758.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
> 
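
The readObject hook described in the quoted message can be tried outside Spark with plain JVM serialization. The following is a minimal sketch under stated assumptions: WorkerReplica is a stand-in singleton defined here, ReadObjectDemo and its helpers are hypothetical names, and the constructor-side assignment from the quoted class is left out so that the effect of deserialization is visible within a single JVM. It shows that the side effect fires only when the bytes are actually deserialized, which matches the lazy behaviour reported for broadcasts.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Stand-in for the mutable singleton that lives in each worker JVM (assumed name).
object WorkerReplica {
  @volatile var last: Int = 0
}

// Thin wrapper whose deserialization hook writes into the singleton.
class AutoInsert(var value: Int) extends Serializable {
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()          // restore `value` from the stream
    WorkerReplica.last = this.value // side effect happens at deserialization time
  }
}

object ReadObjectDemo {
  // Serialize an object to a byte array with standard Java serialization.
  def serialize(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  // Deserialize a byte array; this is the step that triggers readObject.
  def deserialize(bytes: Array[Byte]): AnyRef = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject() finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val bytes = serialize(new AutoInsert(42))
    println(WorkerReplica.last) // prints 0: serializing alone changes nothing
    deserialize(bytes)
    println(WorkerReplica.last) // prints 42: the hook ran during deserialization
  }
}
```

Nothing in this sketch forces deserialization to happen on a worker; it only illustrates that the singleton is updated at the moment the wrapper is read, not when it is created or shipped.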



