spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gylfi <>
Subject Re: Passing Broadcast variable as parameter
Date Sat, 18 Jul 2015 09:03:44 GMT

You can use a broadcast variable to make data available to all the nodes in
your cluster that can live longer then just the current distributed task. 

For example if you need a to access a large structure in multiple sub-tasks,
instead of sending that structure again and again with each sub-task you can
send it only once and access the data inside the operation (map, flatmap
etc.) by way of the broadcast variable name .value 

See :

Note however that you should treat the broadcast variable as a read-only
structure as it is not synced between workers after it is broadcasted.

To broadcast, your data must be serializable.

If the data you are trying to broadcast is a distributed RDD (and thus I
assumably large), perhaps what you need is some form of join operation (or


View this message in context:
Sent from the Apache Spark User List mailing list archive at

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message