spark-user mailing list archives

From Jacek Laskowski
Subject Re: Maximum Size of Reference Look Up Table in Spark
Date Fri, 15 Jul 2016 00:06:04 GMT

Ad 1. My understanding is that the maximum size of a broadcast is
Long.MAX_VALUE (plus some more, since the data is going to be encoded
to save space, esp. for Catalyst-driven datasets).

Ad 2. Before tasks can access the broadcast variable, it has to be sent
across the network to the executors, and that may be too slow to be
acceptable.
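
To put a rough number on Ad 2, here is a back-of-the-envelope sketch in
Python. The 1 Gbit/s link speed and the helper name transfer_seconds are
my own assumptions for illustration, not measurements or Spark APIs, and
the estimate ignores compression, serialization and protocol overhead,
so treat it as a best case only.

```python
# Rough lower bound on the time needed to ship a broadcast to one executor.
# All figures are illustrative assumptions, not measurements.

def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Seconds to push size_bytes through a link at the given raw bandwidth."""
    return size_bytes * 8 / link_bits_per_second

GB = 10**9                    # decimal gigabyte
broadcast_size = 10 * GB      # the 10 GB lookup table from the question
link_speed = 1e9              # assumed 1 Gbit/s per executor (hypothetical)

per_executor = transfer_seconds(broadcast_size, link_speed)
print(f"at least {per_executor:.0f} s per executor")
```

Under these assumptions no task on an executor can use the lookup table
until well over a minute after the broadcast starts, which is a long
time for a streaming job.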

Jacek Laskowski
Mastering Apache Spark

On Thu, Jul 14, 2016 at 11:32 PM, Saravanan Subramanian wrote:
> Hello All,
> I am in the middle of designing real-time data enhancement services using
> Spark Streaming.  As part of this, I have to look up some reference data
> while processing the incoming stream.
> I have the following questions:
> 1) What is the maximum size of a lookup table / variable that can be
> stored as a broadcast variable?
> 2) What is the impact on cluster performance if I store 10 GB of data in
> a broadcast variable?
> Any suggestions and thoughts are welcome.
> Thanks,
> Saravanan S.
